Senior AI Data Engineer (RAG / Retrieval / Python)
Remote (Ukraine, Europe)
Build and maintain production AI data pipelines and retrieval-driven workflows for a funded AI/data platform. You'll own the data engineering layer that powers RAG and LLM-integrated features at scale.
We’re hiring a Senior AI Data Engineer to join a client-facing team building an AI/data platform with heavy external data ingestion, non-standard data engineering, and retrieval-driven workflows.
This is not classic BI/reporting or data warehouse work, and it’s not a pure backend API role. The system ingests large volumes of non-standard external data, processes it through AI/LLM workflows, and makes it retrievable at scale. If you’ve built production RAG pipelines or vector-based retrieval systems and you’re comfortable operating in an ambiguous startup environment, this role is for you.
What you’ll do
- Build and maintain data ingestion pipelines for structured and unstructured external data
- Design and support retrieval pipelines for AI/LLM workflows — chunking, embedding, indexing, metadata enrichment
- Develop Python backend services (FastAPI) that expose data to the broader platform
- Own data flows across Postgres, object storage, vector search, and related stores
- Improve platform reliability and retrieval performance as data volume grows
- Collaborate closely with software engineers and AI teammates to evolve the system
The environment
The team is small and operates with minimal process overhead. The codebase is production-facing, and engineers own outcomes end-to-end — not just their individual tickets.
What we're looking for
- 5+ years of commercial software/data engineering experience
- Strong commercial experience with Python
- Hands-on experience building data pipelines and ingestion workflows
- Hands-on experience with AI/LLM retrieval systems — RAG pipelines, vector search, embedding-based retrieval, or document ingestion / chunking / indexing workflows
- Experience building or maintaining FastAPI or similar Python backend services
- Experience with AWS data/cloud infrastructure
- Experience with unstructured or semi-structured data
- Strong SQL and practical data modeling skills
- Ability to work independently in ambiguous product environments
- Strong written and spoken English — all technical documentation and client reviews are in English
Nice to have
- Production experience with vector databases such as Pinecone, pgvector, Qdrant, Weaviate, OpenSearch vector search, or FAISS
- Experience with graph databases or connected-data modeling — Neo4j, Amazon Neptune
- Experience with scraping-heavy or connector-heavy ingestion systems
- Experience with LangChain, LangGraph, Haystack, LlamaIndex, or similar orchestration frameworks
- Experience with Terraform
- Experience supporting retrieval quality, latency, and production reliability
- Experience with reranking, hybrid retrieval, or evaluation of retrieval quality
- Experience with AI agent workflows or tool-calling systems
- Experience with data governance, permissions, or enterprise knowledge access
What you'll get
- Direct client access — work straight with technical decision-makers, no PM or agency layer in between
- Long-term engagements with well-funded SaaS and regulated-enterprise clients
- Senior-only environment — every peer has 5+ years and a track record we've vetted
- Real production AI work on modern stacks — not legacy maintenance or POC theater
- Fully remote, flexible hours, B2B contractor model with transparent rates
- Growth path into tech-lead and architect roles on multi-year client engagements