Posted on: 26/12/2025
Employment Type : Full-Time
Reporting To : Platform Architect
Role Overview :
The Senior Data Engineer will build and maintain the core data infrastructure for an enterprise AI platform. This role focuses on designing scalable data pipelines, developing knowledge graphs, and preparing structured and unstructured data for AI and LLM-based applications.
Roles & Responsibilities :
Data Pipeline Development :
- Design and build scalable data ingestion pipelines from enterprise systems (ERP, documentation tools, version control, and project management tools)
- Develop connectors for structured, semi-structured, and unstructured data
- Implement incremental data loads, change data capture (CDC), and real-time sync
- Ensure data quality through validation, deduplication, and lineage tracking
Knowledge Graph Engineering :
- Design ontologies and graph schemas for complex enterprise relationships
- Implement entity resolution and relationship inference across data sources
- Build APIs and query interfaces for graph traversal
- Optimize graph storage and query performance for large-scale usage
Enterprise Data Integration :
- Extract and model enterprise metadata such as business rules and data dictionaries
- Parse and semantically index documents and code artifacts
- Build integrations with enterprise APIs and internal platforms
AI & LLM Data Infrastructure :
- Prepare structured and contextual data for LLM consumption
- Design embedding strategies and manage vector databases for semantic search
- Build memory and context management systems for stateful AI applications
Required Skills :
Core Requirements :
- 5+ years of Data Engineering experience with production-grade pipelines
- Strong Python skills (clean, testable, maintainable code)
- MongoDB expertise (schema design, aggregation pipelines, indexing, performance tuning)
- Vector database experience (Qdrant, Pinecone, Weaviate, or pgvector)
- Document processing experience (chunking, metadata extraction, PDFs/Word/HTML; LangChain or similar)
- Strong SQL skills (complex queries, joins, window functions, query optimization)
- ETL/ELT at scale (incremental loads, CDC, idempotent pipelines)
- Pipeline orchestration tools (Airflow, Dagster, Prefect, or similar)
Good to Have / Strong Plus :
- Experience building production RAG pipelines
- Deep understanding of embedding models and dimensionality trade-offs
- Graph databases (Neo4j) and Cypher query expertise
- LLM application development using LangChain or LangGraph
- Streaming systems (Kafka, Flink) for real-time pipelines
- Hybrid search (vector + keyword/metadata filtering)
- Apache Spark for large-scale transformations
What We Offer :
- Work on cutting-edge AI and knowledge graph technologies
- Build foundational infrastructure for an enterprise AI platform
- Competitive compensation with equity options
- Flexible remote/hybrid work setup
- Learning budget and conference support
Posted in : Data Engineering
Functional Area : ML / DL Engineering
Job Code : 1594942