Posted on: 26/12/2025
Employment Type : Full-Time
Reporting To : Platform Architect
Role Overview :
The Senior Data Engineer will build and maintain the core data infrastructure for an enterprise AI platform. This role focuses on designing scalable data pipelines, developing knowledge graphs, and preparing structured and unstructured data for AI and LLM-based applications.
Roles & Responsibilities :
Data Pipeline Development :
- Design and build scalable data ingestion pipelines from enterprise systems (ERP, documentation tools, version control, and project management tools)
- Develop connectors for structured, semi-structured, and unstructured data
- Implement incremental data loads, change data capture (CDC), and real-time sync
- Ensure data quality through validation, deduplication, and lineage tracking
Knowledge Graph Engineering :
- Design ontologies and graph schemas for complex enterprise relationships
- Implement entity resolution and relationship inference across data sources
- Build APIs and query interfaces for graph traversal
- Optimize graph storage and query performance for large-scale usage
Enterprise Data Integration :
- Extract and model enterprise metadata such as business rules and data dictionaries
- Parse and semantically index documents and code artifacts
- Build integrations with enterprise APIs and internal platforms
AI & LLM Data Infrastructure :
- Prepare structured and contextual data for LLM consumption
- Design embedding strategies and manage vector databases for semantic search
- Build memory and context management systems for stateful AI applications
Required Skills :
Core Requirements :
- 5+ years of Data Engineering experience with production-grade pipelines
- Strong Python skills (clean, testable, maintainable code)
- MongoDB expertise (schema design, aggregation pipelines, indexing, performance tuning)
- Vector database experience (Qdrant, Pinecone, Weaviate, or pgvector)
- Document processing experience (chunking, metadata extraction, PDFs/Word/HTML; LangChain or similar)
- Strong SQL skills (complex queries, joins, window functions, query optimization)
- ETL/ELT at scale (incremental loads, CDC, idempotent pipelines)
- Pipeline orchestration tools (Airflow, Dagster, Prefect, or similar)
Good to Have / Strong Plus :
- Experience building production RAG pipelines
- Deep understanding of embedding models and dimensionality trade-offs
- Graph databases (Neo4j) and Cypher query expertise
- LLM application development using LangChain or LangGraph
- Streaming systems (Kafka, Flink) for real-time pipelines
- Hybrid search (vector + keyword/metadata filtering)
- Apache Spark for large-scale transformations
What We Offer :
- Work on cutting-edge AI and knowledge graph technologies
- Build foundational infrastructure for an enterprise AI platform
- Competitive compensation with equity options
- Flexible remote/hybrid work setup
- Learning budget and conference support
Posted in : Data Engineering
Functional Area : ML / DL Engineering
Job Code : 1594942