As the Data Engineering - Lead, you will be responsible for architecting, building, and scaling the core AI data infrastructure that fuels intelligent decisioning, personalization, and LLM-based use cases.

The core responsibilities for the role include the following:

Data Platform Architecture and Engineering:

- Lead the design and implementation of end-to-end data pipelines on the Databricks Lakehouse platform.

- Build metadata-driven ingestion and transformation frameworks for multimodal data.

- Enable real-time and batch data pipelines for downstream AI and analytics applications.

- Define and implement scalable data models, schemas, and lineage tracking mechanisms.

AI and Knowledge Graph Enablement:

- Architect knowledge graph pipelines for unified entity resolution and semantic enrichment.

- Partner with AI/ML engineers to design feature stores, embedding workflows, and reasoning layers.

- Integrate vector databases and graph systems (Neo4j, Neptune, etc.) to support intelligent retrieval and recommendations.

- Optimize data preparation and transformation workflows for LLM fine-tuning and inference.

Data Governance, Observability, and Reliability:

- Drive data quality, lineage, and cataloging standards.

- Implement automated validation and observability frameworks in data pipelines.

- Champion CI/CD automation and data versioning best practices using Terraform and GitHub Actions.

- Collaborate with cross-functional teams to enforce compliance, privacy, and data access controls.

Requirements:

- 9+ years of experience in data engineering, AI platform development, or enterprise data architecture.

- Expertise in Databricks, Delta Lake, PySpark, and distributed data processing.

- Advanced proficiency in Python, SQL, and Spark-based transformations.

- Experience with real-time streaming pipelines (Kafka, Debezium, etc.).

- Hands-on experience with knowledge graphs, semantic data modeling, or graph-based analytics.

- Deep understanding of data governance, metadata management, and security frameworks.

- Ability to lead technical discussions across AI, engineering, and product teams.

Preferred Qualifications:

- Experience in AI-driven product organizations or Customer Data Platforms (CDP).

- Exposure to LLM data workflows, embeddings, or retrieval-augmented generation (RAG).

- Familiarity with AWS, GCP, or Azure cloud ecosystems.

- Knowledge of data observability platforms and impact analysis frameworks.

Posted by: Kalpna, Talent Acquisition at Talent Socio

Posted in: Data Engineering

Functional Area: Data Engineering

Job Code: 1584513
