hirist

Job Description

Description:



As the Data Engineering Lead, you will be responsible for architecting, building, and scaling the core AI data infrastructure that powers intelligent decisioning, personalization, and LLM-based use cases.


The core responsibilities of the role include the following:


Data Platform Architecture and Engineering:


- Lead the design and implementation of end-to-end data pipelines on the Databricks Lakehouse platform.


- Build metadata-driven ingestion and transformation frameworks for multimodal data.


- Enable real-time and batch data pipelines for downstream AI and analytics applications.


- Define and implement scalable data models, schemas, and lineage tracking mechanisms.

AI and Knowledge Graph Enablement:


- Architect knowledge graph pipelines for unified entity resolution and semantic enrichment.


- Partner with AI/ML engineers to design feature stores, embedding workflows, and reasoning layers.


- Integrate vector databases and graph systems (Neo4j, Neptune, etc.) to support intelligent retrieval and recommendations.


- Optimize data preparation and transformation workflows for LLM fine-tuning and inference.

Data Governance, Observability, and Reliability:


- Drive data quality, lineage, and cataloging standards.


- Implement automated validation and observability frameworks in data pipelines.


- Champion CI/CD automation and data versioning best practices using Terraform and GitHub Actions.


- Collaborate with cross-functional teams to enforce compliance, privacy, and data access controls.

Requirements:



- 9+ years of experience in data engineering, AI platform development, or enterprise data architecture.


- Expertise in Databricks, Delta Lake, PySpark, and distributed data processing.


- Advanced proficiency in Python, SQL, and Spark-based transformations.


- Experience with real-time streaming pipelines (Kafka, Debezium, etc.).


- Hands-on experience with knowledge graphs, semantic data modeling, or graph-based analytics.


- Deep understanding of data governance, metadata management, and security frameworks.


- Ability to lead technical discussions across AI, engineering, and product teams.

Preferred Qualifications:



- Experience in AI-driven product organizations or Customer Data Platforms (CDP).


- Exposure to LLM data workflows, embeddings, or retrieval-augmented generation (RAG).


- Familiarity with AWS, GCP, or Azure cloud ecosystems.


- Knowledge of data observability platforms and impact analysis frameworks.

