Posted on: 04/12/2025
Description :
- Lead the design and implementation of end-to-end data pipelines on the Databricks Lakehouse platform.
- Build metadata-driven ingestion and transformation frameworks for multimodal data.
- Enable real-time and batch data pipelines for downstream AI and analytics applications.
- Define and implement scalable data models, schemas, and lineage tracking mechanisms.
AI and Knowledge Graph Enablement:
- Architect knowledge graph pipelines for unified entity resolution and semantic enrichment.
- Partner with AI/ML engineers to design feature stores, embedding workflows, and reasoning layers.
- Integrate vector databases and graph systems (Neo4j, Neptune, etc.) to support intelligent retrieval and recommendations.
- Optimize data preparation and transformation workflows for LLM fine-tuning and inference.
Data Governance, Observability, and Reliability:
- Drive data quality, lineage, and cataloging standards.
- Implement automated validation and observability frameworks in data pipelines.
- Champion CI/CD automation and data versioning best practices using Terraform and GitHub Actions.
- Collaborate with cross-functional teams to enforce compliance, privacy, and data access controls.
Requirements :
- 9+ years of experience in data engineering, AI platform development, or enterprise data architecture.
- Expertise in Databricks, Delta Lake, PySpark, and distributed data processing.
- Advanced proficiency in Python, SQL, and Spark-based transformations.
- Experience with real-time streaming pipelines (Kafka, Debezium, etc.).
- Hands-on experience with knowledge graphs, semantic data modeling, or graph-based analytics.
- Deep understanding of data governance, metadata management, and security frameworks.
- Ability to lead technical discussions across AI, engineering, and product teams.
Preferred Qualifications :
- Exposure to LLM data workflows, embeddings, or retrieval-augmented generation (RAG).
- Familiarity with AWS, GCP, or Azure cloud ecosystems.
- Knowledge of data observability platforms and impact analysis frameworks.
Posted in
Data Engineering
Functional Area
Data Engineering
Job Code
1584513