hirist

Job Description

Description :


We are building a next-generation Customer Data Platform (CDP) powered by the Databricks Lakehouse architecture and Lakehouse Engine framework.


We're looking for a skilled Data Engineer with 4-9 years of experience to build metadata-driven pipelines, enable real-time data processing, and support marketing campaign orchestration at scale.


The core responsibilities of the role include the following :


Lakehouse Engine Implementation :


- Configure and extend the Lakehouse Engine framework for batch and streaming pipelines.


- Implement the medallion architecture (Bronze -> Silver -> Gold) using Delta Lake.


- Develop metadata-driven ingestion patterns from various customer data sources.


- Build reusable transformers for PII handling, data standardization, and data quality enforcement.
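The metadata-driven pattern above is often expressed as a declarative spec per source, with reusable transformers applied in order. The sketch below is a hypothetical illustration in plain Python; the spec keys and transformer names are assumptions, not the actual Lakehouse Engine API.

```python
# Hypothetical metadata-driven ingestion spec for one Bronze source.
# Keys and paths are illustrative only.
BRONZE_SPEC = {
    "source": {"format": "json", "path": "/landing/crm/customers/"},
    "target": {"format": "delta", "path": "/bronze/crm/customers/", "mode": "append"},
    "transformers": ["standardize_column_names", "mask_pii"],
}

def standardize_column_names(record: dict) -> dict:
    """Lower-case and snake_case keys so every layer sees one convention."""
    return {k.strip().lower().replace(" ", "_"): v for k, v in record.items()}

def mask_pii(record: dict, pii_fields=("email", "phone")) -> dict:
    """Replace PII values with a fixed token; a real pipeline would hash or tokenize."""
    return {k: ("***" if k in pii_fields else v) for k, v in record.items()}

# Registry lets the spec name transformers instead of importing them directly.
TRANSFORMERS = {
    "standardize_column_names": standardize_column_names,
    "mask_pii": mask_pii,
}

def apply_spec(record: dict, spec: dict) -> dict:
    """Run every transformer listed in the spec, in order, over one record."""
    for name in spec["transformers"]:
        record = TRANSFORMERS[name](record)
    return record
```

In a PySpark implementation the same spec would drive `spark.readStream`/`read` options and a chain of DataFrame transformations, so adding a new source is a config change rather than new code.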


Real-Time CDP Enablement :


- Build Spark Structured Streaming pipelines for customer behavior and event tracking.


- Set up Debezium + Kafka for Change Data Capture (CDC) from CRM systems.


- Design and develop identity resolution logic across both streaming and batch datasets.
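A CDC consumer typically flattens the Debezium change envelope from Kafka into an upsert/delete record before merging it into Delta. The sketch below follows the standard Debezium envelope fields (`before`, `after`, `op`, `ts_ms`); the `_deleted`/`_ts_ms` output columns are illustrative choices, not a fixed convention.

```python
# Flatten one Debezium change event into a row suitable for a Delta MERGE.
# Envelope fields ("before", "after", "op", "ts_ms") follow Debezium's format;
# the output column names are assumptions for this sketch.

def flatten_cdc_event(event: dict) -> dict:
    payload = event["payload"]
    op = payload["op"]  # "c"=create, "u"=update, "d"=delete, "r"=snapshot read
    if op == "d":
        # On delete, "after" is null; keep the old row and flag it for removal.
        row = dict(payload["before"])
        row["_deleted"] = True
    else:
        row = dict(payload["after"])
        row["_deleted"] = False
    row["_ts_ms"] = payload["ts_ms"]  # source timestamp, used to order upserts
    return row
```

In the streaming job, rows flagged `_deleted` drive `whenMatchedDelete` branches of the Delta merge, while the rest are upserted keyed on the resolved customer identity.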


DataOps and Governance :


- Use Unity Catalog for managing RBAC, data lineage, and auditability.


- Integrate Great Expectations or similar tools for continuous data quality monitoring.


- Set up CI/CD pipelines for deploying Databricks notebooks, jobs, and DLT pipelines.
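Continuous data quality monitoring boils down to running named expectations against each batch and failing or quarantining on violations. The sketch below mimics the expectation style that Great Expectations automates; the helper names and result shape here are illustrative, not the GE API.

```python
# Expectation-style checks over a batch of rows (list of dicts).
# Function names and the result dict shape are assumptions for this sketch;
# Great Expectations provides equivalents with richer reporting.

def expect_not_null(rows, column):
    """Fail if any row has a null value in the given column."""
    failed = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"success": not failed, "failed_rows": failed}

def expect_values_between(rows, column, lo, hi):
    """Fail if any value falls outside the inclusive [lo, hi] range."""
    failed = [i for i, r in enumerate(rows) if not (lo <= r[column] <= hi)]
    return {"success": not failed, "failed_rows": failed}

def run_suite(rows, checks):
    """Run every check; a batch passes only if all expectations succeed."""
    results = {name: fn(rows) for name, fn in checks.items()}
    return all(r["success"] for r in results.values()), results
```

Wired into the pipeline, a failed suite can block promotion from Bronze to Silver and surface the failing row indices for triage, which is the same gate a CI/CD deployment of the DLT pipeline would enforce.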


Requirements :


- 4-9 years of hands-on experience in data engineering.


- Expertise in Databricks Lakehouse platform, Delta Lake, and Unity Catalog.


- Advanced PySpark skills, including Structured Streaming.


- Experience implementing Kafka + Debezium CDC pipelines.


- Strong in SQL transformations, data modeling, and analytical querying.


- Familiarity with metadata-driven architecture and parameterized pipelines.


- Understanding of data governance : PII masking, access controls, and lineage tracking.


- Proficiency in working with AWS, MongoDB, and PostgreSQL.


Nice to Have :


- Experience working on Customer 360 or Martech CDP platforms.


- Familiarity with Martech tools like Segment, Braze, or other CDPs.


- Exposure to ML pipelines for segmentation, scoring, or personalization.


- Knowledge of CI/CD for data workflows using GitHub Actions, Terraform, or Databricks CLI.

