hirist

Data Engineer - ETL/PySpark

Talent Socio
Mumbai
4 - 8 Years
3.9 · 12+ Reviews

Posted on: 24/09/2025

Job Description

We are building a next-generation Customer Data Platform (CDP) powered by the Databricks Lakehouse architecture and Lakehouse Engine framework. We're looking for a skilled Data Engineer with 4-9 years of experience to help us build metadata-driven pipelines, enable real-time data processing, and support marketing campaign orchestration capabilities at scale.


Responsibilities:


- Configure and extend the Lakehouse Engine framework for batch and streaming pipelines.

- Implement the medallion architecture (Bronze -> Silver -> Gold) using Delta Lake.

- Develop metadata-driven ingestion patterns from various customer data sources.

- Build reusable transformers for PII handling, data standardization, and data quality enforcement.

- Build Spark Structured Streaming pipelines for customer behavior and event tracking.

- Set up Debezium + Kafka for Change Data Capture (CDC) from CRM systems.

- Design and develop identity resolution logic across both streaming and batch datasets.

- Use Unity Catalog for managing RBAC, data lineage, and auditability.

- Integrate Great Expectations or similar tools for continuous data quality monitoring.

- Set up CI/CD pipelines for deploying Databricks notebooks, jobs, and DLT pipelines.


Requirements:


- 4-9 years of hands-on experience in data engineering.

- Expertise in Databricks Lakehouse platform, Delta Lake, and Unity Catalog.

- Advanced PySpark skills, including Structured Streaming.

- Experience implementing Kafka + Debezium CDC pipelines.

- Strong skills in SQL transformations, data modeling, and analytical querying.

- Familiarity with metadata-driven architecture and parameterized pipelines.

- Understanding of data governance: PII masking, access control, and lineage tracking.

- Proficiency in working with AWS, MongoDB, and PostgreSQL.

- Experience working on Customer 360 or Martech CDP platforms.

- Familiarity with Martech tools like Segment, Braze, or other CDPs.

- Exposure to ML pipelines for segmentation, scoring, or personalization.

- Knowledge of CI/CD for data workflows using GitHub Actions, Terraform, or Databricks CLI.

