Posted on: 27/11/2025



Description :
We are looking for a highly skilled Senior Data Engineer with strong experience in building scalable, resilient, and high-performance data pipelines across batch and streaming architectures. The ideal candidate will have deep expertise in cloud platforms (preferably Azure), big-data frameworks, CDC pipelines, and modern Lakehouse technologies such as Delta Lake and Databricks. You will collaborate with cross-functional teams, design end-to-end ingestion frameworks, optimize performance, and ensure data quality across all layers of the platform.
Key Responsibilities :
Data Ingestion & Processing :
- Design and implement streaming data ingestion pipelines using Apache Kafka, Confluent Cloud, and Delta Live Tables.
- Build and maintain batch ingestion workflows leveraging Databricks, PySpark, and cloud-native orchestration tools.
- Utilize Confluent Kafka Connectors for CDC pipelines, including configuration, monitoring, troubleshooting, and optimization.
- Implement Change Data Capture (CDC) solutions using Debezium, SQL Server CDC, Oracle GoldenGate, or equivalent tools.
- Develop efficient PySpark jobs for large-scale data transformations, enrichment, validations, and upsert workloads (see the sketch after this list).
- Tune pipeline performance, including Spark job optimization, partitioning strategies, shuffle minimization, and efficient resource utilization.
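By way of illustration only, below is a minimal PySpark sketch of the kind of Delta Lake upsert (MERGE) workload described above. The table name (silver.orders), source path, and key columns (order_id, updated_at) are hypothetical placeholders, not details of this role.

```python
# Minimal illustrative sketch of a Delta Lake upsert (MERGE) in PySpark.
# Table name (silver.orders), path, and columns (order_id, updated_at) are
# hypothetical placeholders chosen for the example.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch/CDC increment, assumed already deduplicated on the business key.
updates_df = spark.read.format("delta").load("/mnt/bronze/orders_increment")

target = DeltaTable.forName(spark, "silver.orders")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll(condition="s.updated_at > t.updated_at")  # keep the latest version
    .whenNotMatchedInsertAll()                                      # insert new keys
    .execute()
)
```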
Data Quality, Monitoring & Operations :
- Implement robust data validation, deduplication, and reconciliation processes, ensuring high-quality and reliable data loads (see the sketch after this list).
- Analyze and document data load performance metrics, including load duration, throughput, and scalability benchmarks.
- Monitor pipelines for partial loads, duplicates, and schema evolution, and handle operational exceptions.
- Build alerting, logging, and operational dashboards using tools such as Datadog, Azure Monitor, or CloudWatch.
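As a hedged illustration of the validation, deduplication, and reconciliation work described above, the sketch below checks for duplicate business keys and compares loaded row counts. The paths, table layout, and the order_id key are assumptions made for the example.

```python
# Illustrative data quality sketch: duplicate detection plus a simple row-count
# reconciliation. Paths, table names, and the business key (order_id) are
# hypothetical placeholders, not taken from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

loaded = spark.read.format("delta").load("/mnt/silver/orders")
source = spark.read.format("delta").load("/mnt/bronze/orders")

# Duplicate detection on the business key.
duplicate_keys = (
    loaded.groupBy("order_id")
          .count()
          .filter(F.col("count") > 1)
          .count()
)

# Reconciliation: rows loaded vs. rows in the source extract.
loaded_rows = loaded.count()
expected_rows = source.count()

if duplicate_keys > 0 or loaded_rows != expected_rows:
    # In practice this would feed the alerting/monitoring layer (e.g. Datadog).
    raise ValueError(
        f"Quality check failed: {duplicate_keys} duplicate keys, "
        f"{loaded_rows}/{expected_rows} rows loaded"
    )
```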
Architecture & Modelling :
- Design scalable Lakehouse architectures using Delta Lake, Unity Catalog, and medallion patterns (Bronze/Silver/Gold).
- Develop data models, partitioning strategies, and table optimization techniques (Z-Ordering, VACUUM, OPTIMIZE); see the sketch after this list.
- Implement best practices for metadata management, governance, and lineage.
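For the table optimization techniques named above (OPTIMIZE, Z-Ordering, VACUUM), the following is a minimal sketch of how the maintenance commands might be issued from PySpark; the table name (silver.orders) and Z-Order column (customer_id) are hypothetical placeholders.

```python
# Illustrative Delta Lake maintenance sketch: file compaction with Z-Ordering and
# cleanup of unreferenced data files. The table name (silver.orders) and Z-Order
# column (customer_id) are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table, past the retention window.
spark.sql("VACUUM silver.orders RETAIN 168 HOURS")
```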
Cloud Platform Expertise :
- Hands-on experience with Azure (ADLS Gen2, ADF, Event Hub, Azure Functions, Synapse, Key Vault, Azure DevOps).
- Exposure to AWS services such as S3, Glue, Lambda, Kinesis, and IAM is a plus.
- Build and optimize compute clusters using Databricks (cluster configs, job clusters, autoscaling, DBX CLI).
Programming & Tools :
- Strong programming experience with Python, Scala, or Java, focusing on data transformation and distributed processing.
- Advanced SQL skills for analytics, transformations, and performance optimization.
- Familiarity with Git, CI/CD pipelines, code reviews, and automated deployments.
- Experience with infrastructure-as-code tools like Terraform, ARM templates, or CloudFormation is advantageous.
Collaboration & Leadership :
- Work with product owners, data modelers, and business stakeholders to interpret requirements and translate them into scalable technical solutions.
- Provide mentorship to junior engineers and establish engineering best practices across the team.
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders.
Required Qualifications :
- 5+ years of experience designing and building data pipelines using Apache Spark, Databricks, or equivalent big-data frameworks.
- Strong hands-on knowledge of Kafka, Confluent Cloud, Event Hub, and messaging/streaming ecosystems.
- Expertise in CDC pipelines, relational databases (SQL Server, Oracle, PostgreSQL), and event-driven processing.
- Experience with Azure or AWS cloud services for data ingestion, compute, and orchestration.
- Solid understanding of data warehousing, Lakehouse architectures, Delta Lake, and data modelling.
- Proficiency with DevOps practices, Git-based workflows, and CI/CD frameworks.
- Strong analytical, problem-solving, and communication skills.
Preferred Qualifications :
- Experience with Delta Live Tables (DLT), Databricks Workflows, and Unity Catalog.
- Knowledge of RabbitMQ, Azure Event Hub, or other messaging platforms.
- Exposure to ML pipelines or feature engineering frameworks.
- Certifications in Azure, Databricks, or Confluent.
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1581896