
Prompt Smart Solutions - Big Data Engineer - Python/Scala/PySpark

Prompt Smart Solutions
7 - 10 Years
Multiple Locations

Posted on: 10/04/2026

Job Description:

We are looking for a highly skilled Senior Data Engineer to design, build, and maintain scalable, high-performance data platforms across on-premises and cloud environments (preferably Microsoft Azure).

The ideal candidate will have deep expertise in modern data engineering ecosystems, distributed processing frameworks, and metadata-driven architectures, along with a strong focus on data quality, governance, and performance optimization.

Key Responsibilities:

Data Engineering & Pipeline Development:
- Design, develop, and maintain scalable, reliable, and high-performance data pipelines using Python, Apache Spark, and Shell scripting.
- Build reusable, metadata-driven frameworks for data ingestion, transformation, and processing (see the sketch after this list).
- Develop and manage ETL/ELT pipelines ensuring data quality, consistency, and reliability.
- Optimize pipelines for performance, scalability, and cost efficiency across environments.
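To make the metadata-driven idea concrete, here is a minimal PySpark sketch in which each source is described by a config entry rather than hard-coded; the paths, formats, and table names are hypothetical placeholders:

```python
# Minimal metadata-driven ingestion sketch; config entries, paths, and
# table names below are hypothetical placeholders.
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata_driven_ingest").getOrCreate()

# One entry per source: where to read, how to parse, where to land the data.
PIPELINE_CONFIG = json.loads("""
[
  {"source_path": "/landing/orders", "format": "csv",
   "options": {"header": "true", "inferSchema": "true"},
   "target_table": "raw_orders"},
  {"source_path": "/landing/customers", "format": "json",
   "options": {}, "target_table": "raw_customers"}
]
""")

for entry in PIPELINE_CONFIG:
    # The loop body never hard-codes a source: onboarding a new feed means
    # adding a config entry, which is the point of a metadata-driven framework.
    df = (
        spark.read.format(entry["format"])
        .options(**entry["options"])
        .load(entry["source_path"])
    )
    df.write.mode("overwrite").saveAsTable(entry["target_table"])
```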

Data Platform & Architecture:
- Architect and implement modern data lake and data warehouse solutions.
- Work with advanced data technologies such as Apache Iceberg and Trino for large-scale analytics (an illustrative Iceberg sketch follows this list).
- Implement robust data modeling techniques (dimensional & normalized models).
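For illustration, a minimal sketch of creating a partitioned Iceberg table from Spark SQL, assuming the Iceberg Spark runtime jar is on the classpath; the catalog name (`demo`), warehouse path, and table schema are hypothetical:

```python
# Minimal Apache Iceberg sketch via Spark SQL; assumes the
# iceberg-spark-runtime jar is available. Catalog name, warehouse
# path, and schema are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg_demo")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# A dimensional-style fact table; days(order_ts) is an Iceberg partition
# transform, so queries get partition pruning without an explicit column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.analytics.fact_orders (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(10, 2),
        order_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(order_ts))
""")

spark.sql("INSERT INTO demo.analytics.fact_orders VALUES (1, 42, 19.99, current_timestamp())")
spark.sql("SELECT * FROM demo.analytics.fact_orders").show()
```

The same table could then be queried from Trino through its Iceberg connector, since both engines read the shared table metadata.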

Streaming & Workflow Orchestration:
- Build and manage real-time streaming pipelines using Apache Kafka (a minimal sketch follows this list).
- Orchestrate workflows and dependencies using Apache Airflow.
- Ensure fault tolerance and recovery mechanisms in pipelines.
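A minimal PySpark Structured Streaming sketch reading from Kafka, where the checkpoint location is what provides the restart-and-recovery semantics mentioned above; the broker address, topic, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be available:

```python
# Minimal Kafka -> Parquet streaming sketch; broker, topic, and paths are
# hypothetical. Assumes the spark-sql-kafka connector is on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers raw bytes; cast key and payload to strings for parsing.
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

# The checkpoint directory records stream progress, so a restarted query
# resumes from where it left off instead of reprocessing or losing data.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "/data/streams/orders")
    .option("checkpointLocation", "/checkpoints/orders")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```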

Cloud & Containerization:
- Deploy and manage applications using Docker and Kubernetes.
- Work with Azure Data Lake Storage (ADLS) and S3-compatible systems such as MinIO (see the sketch after this list).
- Support hybrid data environments spanning on-premises and cloud platforms.
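Because MinIO speaks the S3 API, the standard AWS SDK works against it unchanged; here is a minimal boto3 sketch with a hypothetical endpoint, credentials, and bucket:

```python
# Minimal sketch of talking to an S3-compatible store such as MinIO via
# boto3; endpoint, credentials, and bucket name are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.local:9000",  # point the AWS SDK at MinIO
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

s3.create_bucket(Bucket="landing")
s3.put_object(
    Bucket="landing",
    Key="orders/2025-01-01.csv",
    Body=b"order_id,amount\n1,19.99\n",
)

# List what landed; the same calls work unchanged against AWS S3.
for obj in s3.list_objects_v2(Bucket="landing").get("Contents", []):
    print(obj["Key"], obj["Size"])
```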

Data Governance & Quality:
- Integrate and automate data quality checks, validation, and monitoring frameworks (a minimal sketch follows this list).
- Work with DataHub and Apache Ranger for data governance, lineage, and access control.
- Ensure adherence to data security, compliance, and governance standards.
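A minimal sketch of the kind of automated quality check that can be embedded in a pipeline, expressed as rule counts over a PySpark DataFrame; the column names, rules, and failure policy are hypothetical:

```python
# Minimal data quality check sketch in PySpark; columns, rules, and the
# fail-fast policy are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

# Stand-in data: the second row violates both rules below.
df = spark.createDataFrame(
    [(1, "a@example.com", 19.99), (2, None, -5.0)],
    ["order_id", "email", "amount"],
)

total = df.count()
checks = {
    "email_not_null": df.filter(col("email").isNull()).count(),
    "amount_non_negative": df.filter(col("amount") < 0).count(),
}

# Fail loudly when any rule is violated so the orchestrator can retry or alert.
for rule, failures in checks.items():
    print(f"{rule}: {failures}/{total} rows failed")
    if failures > 0:
        raise ValueError(f"Data quality rule '{rule}' failed for {failures} rows")
```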

Monitoring & Observability:
- Monitor system health and performance using Prometheus and Grafana (see the sketch after this list).
- Proactively identify bottlenecks and optimize system performance.
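A minimal sketch of exposing pipeline metrics to Prometheus with the prometheus_client library (Grafana would then chart the scraped series); the metric names, port, and simulated workload are hypothetical:

```python
# Minimal Prometheus instrumentation sketch using prometheus_client;
# metric names, port, and the simulated workload are hypothetical.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

ROWS_PROCESSED = Counter("pipeline_rows_processed_total", "Rows processed")
BATCH_SECONDS = Histogram("pipeline_batch_duration_seconds", "Batch duration")

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        with BATCH_SECONDS.time():                 # record batch latency
            time.sleep(random.uniform(0.1, 0.5))   # stand-in for real work
            ROWS_PROCESSED.inc(100)                # count rows handled
```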

Collaboration & Delivery:
- Collaborate with data architects, product owners, and business stakeholders to translate requirements into technical solutions.
- Contribute to migration and modernization projects.
- Ensure timely delivery with a strong focus on quality and ownership.

Technical Skills Required:

Core Technologies:
- Strong programming skills in Python and Java
- Expertise in Apache Spark (batch & distributed processing)
- Strong command of Linux and Shell scripting

Data Engineering Stack:
- ETL/ELT frameworks and pipeline design
- Data modeling and warehousing concepts
- Experience with Apache Iceberg and Trino

Streaming & Orchestration:
- Apache Kafka
- Apache Airflow

Cloud & Storage:
- Microsoft Azure (preferred)
- Azure Data Lake Storage (ADLS)
- MinIO / S3-compatible storage

Containerization & DevOps:
- Docker
- Kubernetes

Data Governance & Monitoring:
- DataHub, Apache Ranger
- Prometheus, Grafana

Preferred Qualifications:

- Experience with metadata-driven architecture frameworks
- Strong understanding of data lifecycle management
- Experience working in hybrid (on-prem + cloud) environments
- Familiarity with CI/CD pipelines for data workflows
