Posted on: 06/11/2025
Role Overview:
Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem.
Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.
Key Technical Responsibilities:
- Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads
- Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL)
- Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components
- Develop advanced notebooks and production jobs in Azure Databricks (PySpark, SparkSQL, Delta Lake)
- Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost (a tuning sketch follows this list)
- Implement Delta Lake for ACID-compliant data lakes and enable time travel/audit features (a Delta Lake sketch follows this list)
- Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse
- Transform and enrich raw data, building robust data models and marts for analytics and AI use cases
- Integrate structured, semi-structured, and unstructured sources, including APIs, logs, and files
- Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations (a validation sketch follows this list)
- Manage access controls: Azure AD, Databricks workspace permissions, RBAC, and Key Vault integration
- Enable end-to-end lineage and cataloging via Microsoft Purview (or Unity Catalog if multi-cloud)
- Automate deployment of Databricks assets (notebooks, jobs, clusters) using Databricks CLI/REST API, ARM/Bicep, or Terraform
- Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code
- Containerize and deploy custom code with Azure Kubernetes Service (AKS) or Databricks Jobs as needed
- Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools
- Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations
- Collaborate with data scientists, BI engineers, and business stakeholders to deliver data solutions
- Document design decisions, create technical specifications, and enforce engineering standards across the team
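By way of illustration only (not part of the role requirements), a minimal PySpark sketch of the partitioning/caching tuning mentioned above; the storage account, paths, column names, and settings are hypothetical:

    from pyspark.sql import SparkSession

    # Hypothetical session config; the right shuffle-partition count depends on data volume.
    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        .config("spark.sql.shuffle.partitions", "200")
        .config("spark.sql.adaptive.enabled", "true")  # let AQE coalesce small shuffle partitions
        .getOrCreate()
    )

    # Hypothetical lake path and columns.
    events = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/events/")

    # Repartition on the key used downstream to reduce skew in later joins/aggregations.
    events = events.repartition("event_date")

    # Cache only when the same DataFrame feeds several actions; otherwise it wastes memory.
    events.cache()

    daily_counts = events.groupBy("event_date").count()
    daily_counts.write.mode("overwrite").parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/daily_counts/"
    )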
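Likewise, a minimal sketch of a Delta Lake append with time travel, assuming a Delta-enabled runtime such as Databricks; the table and source paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

    delta_path = "abfss://curated@examplelake.dfs.core.windows.net/delta/orders/"  # hypothetical

    # ACID append: the Delta transaction log keeps concurrent writers and readers consistent.
    new_orders = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/incoming/")
    new_orders.write.format("delta").mode("append").save(delta_path)

    # Time travel for audits: read the table as it looked at an earlier version.
    initial = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
    print(initial.count(), "rows in version 0")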
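And a minimal data-validation sketch in plain PySpark; in practice a framework such as Great Expectations would formalise checks like these, and the schema, path, and rules below are illustrative:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

    # Explicit schema so fields are parsed to known types instead of being inferred per file.
    schema = StructType([
        StructField("order_id", StringType(), nullable=False),
        StructField("amount", DoubleType(), nullable=True),
        StructField("created_at", TimestampType(), nullable=True),
    ])

    orders = spark.read.schema(schema).json(
        "abfss://raw@examplelake.dfs.core.windows.net/orders/"  # hypothetical path
    )

    # Simple quality gates: reject the batch on null keys or negative amounts.
    null_keys = orders.filter(F.col("order_id").isNull()).count()
    negative_amounts = orders.filter(F.col("amount") < 0).count()

    if null_keys > 0 or negative_amounts > 0:
        raise ValueError(f"Quality check failed: {null_keys} null keys, {negative_amounts} negative amounts")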
Required Skills & Experience:
- Hands-on experience with:
1. Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
2. PySpark, SparkSQL, advanced SQL, Delta Lake
3. Data modeling (star/snowflake), partitioning, and data warehouse concepts
- Strong Python programming and experience with workflow/orchestration (ADF, Airflow, or Synapse Pipelines)
- Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
- Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
- Data security, RBAC, encryption, and compliance (SOC2, ISO, GDPR/DPDPA)
- Excellent troubleshooting, performance tuning, and documentation skills
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1570820