Posted on: 06/11/2025
Role Overview:
Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem.
Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.
Key Technical Responsibilities:
- Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads
- Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL)
- Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components
- Develop advanced notebooks and production jobs in Azure Databricks (PySpark, SparkSQL, Delta Lake)
- Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost (a tuning sketch follows this list)
- Implement Delta Lake for ACID-compliant data lakes and enable time travel/audit features (a Delta Lake sketch follows this list)
- Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse
- Transform and enrich raw data, building robust data models and marts for analytics and AI use cases
- Integrate structured, semi-structured, and unstructured sources, including APIs, logs, and files
- Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations (a validation sketch follows this list)
- Manage access controls: Azure AD, Databricks workspace permissions, RBAC, and Key Vault integration
- Enable end-to-end lineage and cataloging via Microsoft Purview (or Unity Catalog if multi-cloud)
- Automate deployment of Databricks assets (notebooks, jobs, clusters) using Databricks CLI/REST API, ARM/Bicep, or Terraform
- Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code
- Containerize and deploy custom code with Azure Kubernetes Service (AKS) or Databricks Jobs as needed
- Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools
- Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations
- Collaborate with data scientists, BI engineers, and business stakeholders to deliver data solutions
- Document design decisions, create technical specifications, and enforce engineering standards across the team
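By way of illustration only (not part of the role requirements), a minimal PySpark sketch of the partitioning/caching tuning mentioned above; the storage account, paths, column names, and settings are hypothetical:

    from pyspark.sql import SparkSession

    # Hypothetical session config; the right shuffle-partition count depends on data volume.
    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        .config("spark.sql.shuffle.partitions", "200")
        .config("spark.sql.adaptive.enabled", "true")  # let AQE coalesce small shuffle partitions
        .getOrCreate()
    )

    # Hypothetical lake path and columns.
    events = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/events/")

    # Repartition on the key used downstream to reduce skew in later joins/aggregations.
    events = events.repartition("event_date")

    # Cache only when the same DataFrame feeds several actions; otherwise it wastes memory.
    events.cache()

    daily_counts = events.groupBy("event_date").count()
    daily_counts.write.mode("overwrite").parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/daily_counts/"
    )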
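Likewise, a minimal sketch of a Delta Lake append with time travel, assuming a Delta-enabled runtime such as Databricks; the table and source paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

    delta_path = "abfss://curated@examplelake.dfs.core.windows.net/delta/orders/"  # hypothetical

    # ACID append: the Delta transaction log keeps concurrent writers and readers consistent.
    new_orders = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/incoming/")
    new_orders.write.format("delta").mode("append").save(delta_path)

    # Time travel for audits: read the table as it looked at an earlier version.
    initial = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
    print(initial.count(), "rows in version 0")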
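And a minimal data-validation sketch in plain PySpark; in practice a framework such as Great Expectations would formalise checks like these, and the schema, path, and rules below are illustrative:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

    # Explicit schema so fields are parsed to known types instead of being inferred per file.
    schema = StructType([
        StructField("order_id", StringType(), nullable=False),
        StructField("amount", DoubleType(), nullable=True),
        StructField("created_at", TimestampType(), nullable=True),
    ])

    orders = spark.read.schema(schema).json(
        "abfss://raw@examplelake.dfs.core.windows.net/orders/"  # hypothetical path
    )

    # Simple quality gates: reject the batch on null keys or negative amounts.
    null_keys = orders.filter(F.col("order_id").isNull()).count()
    negative_amounts = orders.filter(F.col("amount") < 0).count()

    if null_keys > 0 or negative_amounts > 0:
        raise ValueError(f"Quality check failed: {null_keys} null keys, {negative_amounts} negative amounts")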
Required Skills & Experience:
- Hands-on experience with:
1. Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
2. PySpark, SparkSQL, advanced SQL, Delta Lake
3. Data modeling (star/snowflake), partitioning, and data warehouse concepts
- Strong Python programming and experience with workflow/orchestration (ADF, Airflow, or Synapse Pipelines)
- Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
- Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
- Data security, RBAC, encryption, and compliance (SOC2, ISO, GDPR/DPDPA)
- Excellent troubleshooting, performance tuning, and documentation skills
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1570820