Posted on: 07/11/2025
Role Overview:
Design, build, and optimize large-scale, production-grade data pipelines and analytics platforms on Azure, leveraging Databricks, Synapse, and the broader Microsoft data ecosystem.
Deliver business-critical data assets for analytics, BI, and AI/ML initiatives.
Key Technical Responsibilities:
- Architect modern data lakes using Azure Data Lake Storage Gen2 for batch and streaming workloads
- Build and maintain scalable ETL/ELT pipelines using Azure Data Factory and Databricks (PySpark, Scala, SQL)
- Orchestrate data workflows across ADF, Databricks, and Synapse Pipelines; implement modular and reusable data pipeline components
- Develop advanced notebooks and production jobs in Azure Databricks (PySpark, SparkSQL, Delta Lake)
- Optimize Spark jobs by tuning partitioning, caching, cluster configuration, and autoscaling for performance and cost (see the tuning sketch after this list)
- Implement Delta Lake for ACID-compliant data lakes and enable time travel/audit features (see the Delta sketch after this list)
- Engineer real-time data ingestion from Event Hubs, IoT Hub, and Kafka into Databricks and Synapse (see the streaming sketch after this list)
- Transform and enrich raw data, building robust data models and marts for analytics and AI use cases
- Integrate structured, semi-structured, and unstructured sources, including APIs, logs, and files
- Implement data validation, schema enforcement, and quality checks using Databricks, PySpark, and tools like Great Expectations (see the validation sketch after this list)
- Manage access controls: Azure AD, Databricks workspace permissions, RBAC, and Key Vault integration
- Enable end-to-end lineage and cataloging via Microsoft Purview (or Unity Catalog if multi-cloud)
- Automate deployment of Databricks assets (notebooks, jobs, clusters) using Databricks CLI/REST API, ARM/Bicep, or Terraform
- Build and manage CI/CD pipelines in Azure DevOps for data pipelines and infrastructure as code
- Containerize and deploy custom code with Azure Kubernetes Service (AKS) or Databricks Jobs as needed
- Instrument monitoring and alerting with Azure Monitor, Log Analytics, and Databricks native tools
- Diagnose and resolve performance bottlenecks in distributed Spark jobs and pipeline orchestrations
- Collaborate with data scientists, BI engineers, and business stakeholders to deliver data solutions
- Document design decisions, create technical specifications, and enforce engineering standards across the team
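To give a flavor of the Spark optimization work above, here is a minimal PySpark tuning sketch; the Delta path, column name, partition count, and config values are illustrative placeholders, not settings from this role.

```python
# Minimal PySpark tuning sketch -- paths, columns, and numbers are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Adaptive Query Execution lets Spark coalesce shuffle partitions at runtime
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "200")  # baseline shuffle parallelism
    .getOrCreate()
)

df = spark.read.format("delta").load("/mnt/lake/sales_raw")  # hypothetical path

# Repartition on the key most queries filter/join on to reduce shuffle skew
df = df.repartition(64, "region")

# Cache only when the same DataFrame feeds multiple downstream actions
df.cache()
df.count()                           # materializes the cache
df.groupBy("region").count().show() # reuses cached data

df.unpersist()                       # release executor memory when done
```

Caching pays off only when a DataFrame feeds more than one action; otherwise it just occupies executor memory.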
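The Delta Lake responsibility can be illustrated with a short sketch of ACID writes plus time travel; all paths here are assumptions for illustration, and Delta is assumed to be available (as it is on Databricks).

```python
# Delta Lake sketch: ACID writes plus time travel -- paths are placeholders.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

events = spark.read.json("/mnt/raw/events/")   # hypothetical raw source

# Writing in Delta format gives ACID transactions via a versioned commit log
events.write.format("delta").mode("overwrite").save("/mnt/lake/events")

# Time travel: read the table as of an earlier version for audit or rollback
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/lake/events")

# Audit trail: inspect the table's commit history
DeltaTable.forPath(spark, "/mnt/lake/events").history().show(truncate=False)
```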
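For the real-time ingestion bullet, a hedged Structured Streaming sketch using Event Hubs' Kafka-compatible endpoint; the namespace, hub name, paths, and connection string are placeholders, and real secrets would come from Key Vault or a Databricks secret scope rather than being inlined.

```python
# Streaming ingestion sketch via Event Hubs' Kafka-compatible endpoint.
# Namespace, hub name, paths, and the connection string are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("subscribe", "telemetry")          # Event Hub name acts as the topic
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="$ConnectionString" password="<connection-string>";')
    .load()
)

# Kafka delivers payloads as bytes; cast before parsing downstream
parsed = raw.select(col("value").cast("string").alias("body"))

# Land the stream in a bronze Delta table; the checkpoint gives exactly-once sinks
(parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/telemetry")
    .start("/mnt/lake/bronze/telemetry"))
```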
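Finally, a minimal validation sketch in plain PySpark; field names, rules, and paths are illustrative, and a tool like Great Expectations would express similar assertions declaratively rather than by hand.

```python
# Schema enforcement and basic quality checks -- names and rules are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

# An explicit schema makes malformed input fail fast instead of being inferred
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("ts", TimestampType(), nullable=True),
])

orders = (
    spark.read
    .schema(schema)
    .option("mode", "FAILFAST")   # reject records that do not match the schema
    .json("/mnt/raw/orders/")     # hypothetical path
)

# Row-level rules: quarantine violations rather than letting them flow downstream
bad = orders.filter("order_id IS NULL OR amount < 0")
if bad.count() > 0:
    bad.write.format("delta").mode("append").save("/mnt/lake/quarantine/orders")
```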
Required Skills & Experience:
- Hands-on experience with:
1. Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Databricks
2. PySpark, SparkSQL, advanced SQL, Delta Lake
3. Data modeling (star/snowflake), partitioning, and data warehouse concepts (see the star-schema sketch after this list)
- Strong Python programming and experience with workflow/orchestration (ADF, Airflow, or Synapse Pipelines)
- Infrastructure automation: ARM/Bicep, Terraform, Databricks CLI/API, Azure DevOps
- Deep understanding of Spark internals, cluster optimization, cost management, and distributed computing
- Data security, RBAC, encryption, and compliance (SOC2, ISO, GDPR/DPDPA)
- Excellent troubleshooting, performance tuning, and documentation skills
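To make the star-schema expectation concrete, a small SparkSQL sketch; the fact and dimension tables (fact_sales, dim_date, dim_product) are hypothetical and assumed to be registered in the metastore.

```python
# Hypothetical star-schema query: one fact table joined to two dimensions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-sketch").getOrCreate()

spark.sql("""
    SELECT d.calendar_month,
           p.category,
           SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date    d ON f.date_key    = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.calendar_month, p.category
""").show()
```

Partitioning the fact table on its date key is a common way to keep joins like this scan-efficient.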
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1570820