Posted on: 27/10/2025
Description:
- Design, develop, and maintain scalable data pipelines using Azure Databricks, Apache Spark, and Delta Lake (a minimal PySpark sketch follows this list).
- Implement data ingestion, transformation, and integration from multiple structured and unstructured data sources.
- Optimize ETL/ELT processes for performance, cost-efficiency, and scalability in Azure environments.
- Build and manage data models, ensuring data quality, consistency, and lineage across systems.
- Collaborate with cross-functional teams (data scientists, analysts, business users) to understand requirements and deliver data solutions.
- Integrate Databricks with other Azure services such as Azure Data Lake Storage (ADLS), Synapse Analytics, Azure SQL Database, and Azure Data Factory.
- Develop and manage notebooks, jobs, and clusters within Azure Databricks for batch and streaming workloads.
- Implement CI/CD pipelines for data workflows using tools like Azure DevOps or GitHub Actions.
- Monitor, debug, and tune performance of Spark jobs and clusters.
- Ensure compliance, data security, and governance following organizational and regulatory standards.
- Document technical designs, processes, and best practices.
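The Delta Lake pipeline work described above typically follows the medallion (bronze-silver-gold) pattern also listed under the required skills below. The PySpark sketch that follows shows one minimal bronze-to-silver step; the storage paths, container names, and columns (order_id, order_ts, amount) are hypothetical illustrations, not taken from this posting.

```python
# Minimal bronze -> silver Delta Lake step on Azure Databricks.
# All paths, table names, and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already provided on Databricks clusters

# Bronze: land raw JSON from ADLS as-is, preserving source fidelity.
raw = (spark.read
       .format("json")
       .load("abfss://raw@examplelake.dfs.core.windows.net/orders/"))

(raw.write
    .format("delta")
    .mode("append")
    .save("abfss://bronze@examplelake.dfs.core.windows.net/orders/"))

# Silver: deduplicate, type, and filter the bronze data.
bronze = spark.read.format("delta").load(
    "abfss://bronze@examplelake.dfs.core.windows.net/orders/")

silver = (bronze
          .dropDuplicates(["order_id"])                      # hypothetical business key
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .filter(F.col("amount").isNotNull()))

(silver.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://silver@examplelake.dfs.core.windows.net/orders/"))
```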
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- 4+ years of experience in data engineering or big data development, including at least 2 years with Azure Databricks.
- Strong hands-on experience in Apache Spark (PySpark/Scala/SQL) for data transformation and processing.
- Proficiency in Azure Data Lake (ADLS), Azure Data Factory (ADF), Azure Synapse, and Azure SQL Database.
- Experience building Delta Lake architectures and implementing medallion (bronze-silver-gold) data models.
- Solid understanding of ETL/ELT design, orchestration, and performance optimization (see the tuning sketch after this list).
- Experience with CI/CD pipelines, Git, and DevOps principles.
- Familiarity with data security, compliance, and access control within Azure.
- Strong problem-solving skills and the ability to troubleshoot distributed data workflows.
- Excellent communication skills and the ability to work in agile, cross-functional teams.
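As one illustration of the Spark tuning and ETL design skills above, the sketch below prunes partitions at read time and broadcasts a small dimension table to avoid a shuffle on the large side of a join. The table paths and columns (event_date, country_code) are hypothetical.

```python
# Hypothetical tuning sketch: partition pruning plus a broadcast join.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Filtering on the partition column lets Spark skip whole files
# (assumes the Delta table is partitioned by event_date).
events = (spark.read.format("delta")
          .load("abfss://silver@examplelake.dfs.core.windows.net/events/")
          .filter(F.col("event_date") == "2025-10-27"))

# Small lookup table: broadcasting it avoids shuffling the large events table.
countries = spark.read.format("delta").load(
    "abfss://silver@examplelake.dfs.core.windows.net/countries/")

enriched = events.join(broadcast(countries), on="country_code", how="left")

# The physical plan is the first stop when debugging slow Spark jobs.
enriched.explain(mode="formatted")
```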
Preferred Skills:
- Experience with Power BI or other visualization tools for data consumption.
- Exposure to machine learning pipelines or integration with Azure Machine Learning.
- Knowledge of Databricks REST APIs, Unity Catalog, and MLflow.
- Experience with real-time data streaming tools such as Kafka, Event Hubs, or Spark Structured Streaming (a streaming sketch follows this list).
- Familiarity with infrastructure-as-code (IaC) using Terraform or ARM templates.
- Understanding of data governance, metadata management, and lineage tracking.
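For the streaming tools noted above, one common pattern is reading Azure Event Hubs through its Kafka-compatible endpoint with Spark Structured Streaming and landing events in a bronze Delta table. The namespace, topic, and paths below are placeholders; in practice the connection string would come from a Databricks secret scope rather than being inlined.

```python
# Hypothetical sketch: stream from Event Hubs via its Kafka endpoint
# into a bronze Delta table. Namespace, topic, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers",
                  "example-namespace.servicebus.windows.net:9093")
          .option("subscribe", "clickstream")            # hypothetical topic
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          # Placeholder credential; pull the real connection string from a
          # secret scope. On Databricks runtimes the login module class may
          # need the "kafkashaded." prefix.
          .option("kafka.sasl.jaas.config",
                  'org.apache.kafka.common.security.plain.PlainLoginModule required '
                  'username="$ConnectionString" password="<event-hubs-connection-string>";')
          .load())

(stream.selectExpr("CAST(value AS STRING) AS body", "timestamp")
       .writeStream
       .format("delta")
       .option("checkpointLocation",
               "abfss://bronze@examplelake.dfs.core.windows.net/_chk/clicks/")
       .start("abfss://bronze@examplelake.dfs.core.windows.net/clicks/"))
```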
Key Attributes:
- Strong analytical and data-driven mindset.
- Excellent problem-solving and performance-tuning abilities.
- Detail-oriented with a focus on data integrity and reliability.
- Ability to manage multiple data projects simultaneously.
- Passion for continuous learning and adopting modern data technologies.
Posted By: Sai Chandu, Talent Delivery Lead at XANDER CONSULTING AND ADVISORY PRIVATE LIMITED
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1565303