Posted on: 17/09/2025
Position Summary:
The ideal candidate will have extensive experience with Apache Spark, Delta Lake, PySpark/Scala, and cloud platforms (Azure, AWS, or GCP), along with a proven ability to define best practices for architecture, governance, security, and performance optimization on Databricks.
Key Responsibilities:
- Define scalable architecture for data ingestion, ETL/ELT pipelines, data processing, analytics, and data science workflows.
- Develop reference architectures and solution blueprints for various business and technical use cases.
- Lead the development of robust data pipelines and ETL frameworks using PySpark/Scala and Databricks notebooks.
- Enable streaming and batch data processing using Apache Spark on Databricks.
- Collaborate with DevOps teams to implement CI/CD pipelines for Databricks workloads using tools like GitHub, Azure DevOps, or Jenkins.
- Optimize Databricks clusters, Spark jobs, and data workflows for performance, scalability, and cost efficiency.
- Implement caching, partitioning, Z-Ordering, and data compaction strategies on Delta Lake (a brief sketch follows this list).
- Define and implement data governance standards using Unity Catalog, role-based access control (RBAC), and data lineage tracking.
- Ensure data compliance and security policies are enforced across data pipelines and storage layers.
- Maintain metadata catalogs and ensure data quality and observability across the pipeline.
- Engage with business analysts, data scientists, product owners, and solution architects to gather requirements and translate them into technical solutions.
- Present architectural solutions and recommendations to senior leadership and cross-functional teams.
- Provide technical guidance and mentorship to data engineers and junior architects.
- Conduct code reviews, enforce coding standards, and foster a culture of engineering excellence.
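As a rough illustration of the Delta Lake strategies named above (caching, partitioning, Z-Ordering, compaction), the sketch below writes a partitioned Delta table and then optimizes and vacuums it on a Databricks runtime. The table, column, and path names (sales.events, event_date, customer_id, /mnt/raw/events) are hypothetical placeholders, not part of the role description.

```python
# Minimal sketch only; table, column, and path names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Write a partitioned Delta table (partition column chosen for illustration).
(spark.read.parquet("/mnt/raw/events")
      .write.format("delta")
      .partitionBy("event_date")
      .mode("overwrite")
      .saveAsTable("sales.events"))

# Compact small files and co-locate rows on a frequently filtered column (Z-Ordering).
spark.sql("OPTIMIZE sales.events ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table (default retention applies).
spark.sql("VACUUM sales.events")

# Cache a hot subset for repeated interactive queries.
recent = spark.table("sales.events").where("event_date >= current_date() - 7")
recent.cache()
```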
Required Qualifications:
Technical Skills:
- Expert-level knowledge of Databricks, including Delta Lake, Unity Catalog, MLflow, and Workflows.
- Strong hands-on experience with Apache Spark, especially using PySpark or Scala.
- Proficient in building and maintaining ETL/ELT pipelines in a large-scale distributed environment (a brief sketch follows this list).
- In-depth understanding of cloud platforms: AWS (S3, Glue, EMR), Azure (ADLS, Synapse), or GCP (BigQuery, Dataflow).
- Familiarity with SQL and data modeling techniques for both OLAP and OLTP systems.
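As a rough sketch of the kind of PySpark batch and streaming pipeline work implied by the skills above, the example below reads raw JSON, applies a simple transformation, writes a Delta table, and runs the same logic incrementally with Structured Streaming. The paths, columns, and table names (/mnt/raw/orders, order_ts, bronze.orders) are assumptions made for illustration only.

```python
# Minimal batch + streaming sketch; paths, columns, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Batch: read raw JSON, derive a date column, and append to a Delta table.
batch_df = (spark.read.json("/mnt/raw/orders")
                 .withColumn("order_date", F.to_date("order_ts")))
(batch_df.write.format("delta")
         .mode("append")
         .saveAsTable("bronze.orders"))

# Streaming: the same transformation applied incrementally with Structured Streaming.
stream_df = (spark.readStream
                  .schema(batch_df.schema)   # reuse the schema inferred by the batch read
                  .json("/mnt/raw/orders")
                  .withColumn("order_date", F.to_date("order_ts")))
(stream_df.writeStream
          .format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/orders")
          .toTable("bronze.orders"))
```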
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1548082