Posted on: 16/09/2025
Job Description :
In this role, you will be instrumental in designing, developing, and maintaining ETL processes to ensure efficient extraction, transformation, and loading of data from various sources into the data lake and data warehouse.
You will work closely with data engineers, data scientists, and business intelligence teams to build and optimize data workflows that support the project's analytics and reporting needs.
Key Responsibilities :
ETL Development :
- Design and develop ETL processes using Databricks PySpark to extract, transform, and load data from heterogeneous sources into our data lake and data warehouse.
- Optimize ETL workflows for performance and scalability, leveraging Databricks PySpark and Spark SQL to efficiently process large data volumes.
- Implement robust error handling and monitoring mechanisms to proactively detect and resolve issues within ETL processes.
- Design and implement data solutions following the Medallion Architecture principles, organizing data into Bronze, Silver, and Gold layers (see the sketch after this list).
- Ensure data is appropriately cleansed, enriched, and optimized at each stage to support robust analytics and reporting.
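As a rough illustration of the kind of Bronze-to-Silver promotion described above, the sketch below reads raw Delta data, cleanses and enriches it, and writes a Silver table. The paths, table layout, and column names (order_id, order_ts) are assumptions invented for the example, not part of this posting.

# Hypothetical Bronze-to-Silver ETL step on Databricks (PySpark).
# All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze layer: raw records landed as-is from the source system.
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Silver layer: deduplicated, validated, and type-normalized data.
silver = (
    bronze
    .dropDuplicates(["order_id"])                        # remove ingestion duplicates
    .filter(F.col("order_id").isNotNull())               # drop malformed rows
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize timestamps
    .withColumn("ingested_at", F.current_timestamp())    # lineage metadata
)

silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")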
Data Pipeline Management :
- Develop and maintain data pipelines using Databricks PySpark, ensuring data quality, integrity, and reliability throughout the ETL lifecycle (a minimal data-quality example follows this list).
- Collaborate with data engineering, data science, and business intelligence teams to translate data requirements into efficient ETL workflows and pipelines.
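A minimal sketch of a data-quality gate that such a pipeline might enforce before promoting a batch; the table path, key column, and 1% null-rate threshold are assumptions for illustration only.

# Hypothetical data-quality gate inside a pipeline stage.
# Path, column, and threshold are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_gate").getOrCreate()

df = spark.read.format("delta").load("/lake/silver/orders")

total = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()

# Fail fast so a bad batch never reaches downstream consumers.
if total == 0 or null_keys / total > 0.01:
    raise ValueError(f"Data-quality check failed: {null_keys}/{total} null keys")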
Data Analysis and Query Optimization :
Project Coordination and Continuous Improvement :
- Stay updated on the latest developments in Databricks PySpark, Spark SQL, and related technologies, recommending and implementing best practices and optimizations.
- Document ETL processes, data lineage, and metadata to facilitate knowledge sharing and ensure compliance with data governance standards.
Required Qualifications :
- Minimum of 3 years of experience as an ETL developer, with a strong focus on Databricks PySpark development.
- Proficiency in Python programming, with extensive experience in developing and debugging Databricks PySpark applications.
- In-depth understanding of Spark architecture and internals, with hands-on experience in Spark RDDs, DataFrames, and Spark SQL.
- Expertise in writing and optimizing complex SQL queries for data manipulation, aggregation, and analysis (an illustrative example follows this list).
- Proven experience in working with large-scale data warehousing and ETL frameworks.
- Strong problem-solving skills and the ability to troubleshoot and resolve ETL process issues.
- Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
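For context on the SQL optimization skills listed above, here is a small example of one common Spark SQL technique: hinting the optimizer to broadcast a small dimension table so the join avoids a shuffle. The table and column names (fact_orders, dim_customer) are invented for the example.

# Hypothetical Spark SQL optimization: broadcast-join hint.
# fact_orders and dim_customer are assumed, pre-registered tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_optimization").getOrCreate()

result = spark.sql("""
    SELECT /*+ BROADCAST(d) */
           d.region,
           SUM(f.amount) AS total_amount
    FROM fact_orders f
    JOIN dim_customer d
      ON f.customer_id = d.customer_id
    GROUP BY d.region
""")
result.show()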
Preferred Qualifications :
- Experience with data platform tools such as Databricks, Snowflake, and Tableau.
- Demonstrated ability to implement best practices for ETL processes and data management.
- Strong understanding of data governance and data quality principles.
- Relevant certifications in Databricks PySpark, Spark SQL, or related technologies.
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1547161