Posted on: 16/09/2025
Job Description :
In this role, you will be instrumental in designing, developing, and maintaining ETL processes to ensure efficient extraction, transformation, and loading of data from various sources into the data lake and data warehouse.
You will work closely with data engineers, data scientists, and business intelligence teams to build and optimize data workflows that support the project's analytics and reporting needs.
Key Responsibilities :
ETL Development :
- Design and develop ETL processes using Databricks PySpark to extract, transform, and load data from heterogeneous sources into our data lake and data warehouse.
- Optimize ETL workflows for performance and scalability, leveraging Databricks PySpark and Spark SQL to efficiently process large data volumes.
- Implement robust error handling and monitoring mechanisms to proactively detect and resolve issues within ETL processes.
- Design and implement data solutions following the Medallion Architecture principles, organizing data into Bronze, Silver, and Gold layers (see the sketch after this list).
- Ensure data is appropriately cleansed, enriched, and optimized at each stage to support robust analytics and reporting.
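As a rough illustration of the kind of Bronze-to-Silver promotion described above, the sketch below reads raw Delta data, cleanses and enriches it, and writes a Silver table. The paths, table layout, and column names (order_id, order_ts) are assumptions invented for the example, not part of this posting.

# Hypothetical Bronze-to-Silver ETL step on Databricks (PySpark).
# All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze layer: raw records landed as-is from the source system.
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Silver layer: deduplicated, validated, and type-normalized data.
silver = (
    bronze
    .dropDuplicates(["order_id"])                        # remove ingestion duplicates
    .filter(F.col("order_id").isNotNull())               # drop malformed rows
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize timestamps
    .withColumn("ingested_at", F.current_timestamp())    # lineage metadata
)

silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")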
Data Pipeline Management :
- Develop and maintain data pipelines using Databricks PySpark, ensuring data quality, integrity, and reliability throughout the ETL lifecycle (a minimal data-quality example follows this list).
- Collaborate with data engineering, data science, and business intelligence teams to translate data requirements into efficient ETL workflows and pipelines.
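A minimal sketch of a data-quality gate that such a pipeline might enforce before promoting a batch; the table path, key column, and 1% null-rate threshold are assumptions for illustration only.

# Hypothetical data-quality gate inside a pipeline stage.
# Path, column, and threshold are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_gate").getOrCreate()

df = spark.read.format("delta").load("/lake/silver/orders")

total = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()

# Fail fast so a bad batch never reaches downstream consumers.
if total == 0 or null_keys / total > 0.01:
    raise ValueError(f"Data-quality check failed: {null_keys}/{total} null keys")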
Data Analysis and Query Optimization :
Project Coordination and Continuous Improvement :
- Stay updated on the latest developments in Databricks PySpark, Spark SQL, and related technologies, recommending and implementing best practices and optimizations.
- Document ETL processes, data lineage, and metadata to facilitate knowledge sharing and ensure compliance with data governance standards.
Required Qualifications :
- Minimum of 3 years of experience as an ETL developer, with a strong focus on Databricks PySpark development.
- Proficiency in Python programming, with extensive experience in developing and debugging Databricks PySpark applications.
- In-depth understanding of Spark architecture and internals, with hands-on experience in Spark RDDs, DataFrames, and Spark SQL.
- Expertise in writing and optimizing complex SQL queries for data manipulation, aggregation, and analysis (an illustrative example follows this list).
- Proven experience in working with large-scale data warehousing and ETL frameworks.
- Strong problem-solving skills and the ability to troubleshoot and resolve ETL process issues.
- Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
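For context on the SQL optimization skills listed above, here is a small example of one common Spark SQL technique: hinting the optimizer to broadcast a small dimension table so the join avoids a shuffle. The table and column names (fact_orders, dim_customer) are invented for the example.

# Hypothetical Spark SQL optimization: broadcast-join hint.
# fact_orders and dim_customer are assumed, pre-registered tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_optimization").getOrCreate()

result = spark.sql("""
    SELECT /*+ BROADCAST(d) */
           d.region,
           SUM(f.amount) AS total_amount
    FROM fact_orders f
    JOIN dim_customer d
      ON f.customer_id = d.customer_id
    GROUP BY d.region
""")
result.show()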
Preferred Qualifications :
- Experience with data platform tools such as Databricks, Snowflake, and Tableau.
- Demonstrated ability to implement best practices for ETL processes and data management.
- Strong understanding of data governance and data quality principles.
- Relevant certifications in Databricks PySpark, Spark SQL, or related technologies.
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1547161