Posted on: 13/07/2025
Project Overview :
We are seeking skilled Data Engineers with strong hands-on experience in Databricks for a high-impact data platform project.
The project involves building scalable data pipelines, integrating with cloud platforms, and managing data governance using Unity Catalog.
Location : Noida/Gurgaon/Hyderabad/Bangalore/Pune
Key Requirements :
Primary Skills :
- Expertise in Databricks, deployed on either AWS or Azure (cloud-specific skills are not mandatory unless otherwise specified).
- Proficiency in PySpark, Spark SQL, Python, and Delta Lake (a brief illustrative sketch follows this list).
- Hands-on experience in building data pipelines and cloud integration.
- Familiarity with Databricks Unity Catalog and Federated Catalog.
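For illustration only, the sketch below shows the kind of pipeline code this role involves: reading raw data with PySpark, applying a small transformation, and writing the result to Delta Lake. The paths, column names, and aggregation logic are hypothetical placeholders, not project specifics.

    # Illustrative sketch only: read raw data, transform with PySpark, write to Delta Lake.
    # The paths, schema, and business logic below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

    # Read raw source data (placeholder location).
    orders = spark.read.json("/mnt/raw/orders/")

    # Filter completed orders and aggregate revenue per day.
    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )

    # Persist the result as a Delta table (overwrite kept simple for the sketch).
    daily_revenue.write.format("delta").mode("overwrite").save("/mnt/curated/daily_revenue")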
Cloud Environment :
- Engineers working on Databricks on AWS are not required to have Azure-specific skills, and vice versa.
- Multi-cloud exposure (AWS & Azure) is a plus due to the diverse project environment.
Infrastructure as Code (IaC) :
- Experience with Terraform for infrastructure provisioning and automation.
Testing and Quality :
- Experience in writing unit tests and integration tests using pytest (illustrated in the sketch after this list).
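As a hedged illustration of the expected testing style, the sketch below unit-tests a small PySpark transformation with pytest using a local SparkSession fixture. The function, column names, and expected value are assumptions made up for the example.

    # Illustrative sketch only: pytest unit test for a small PySpark transformation.
    # Function, column names, and expected values are assumptions for the example.
    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def add_order_date(df):
        # Derive an order_date column from the order_ts timestamp string.
        return df.withColumn("order_date", F.to_date("order_ts"))

    @pytest.fixture(scope="session")
    def spark():
        # Local SparkSession so the test can run outside Databricks.
        return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()

    def test_add_order_date(spark):
        df = spark.createDataFrame([("2025-07-13 10:00:00",)], ["order_ts"])
        result = add_order_date(df).select("order_date").first()[0]
        assert str(result) == "2025-07-13"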
Key Responsibilities :
- Data Pipeline Development : Design, develop, and optimize scalable and robust data pipelines using Databricks, PySpark, Spark SQL, and Python.
- Cloud Integration : Integrate data pipelines with either AWS or Azure cloud platforms, depending on the project's specific cloud environment.
- Delta Lake Management : Implement and manage data solutions leveraging Delta Lake for reliable and performant data storage and processing (a minimal upsert sketch follows this list).
- Data Governance : Work with Databricks Unity Catalog and Federated Catalog to ensure data governance, security, and discoverability across the data landscape.
- Infrastructure Automation : Utilize Terraform for provisioning and managing cloud infrastructure related to data platform components.
- Testing and Quality Assurance : Develop and execute comprehensive unit tests and integration tests using pytest to ensure data quality and pipeline reliability.
- Performance Optimization : Identify and address performance bottlenecks in data pipelines and queries to ensure optimal efficiency.
- Collaboration : Work closely with data architects, data scientists, and other engineering teams to understand requirements and deliver high-quality data solutions.
- Documentation : Create and maintain technical documentation for data pipelines, processes, and systems.
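To make the Delta Lake and Unity Catalog responsibilities concrete, here is a minimal, assumption-laden sketch of an incremental upsert (MERGE) into a Delta table addressed by a Unity Catalog three-level name (catalog.schema.table). The staging path, table name, and join key are hypothetical.

    # Illustrative sketch only: incremental upsert into a Unity Catalog-governed
    # Delta table via a three-level name. Catalog, schema, table, and key names
    # are hypothetical.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # New or changed rows staged by an upstream pipeline step (placeholder path).
    updates = spark.read.format("delta").load("/mnt/staging/customer_updates")

    # Target table addressed as catalog.schema.table under Unity Catalog.
    target = DeltaTable.forName(spark, "main.sales.customers")

    # Upsert: update matching customers, insert new ones.
    (
        target.alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )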
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1512384