Posted on: 29/04/2026
Job Role : Data Engineer
YoE : 2 to 7 years
Location : Bangalore
Date : Ongoing (from 29th April 2026)
About the Role :
We are seeking a highly skilled Data Engineer with expertise in Databricks and AWS Cloud to design, build, and optimize enterprise-scale ETL pipelines and reporting solutions. This role is ideal for someone who thrives at the intersection of data engineering and business intelligence, with a strong focus on transforming complex datasets into actionable insights. You will play a critical role in enabling spend analytics, patient access, and commercial reporting initiatives.
Key Responsibilities :
Databricks ETL Development :
- Design, develop, and maintain scalable ETL pipelines in Databricks using PySpark and SQL (an illustrative sketch follows this list).
- Implement robust data ingestion, transformation, and validation processes to ensure high-quality datasets for analytics.
- Optimize workflows for performance, scalability, and reliability across large healthcare datasets.
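For illustration only, the sketch below shows the kind of ingest/transform/validate step in PySpark that this role covers; the S3 path, table name, and validation rule are hypothetical placeholders rather than project specifics.

    # Minimal, illustrative PySpark ETL step (all names and paths are placeholders).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Ingest: read raw claims data landed as Parquet.
    raw = spark.read.parquet("s3://example-bucket/raw/claims/")

    # Transform: standardize types and drop rows without an amount.
    clean = (
        raw.withColumn("service_date", F.to_date("service_date"))
           .filter(F.col("claim_amount").isNotNull())
    )

    # Validate: fail fast if key identifiers are missing.
    null_keys = clean.filter(F.col("patient_id").isNull()).count()
    if null_keys > 0:
        raise ValueError(f"{null_keys} rows missing patient_id")

    # Aggregate monthly spend and publish a Delta table for BI/reporting.
    monthly_spend = (
        clean.groupBy(F.date_trunc("month", "service_date").alias("month"))
             .agg(F.sum("claim_amount").alias("total_spend"))
    )
    monthly_spend.write.format("delta").mode("overwrite").saveAsTable("analytics.monthly_spend")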
Data Product Build :
- Migrate AWS assets (running on EC2 and Redshift) to Databricks by recreating the existing ETL jobs and orchestrating them via Airflow or Databricks Workflows (see the illustrative DAG sketch after this list).
- Manage the data catalog hands-on using Unity Catalog.
- Stay conversant with AI capabilities within AWS and Databricks.
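As a hedged illustration of the orchestration described above, the following Airflow DAG triggers an existing Databricks job through the Databricks provider (assuming Airflow 2.4+); the DAG id, connection id, and job id are placeholders, not actual project values.

    # Illustrative Airflow DAG that triggers a Databricks job recreated during a
    # Redshift/EC2-to-Databricks migration (ids below are placeholders).
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

    with DAG(
        dag_id="claims_etl_migration",
        start_date=datetime(2026, 4, 29),
        schedule="@daily",
        catchup=False,
    ) as dag:
        run_claims_etl = DatabricksRunNowOperator(
            task_id="run_claims_etl",
            databricks_conn_id="databricks_default",  # assumed Airflow connection id
            job_id=12345,  # placeholder Databricks job id
        )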
Requirements :
- 2 to 7 years of experience in Databricks data engineering roles.
- Strong hands-on proficiency in Databricks (PySpark, SQL) and AWS Cloud.
- Proven track record in ETL pipeline development and understanding of BI dashboards.
- Experience with US healthcare datasets.
- Strong SQL skills for data extraction, aggregation, and reporting.
- Excellent problem-solving abilities, with the capacity to work independently and in cross-functional teams.
- Detail-oriented with a commitment to delivering high-quality, reliable data solutions.
Good to Have :
Domain & Data Knowledge :
- Knowledge of IQVIA claims (standard, eLaaD, Remit, Rejection, NBRx, TRx, IGG4 Claims) and MMIT datasets to derive actionable insights.
End-to-End Production Deployment & Orchestration :
- Candidates with exposure to deploying and orchestrating data pipelines in production environments will be preferred. While not a core requirement, the following experience is a strong differentiator :
- Experience deploying ETL or ML pipelines end-to-end in a production environment, including environment promotion across dev, staging, and production.
- Familiarity with CI/CD tooling (GitHub Actions, Azure DevOps, or Jenkins) for automating pipeline deployment and release management.
- Exposure to Databricks Asset Bundles (DABs) or equivalent frameworks for version-controlled, repeatable job deployments.
- Working knowledge of Apache Airflow for DAG authoring, scheduling, dependency management, and monitoring of production workflows.
- Awareness of infrastructure-as-code practices (Terraform or AWS CloudFormation) for managing cloud resources supporting data pipelines.
- Basic understanding of containerization concepts (Docker) for packaging and deploying data pipeline components.
- Experience with pipeline monitoring and alerting: tracking job health, SLA adherence, failure notifications, and data freshness in production (a sketch follows this list).
- Familiarity with secrets and configuration management across environments (AWS Secrets Manager, Databricks Secrets, or equivalent).
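As an illustrative sketch of the monitoring and alerting point above, the DAG below wires a failure callback and an SLA into its default arguments; the alerting endpoint and the single task are hypothetical, and any real setup would use your team's notification channel.

    # Illustrative failure alerting and SLA tracking for a production Airflow DAG
    # (the webhook URL and task are placeholders).
    from datetime import datetime, timedelta

    import requests
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    def notify_failure(context):
        """Post a short message whenever a task in this DAG fails."""
        task_id = context["task_instance"].task_id
        requests.post(
            "https://hooks.example.com/alerts",  # placeholder alerting endpoint
            json={"text": f"Pipeline task {task_id} failed in production"},
            timeout=10,
        )

    with DAG(
        dag_id="claims_etl_monitored",
        start_date=datetime(2026, 4, 29),
        schedule="@daily",
        catchup=False,
        default_args={
            "on_failure_callback": notify_failure,
            "sla": timedelta(hours=2),  # flag runs that exceed the expected window
            "retries": 1,
        },
    ) as dag:
        extract = EmptyOperator(task_id="extract_placeholder")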
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1632122