
Data Engineer - Python/Spark

Provido Solutions
Remote
8 - 10 Years

Posted on: 13/09/2025

Job Description

Exp : 8 - 10+ years

Responsibilities :

- Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON); a minimal sketch follows this list.

- Own the full lifecycle of core pipelines - from file ingestion to validated, queryable datasets - ensuring high reliability and performance.

- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.

- Refactor and scale existing pipelines to meet growing data and business needs.

- Tune Spark jobs and optimize distributed processing performance.

- Implement schema enforcement and versioning aligned with internal data standards.

- Collaborate closely with Data Analysts, Data Scientists, Product Managers, Engineering and Platform teams, SMEs, and AMs to ensure pipelines meet evolving business needs.

- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.

- Contribute to the evolution of our data platform - driving toward mature patterns in observability, testing, and automation.

- Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time use cases.

- Help develop and champion internal best practices around pipeline development and data modeling.

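As an illustration of the stack named above (Python, Spark SQL, Airflow, partitioned Parquet output), here is a minimal sketch of a batch pipeline with an idempotent write and a simple data quality gate. All paths, names, and the schedule are hypothetical, and it assumes PySpark and Airflow 2.x are available.

# Minimal sketch only: all paths, table names, and IDs below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from pyspark.sql import SparkSession


def transform_and_publish(input_path: str, output_path: str) -> None:
    # Dynamic partition overwrite keeps reruns idempotent: a rerun replaces
    # only the partitions it produces rather than the whole dataset.
    spark = (
        SparkSession.builder
        .appName("file_ingest")
        .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
        .getOrCreate()
    )

    # Ingest raw CSV files; Parquet/JSON readers would be analogous.
    spark.read.option("header", "true").csv(input_path) \
        .createOrReplaceTempView("raw_events")

    # Spark SQL transformation into a validated, queryable shape.
    cleaned = spark.sql("""
        SELECT CAST(id AS BIGINT)     AS id,
               TRIM(LOWER(status))    AS status,
               CAST(amount AS DOUBLE) AS amount,
               CAST(event_ts AS DATE) AS event_date
        FROM raw_events
        WHERE id IS NOT NULL
    """)

    # Lightweight data quality gate: fail the run rather than publish bad data.
    if cleaned.filter("amount IS NULL OR event_date IS NULL").count() > 0:
        raise ValueError("data quality check failed: null amount or event_date")

    # Partitioned Parquet write; rerunning a date overwrites only that partition.
    cleaned.write.mode("overwrite").partitionBy("event_date").parquet(output_path)


# Airflow 2.x wiring; the schedule and bucket names are placeholders.
with DAG(
    dag_id="file_ingest_daily",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="transform_and_publish",
        python_callable=transform_and_publish,
        op_kwargs={
            "input_path": "s3://raw-bucket/events/",
            "output_path": "s3://curated-bucket/events/",
        },
    )
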
Experience :

- 8 - 10 years of experience as a Data Engineer (or equivalent), building production-grade pipelines.

- Strong expertise in Python, Spark SQL, and Airflow.

- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.

- Experience mapping and standardizing raw external data into canonical models.

- Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.

- Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.

- Strong written and verbal communication skills - able to explain technical concepts to non-engineering partners.

- Comfortable designing pipelines from scratch and improving existing pipelines.

- Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).

- Experience building or willingness to learn streaming pipelines using tools such as Kafka or SQS (see the streaming sketch after this list).

- Bonus : Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).

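For the streaming item above, a minimal Spark Structured Streaming sketch reading from Kafka is shown here. The broker address, topic, and S3 paths are hypothetical, and it assumes the spark-sql-kafka connector is on the classpath.

# Minimal sketch only: hypothetical broker, topic, and S3 paths.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Expected shape of each Kafka message value (JSON).
schema = StructType([
    StructField("id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                      # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Checkpointed Parquet sink gives exactly-once file output for near-real-time consumers.
(
    events.writeStream
    .format("parquet")
    .option("path", "s3://curated-bucket/events_stream/")       # hypothetical path
    .option("checkpointLocation", "s3://curated-bucket/_chk/")  # hypothetical path
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination()
)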
