Posted on: 13/09/2025
Experience: 8-10+ years
Skillset:
- Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
- Own the full lifecycle of core pipelines, from file ingestion to validated, queryable datasets, ensuring high reliability and performance.
- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
- Refactor and scale existing pipelines to meet growing data and business needs.
- Tune Spark jobs and optimize distributed processing performance.
- Implement schema enforcement and versioning aligned with internal data standards.
- Collaborate closely with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
- Contribute to the evolution of our data platform, driving toward mature patterns in observability, testing, and automation.
- Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
- Help develop and champion internal best practices around pipeline development and data modeling.
Experience:
- 8 - 10 years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
- Strong expertise in Python, Spark SQL, and Airflow.
- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
- Experience mapping and standardizing raw external data into canonical models.
- Familiarity with AWS (or another cloud platform), including file storage and distributed compute concepts.
- Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
- Strong written and verbal communication skills; able to explain technical concepts to non-engineering partners.
- Comfortable designing pipelines from scratch and improving existing pipelines.
- Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
- Experience building, or willingness to learn, streaming pipelines using tools such as Kafka or SQS.
- Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1545218