
Data Engineer - ETL/Python

Gretia Tech Services Private Limited
Bangalore
5-7 Years

Posted on: 02/10/2025

Job Description

Data Engineer

Key Responsibilities:

- Design and implement enterprise-grade data warehouse solutions, including developing dimensional and normalized data models from scratch to support reporting, analytics, and machine learning use cases

- Build and maintain star and snowflake schemas, slowly changing dimensions (SCDs), and other data modeling patterns aligned with business logic (the SCD Type 2 pattern is sketched after this list)

- Own end-to-end data architecture design, including schema evolution, data quality frameworks, and comprehensive documentation for data models

- Collaborate with analytics and product teams to translate business metrics into data warehouse structures and KPIs

- Design and implement scalable ETL/ELT pipelines for processing large-scale structured and unstructured healthcare datasets across multiple sources

- Build robust data ingestion frameworks to handle real-time and batch data processing from various healthcare systems, IoT devices, and third-party APIs

- Develop and maintain data quality monitoring and validation systems to ensure data integrity and accuracy across the data warehouse

- Implement data governance practices including data lineage tracking, metadata management, and access controls

- Build automated data testing and monitoring pipelines to detect anomalies and ensure SLA compliance

- Collaborate with ML engineers to create feature stores and data pipelines that support machine learning model development and deployment
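
Below is a minimal sketch of the SCD Type 2 pattern referenced above, in PySpark since the role calls for it. The `dim_patient` dimension, the `staging.patient_snapshot` source, and the tracked `home_city` attribute are hypothetical placeholders, not details from this posting:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Hypothetical inputs: the current dimension and today's source snapshot.
dim = spark.table("warehouse.dim_patient")      # placeholder table name
src = spark.table("staging.patient_snapshot")   # placeholder table name

# Source rows whose tracked attribute differs from the current dimension row.
changed = (
    src.alias("s")
    .join(dim.filter("is_current").alias("d"), "patient_id")
    .filter(F.col("s.home_city") != F.col("d.home_city"))
    .select("s.*")
)

# Close out the superseded versions: stamp an end date, clear the current flag.
closed = (
    dim.filter("is_current")
    .join(changed.select("patient_id"), "patient_id", "left_semi")
    .withColumn("effective_to", F.current_date())
    .withColumn("is_current", F.lit(False))
)

# Append the changed rows as new open-ended current versions.
new_rows = (
    changed
    .withColumn("effective_from", F.current_date())
    .withColumn("effective_to", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True))
)
```

The rebuilt dimension would be the union of the untouched rows, `closed`, and `new_rows`; a production job would typically express this as a single MERGE against the warehouse table.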

Required Skills and Experience:

- 5+ years of experience in data engineering with a strong focus on data warehouse design and implementation

- Proficiency in SQL and database technologies (PostgreSQL, MySQL, Snowflake, BigQuery, Redshift)

- Expert-level experience with data processing frameworks such as Apache Spark and PySpark, along with a strong grasp of distributed computing concepts

- Hands-on experience with ETL/ELT tools and frameworks (Apache Airflow, dbt, Fivetran, or similar); a minimal Airflow DAG skeleton follows this list

- Strong experience with cloud data platforms (AWS Data Services, GCP BigQuery/Dataflow, or Azure Data Factory)

- Proficiency in Python and/or Scala for data pipeline development and automation

- Experience with data modeling techniques including dimensional modeling, data vault, and normalization strategies

- Understanding of data governance, security, and compliance requirements, especially in healthcare contexts
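
Since the required stack names Apache Airflow, here is a minimal DAG skeleton as a sketch, assuming Airflow 2.4+ (where `schedule` replaced `schedule_interval`); the `claims_daily_etl` id and the task bodies are placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; a real pipeline would call out to Spark, dbt,
# or warehouse loads rather than print.
def extract():
    print("pull from source systems")

def transform():
    print("apply business logic")

def load():
    print("write to the warehouse")

with DAG(
    dag_id="claims_daily_etl",   # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    transform_t = PythonOperator(task_id="transform", python_callable=transform)
    load_t = PythonOperator(task_id="load", python_callable=load)

    # The >> operator declares task ordering: extract, then transform, then load.
    extract_t >> transform_t >> load_t
```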

Preferred Qualifications:

- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related technical discipline

- Experience with real-time data processing using technologies like Apache Kafka, Kinesis, or Pub/Sub (a minimal Kafka consumer sketch follows this list)

- Exposure to the HealthTech or healthcare services domain, with an understanding of healthcare data standards (HL7, FHIR)

- Experience with data lake architectures and modern data stack implementations

- Familiarity with Infrastructure as Code (Terraform, CloudFormation) for data infrastructure management

- Knowledge of containerization technologies (Docker, Kubernetes) for data pipeline deployment
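
As a sketch of the real-time ingestion side mentioned above, here is a minimal consumer using the open-source `kafka-python` client; the `device-vitals` topic, broker address, and payload fields are hypothetical, not part of this role's actual stack:

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Hypothetical topic and broker; real values depend on the deployment.
consumer = KafkaConsumer(
    "device-vitals",
    bootstrap_servers=["localhost:9092"],
    group_id="ingest-sketch",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value  # already deserialized to a dict
    # A real ingester would validate the payload and land it in the lake/warehouse.
    print(record.get("device_id"), record.get("heart_rate"))
```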

Duration: 3-6 month contract role

