hirist

Data Engineer - Spark/Hadoop

TalenTree
Multiple Locations
3 - 6 Years

Posted on: 09/11/2025

Job Description

Key Responsibilities :

- Build and optimise data ingestion, transformation, and integration pipelines across multiple sources: clinical trials, EHR/EMR, laboratory systems, and commercial platforms.

- Implement data lakes and data warehouses using modern cloud technologies (Azure, AWS, or GCP).

- Develop and manage ETL/ELT workflows using tools such as Databricks, Azure Data Factory, or AWS Glue.

- Ensure data quality, lineage, and governance aligned with compliance frameworks (HIPAA, GxP, GDPR).

- Collaborate with data scientists and analytics teams to create reusable data models and feature stores.

- Optimise data access and performance for analytical workloads and visualisation tools.

- Automate deployments and monitoring using DevOps pipelines (Git, Jenkins, Azure DevOps).

Required Skills & Experience :

- Minimum 3 years of experience in data engineering or related roles.

- Strong programming expertise in Python, PySpark, or Scala.

- Proven experience with SQL and big-data frameworks (Spark, Hadoop, Kafka).

- Hands-on experience with cloud-based data platforms: Azure Data Factory, Databricks, AWS Glue, Snowflake, or GCP Dataflow.

- Solid understanding of dimensional data modelling techniques (star and snowflake schemas).

- Exposure to Life Sciences / Pharma datasets such as clinical trials, bioinformatics, or patient data models (CDISC, HL7, FHIR).

- Knowledge of data security and compliance in regulated environments.

Good to Have :

- Experience with real-world evidence (RWE) or pharma commercial analytics datasets.

- Familiarity with machine learning data preparation pipelines.

- Knowledge of data visualisation tools (Power BI, Tableau).

- Cloud or data engineering certifications (Azure, AWS, GCP, Snowflake).
