Role Overview :

We are hiring an experienced Senior Data Engineer with a strong background in Pharma / Life Sciences data engineering to design, build, and optimize large-scale data pipelines and analytics platforms.

The role requires deep hands-on expertise in PySpark, Python, AWS, SQL, and modern data engineering practices, with proven experience working on pharma commercial, clinical, or patient data.

Key Responsibilities :

- Design, develop, and maintain scalable data pipelines using PySpark and Python for large, complex pharma datasets.

- Build and optimize ETL/ELT workflows to process structured and unstructured life sciences data.

- Work extensively with AWS services to develop cloud-native data engineering solutions.

- Develop and optimize complex SQL queries for data transformation, validation, and reporting.

- Ensure high standards of data quality, data governance, and compliance, aligned with pharma and regulatory requirements.

- Collaborate closely with data scientists, analytics teams, and business stakeholders to enable advanced analytics and ML use cases.

- Implement performance tuning, monitoring, and optimization for Spark jobs and data pipelines.

- Follow best practices for version control, CI/CD, and agile delivery using tools such as Git and Jira.

- Provide technical leadership and mentor junior and mid-level data engineers.

- Own delivery timelines and ensure production-grade reliability of data platforms.

Mandatory Skills & Qualifications :

Technical Skills :

- 8+ years of overall experience in Data Engineering.

- Strong hands-on experience with PySpark for large-scale data processing.

- Advanced proficiency in Python for data engineering and pipeline development.

- Strong experience with AWS cloud services (e.g., S3, EC2, EMR, Glue, Redshift, Lambda).

- Excellent command of SQL, including performance tuning and complex query optimization.

- Solid understanding of data modeling, distributed systems, and big data architectures.

Domain Expertise (Mandatory) :

- Proven experience working in the Pharma / Life Sciences domain.

- Hands-on exposure to commercial pharma data, clinical data, patient data, or real-world evidence (RWE).

- Strong understanding of pharma data standards, compliance, and governance requirements.

Preferred / Good to Have :

- Experience with workflow orchestration tools (e.g., Airflow, AWS Step Functions).

- Exposure to data warehousing and lakehouse architectures.

- Familiarity with Spark performance tuning and cost optimization on AWS.

- Experience supporting analytics, BI, or machine learning workloads.

- Knowledge of healthcare/pharma regulations and data privacy standards.

Soft Skills :

- Strong analytical and problem-solving skills.

- Self-driven with the ability to work in a fast-paced, delivery-focused environment.

- Excellent communication and stakeholder management skills.

- Proven ability to mentor teams and drive technical excellence.