We are seeking three skilled and motivated Databricks Engineers to significantly contribute to our data engineering initiatives within the pharmaceutical and life sciences domain.

This role demands strong, hands-on expertise in Databricks, Python, and SQL to design, build, and optimize scalable, robust data pipelines. The Engineer will be responsible for transforming complex, industry-specific datasets, ensuring data quality and accessibility, and enabling critical analytics that drive business intelligence and compliance efforts.

Core Responsibilities:

Databricks and Pipeline Development:

- Design, implement, and maintain high-performance, scalable ETL/ELT data pipelines primarily utilizing Databricks (Delta Lake, Databricks SQL, and Workspace features).

- Leverage strong hands-on experience with Databricks to manage large-scale data processing workloads on Spark clusters and optimize job execution for efficiency and cost.

- Develop and maintain data transformation logic using Databricks Notebooks written in Python and/or SQL.

Programming and Data Management:

- Apply solid programming skills in Python (including libraries like Pandas and PySpark) for complex data manipulation, cleansing, validation, and automation of data ingestion workflows.

- Use SQL for ad hoc data querying, complex transformations, stored procedure logic, and effective troubleshooting of data discrepancies across the data lakehouse.

- Implement and enforce data governance and quality checks within the pipeline to ensure the accuracy and reliability of all downstream data assets.

Domain and Compliance Focus:

- Apply proven experience working with pharmaceutical or life sciences data, including familiarity with industry-specific data structures (e.g., clinical trials, patient data, R&D data) and standards.

- Ensure all data solutions adhere to relevant compliance considerations and regulatory standards specific to the pharmaceutical domain.

Collaboration and Support:

- Collaborate effectively with data scientists, BI developers, and cross-functional teams to understand data needs and ensure high data accessibility and performance.

- Participate in code reviews, contribute to technical documentation, and provide production support for data pipelines in a fast-paced environment.

Required Skills and Experience:

- 3 to 5 years of dedicated experience in data engineering or a closely related role.

- Strong hands-on expertise with Databricks for data processing, pipeline development, and managing Delta Lake architecture.

- Proficiency in SQL for complex querying, data transformation, and troubleshooting data issues.

- Solid programming skills in Python for data manipulation, scripting, and automation.

- Proven experience working with pharmaceutical or life sciences data, including familiarity with industry data structures and compliance considerations.

- Experience with cloud platforms (e.g., AWS, Azure) and associated data services.

Preferred Skills:

- Hands-on experience with Delta Live Tables (DLT) for declarative pipeline implementation.

- Familiarity with database version control and migration tools.

- Experience with CI/CD implementation for Databricks jobs.

- Knowledge of advanced data governance practices and tools.

- Understanding of statistical modeling and machine learning concepts in a life sciences context.