Posted on: 12/03/2026
Key Responsibilities:
- Design and implement scalable and reliable data pipelines using PySpark and Azure Databricks to ingest, process, and transform large datasets.
- Develop and maintain data models and schemas to ensure data quality, consistency, and accessibility for downstream applications.
- Build and deploy automated data quality checks and monitoring systems to proactively identify and resolve data issues.
- Collaborate with data scientists and analysts to understand their data requirements and provide them with the necessary data infrastructure and tools.
- Optimize data pipelines for performance and efficiency to ensure timely delivery of data to stakeholders.
- Implement and maintain data security and governance policies to protect sensitive data and ensure compliance with regulatory requirements.
- Contribute to the development of best practices for data engineering and promote a data-driven culture within the organization.
Required Skillset:
- Proven ability to design, develop, and deploy scalable data pipelines using PySpark and Azure Databricks.
- Demonstrated expertise in data modeling techniques and experience with various data warehousing and database technologies.
- Strong proficiency in Python and experience with testing frameworks such as PyTest.
- Excellent problem-solving and analytical skills, with a passion for working with data.
- Ability to communicate effectively with both technical and non-technical audiences.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1620011