
Job Description

Key Responsibilities :

- Data Pipeline Development : Design, develop, and maintain scalable data pipelines using Databricks, Apache Spark, and other relevant technologies.


- ETL Processes : Implement and optimize Extract, Transform, Load (ETL) processes that ingest, transform, and load data into various data storage systems.

- Data Storage Management : Integrate and manage large datasets using cloud storage solutions like Azure Data Lake or Amazon S3, and table formats like Delta Lake.

- Performance Optimization : Optimize data processing workflows for performance and scalability, keeping both processing and storage efficient.

- Collaboration : Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver data solutions that meet business needs.

- Data Governance : Implement and enforce data governance policies, access controls, and security protocols within the Databricks environment, including the use of Unity Catalog.

- Data Quality : Ensure data quality and integrity throughout the ETL processes and data pipelines.

- Troubleshooting : Identify and resolve data-related issues, including performance bottlenecks and data quality problems.


- Documentation : Create and maintain documentation for data pipelines, data models, and data governance procedures.

- Code Management : Utilize version control systems like Git for code management and collaboration.


Skills Required :


- Apache Spark : Strong proficiency in Apache Spark for large-scale data processing and analysis.

- Python : Expertise in Python for scripting, data manipulation, and automation.

- Databricks Platform : Experience with the Databricks platform, including Databricks notebooks, workflows, and Delta Lake.

- SQL : Proficiency in SQL for data querying, manipulation, and database management.

- ETL : Strong understanding of ETL processes and best practices.

- Data Modeling : Experience with data modeling and schema design.

- Cloud Technologies : Familiarity with cloud platforms like Azure, AWS, or GCP.

- Data Governance : Knowledge of data governance principles and practices.

- Collaboration & Communication : Ability to work effectively in a team and communicate technical information clearly.

