
Job Description

Responsibilities :

- Design, develop, and maintain data pipelines and ETL processes using Databricks and PySpark (see the illustrative sketch after this list).
- Work extensively with Apache Hive for data querying, transformations, and integration with big data systems.
- Write and optimise complex SQL queries for data extraction, transformation, and reporting.
- Implement data ingestion and transformation workflows across multiple data sources.
- Collaborate with data analysts, data scientists, and business teams to deliver reliable and scalable data solutions.
- Develop and optimise data models for analytics, reporting, and machine learning use cases.
- Ensure data quality, performance, and governance across all data pipelines.
- Monitor, troubleshoot, and optimise existing data processes for performance and reliability.
- Work with cloud-based data platforms (Azure / AWS / GCP) and integrate Databricks environments.
- Document technical designs, data flows, and architecture for ongoing maintenance.
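For illustration, here is a minimal PySpark sketch of the kind of pipeline work described above: it ingests from a Hive table via Spark SQL, aggregates, and writes a partitioned Delta table. All database, table, and column names (raw_db.orders, analytics.daily_orders, and so on) are hypothetical placeholders, not details of this role's actual environment.

```python
# Minimal PySpark ETL sketch. Assumes a Databricks or Hive-enabled Spark
# environment; every table and column name below is a hypothetical placeholder.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-orders-etl")
    .enableHiveSupport()  # lets spark.sql() query Hive tables
    .getOrCreate()
)

# Ingest: query a Hive table with SQL, filtering at the source.
raw = spark.sql("""
    SELECT order_id, customer_id, amount, order_ts
    FROM raw_db.orders
    WHERE order_ts >= date_sub(current_date(), 1)
""")

# Transform: aggregate to a daily, per-customer grain.
daily = (
    raw.withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date", "customer_id")
       .agg(
           F.count("order_id").alias("order_count"),
           F.sum("amount").alias("total_amount"),
       )
)

# Load: write the result as a partitioned Delta table for analytics.
(
    daily.write
         .format("delta")
         .mode("overwrite")
         .partitionBy("order_date")
         .saveAsTable("analytics.daily_orders")
)
```

Filtering in the source query and partitioning the output by date are the usual levers for keeping a daily job like this incremental and cheap to scan.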


Requirements :

- 5+ years of hands-on experience as a Data Engineer in enterprise-scale data environments.
- Databricks - Must Have (Expert Level).
- PySpark - Must Have (Expert Level).
- SQL (especially for Apache Hive) - Must Have (Expert Level).
- Apache Hive - Must Have (Basic Knowledge).
- Hadoop - Good to Have.
- Data Modelling - Good to Have.
- Strong understanding of ETL/ELT pipelines, data warehousing, and distributed computing frameworks.
- Familiarity with version control (Git) and CI/CD for data workflows.
- Good understanding of cloud data architectures (Azure Data Lake, AWS S3, etc.).
- Excellent problem-solving, debugging, and communication skills.
- Experience with Airflow, Azure Data Factory, or similar orchestration tools (see the orchestration sketch after this list).
- Exposure to machine learning pipelines or real-time data streaming (Kafka, Spark Streaming).
- Understanding of data governance, lineage, and cataloguing tools.
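As a rough sketch of the orchestration experience mentioned above, here is a minimal Airflow DAG that schedules a daily PySpark job via spark-submit. The dag_id, schedule, and script path are hypothetical; on Databricks one would typically reach for the Databricks provider's operators instead of a bare BashOperator.

```python
# Minimal Airflow DAG sketch. The dag_id, schedule, and script path are
# hypothetical placeholders, not details of this role's actual setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,      # skip backfilling missed runs
) as dag:
    run_etl = BashOperator(
        task_id="run_pyspark_etl",
        # Submit the PySpark job sketched earlier; the path is a placeholder.
        bash_command="spark-submit /opt/jobs/daily_orders_etl.py",
    )
```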

