Posted on: 13/11/2025
Description:
Responsibilities:
- Design, develop, and maintain data pipelines and ETL processes using Databricks and PySpark.
- Work extensively with Apache Hive for data querying, transformations, and integration with big data systems.
- Write and optimise complex SQL queries for data extraction, transformation, and reporting.
- Implement data ingestion and transformation workflows across multiple data sources.
- Collaborate with data analysts, data scientists, and business teams to deliver reliable and scalable data solutions.
- Develop and optimise data models for analytics, reporting, and machine learning use cases.
- Ensure data quality, performance, and governance across all data pipelines.
- Monitor, troubleshoot, and optimise existing data processes for performance and reliability.
- Work with cloud-based data platforms (Azure / AWS / GCP) and integrate them with Databricks environments.
- Document technical designs, data flows, and architecture for ongoing maintenance.
Requirements:
- 5+ years of hands-on experience as a Data Engineer in enterprise-scale data environments.
- Databricks - Must Have (Expert Level).
- PySpark - Must Have (Expert Level).
- SQL (especially for Apache Hive) - Must Have (Expert Level).
- Apache Hive - Must Have (Basic Knowledge).
- Hadoop - Good to Have.
- Data Modelling - Good to Have.
- Strong understanding of ETL/ELT pipelines, data warehousing, and distributed computing frameworks.
- Familiarity with version control (Git) and CI/CD for data workflows.
- Good understanding of cloud data architectures (Azure Data Lake, AWS S3, etc.).
- Excellent problem-solving, debugging, and communication skills.
- Experience with Airflow, Azure Data Factory, or similar orchestration tools.
- Exposure to machine learning pipelines or real-time data streaming (Kafka, Spark Streaming).
- Understanding of data governance, lineage, and cataloguing tools.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1573465