
Job Description

Role Summary:

We are looking for an experienced Data Engineer to design, build, and optimize large-scale data processing systems.

The ideal candidate will have strong expertise in Python programming, Apache Spark for big data processing, and Apache Airflow for workflow orchestration.

This role requires a deep understanding of data architecture principles and best practices to enable scalable, reliable, and efficient data pipelines that support advanced analytics and business intelligence.


Key Responsibilities:

- Design and implement scalable data architectures and pipelines leveraging Apache Spark and Python to process large datasets efficiently.

- Develop and maintain workflow orchestration solutions using Apache Airflow to automate complex data processing tasks (an illustrative sketch follows this list).

- Collaborate with data engineering, data science, and analytics teams to understand data requirements and translate them into robust architectural solutions.

- Define data standards, governance, and best practices to ensure data quality, security, and compliance.

- Optimize data storage and retrieval strategies across data lakes, warehouses, and streaming platforms.

- Evaluate new tools, frameworks, and technologies to improve data platform capabilities.

- Monitor and troubleshoot data pipeline performance issues, ensuring high availability and reliability.

- Document architectural designs, data flows, and operational processes.

- Mentor and guide junior data engineers and analysts on architectural best practices and technical skills.

- Work closely with cloud platform teams to integrate and deploy data solutions (e.g., AWS, Azure, or GCP).
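
For illustration only, here is a minimal sketch of the kind of Airflow orchestration this role involves, assuming Airflow 2.x with spark-submit available on the worker; the DAG id, schedule, script path, and task names are hypothetical and not taken from this posting.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_event_transform",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit a Spark batch job; the script path is a placeholder.
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/transform_events.py",
    )

    # Placeholder data-quality check that runs after the Spark job succeeds.
    validate_output = BashOperator(
        task_id="validate_output",
        bash_command="echo 'run output validation here'",
    )

    run_spark_job >> validate_output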


Required Skills & Experience:

- Strong programming skills in Python for data processing, scripting, and automation.

- Hands-on expertise in Apache Spark for batch and stream processing of large-scale datasets (an illustrative sketch follows this list).

- Experience designing and managing workflows with Apache Airflow, including DAG development and monitoring.

- Solid understanding of data architecture principles, ETL/ELT design patterns, and distributed computing.

- Experience with relational and NoSQL databases, data lakes, and data warehouses.

- Familiarity with cloud data services and infrastructure (AWS, Azure, or GCP).

- Proficient in SQL and data modeling techniques.

- Knowledge of data governance, security practices, and compliance standards.

- Strong analytical, problem-solving, and communication skills.

- Ability to work collaboratively in cross-functional teams and mentor junior members.
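
Likewise, for illustration only, a minimal PySpark batch sketch of the kind of large-scale processing listed above; the storage paths and column names are assumptions, not details from this posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_event_counts").getOrCreate()

# Read raw events from a placeholder data-lake path.
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Aggregate daily event counts per event type (column names are assumptions).
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Write the curated output, partitioned by date for efficient retrieval.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/daily_event_counts/")
)

spark.stop()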


Preferred Qualifications:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

- Experience with containerization and orchestration tools (Docker, Kubernetes).

- Familiarity with other data processing frameworks (Kafka, Flink).

- Exposure to machine learning pipelines and MLOps practices.

- Certifications related to Big Data, Cloud, or Data Architecture.

