
Data Engineer - ETL/PySpark

Acquire Bright Minds
Bangalore
6 - 10 Years

Posted on: 22/08/2025

Job Description

About the Role :


We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.


This role is ideal for professionals with solid foundational knowledge in object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools like Hive, HDFS, Oozie, and YARN.


As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.


Key Responsibilities :


Data Engineering & Development :


- Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark (a minimal sketch follows this list).

- Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.

- Build and optimize Hive queries, manage HDFS data storage, and schedule workflows using Oozie and YARN.

- Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
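
For illustration, here is a minimal sketch of the kind of batch ETL pipeline described above, in PySpark. Every path, table, and column name in it (event_id, event_ts, event_date, analytics.events_clean) is a hypothetical placeholder, not a detail of this role:

    # Illustrative sketch only: a minimal batch ETL job in PySpark.
    # All paths, database/table names, and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("events-etl")
        .enableHiveSupport()   # assumes a configured Hive metastore
        .getOrCreate()
    )

    # Extract: semi-structured JSON landed in HDFS
    raw = spark.read.json("hdfs:///data/raw/events/")

    # Transform: de-duplicate, drop bad records, derive a partition column
    clean = (
        raw.dropDuplicates(["event_id"])
           .filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Load: write a partitioned Hive table for downstream consumers
    (clean.write
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("analytics.events_clean"))

Partitioning the output by event_date is what later allows Hive and SparkSQL to prune partitions at query time.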


Object-Oriented Programming in Python :


- Implement clean, modular, and reusable Python code with a strong understanding of OOP principles (see the sketch after this list).

- Debug, test, and optimize existing code and actively participate in peer reviews and design discussions.
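
As a sketch of what modular, OOP-style pipeline code can look like (the class and function names here are hypothetical, not a prescribed design):

    # Illustrative sketch only: one common OOP pattern for composable,
    # testable pipeline steps. Class and function names are hypothetical.
    from abc import ABC, abstractmethod
    from typing import List
    from pyspark.sql import DataFrame

    class Transform(ABC):
        """A single, independently testable unit of pipeline logic."""

        @abstractmethod
        def apply(self, df: DataFrame) -> DataFrame:
            ...

    class DropNullKeys(Transform):
        """Removes rows whose key column is null."""

        def __init__(self, key_column: str):
            self.key_column = key_column

        def apply(self, df: DataFrame) -> DataFrame:
            return df.filter(df[self.key_column].isNotNull())

    def run_pipeline(df: DataFrame, steps: List[Transform]) -> DataFrame:
        # Steps stay decoupled, so they can be reused and unit-tested alone.
        for step in steps:
            df = step.apply(df)
        return df

Keeping each step behind a small interface like this is one way to meet the "clean, modular, and reusable" bar the role describes.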


Design & Architecture :


- Participate in design and architectural discussions related to big data platform enhancements.

- Apply software engineering principles such as modularity, reusability, and scalability.

- Write well-documented, maintainable, and testable code aligned with best practices and performance standards.


Database & Query Optimization :


- Work with SQL-based tools (Hive/Presto/SparkSQL) to write and optimize complex queries (a brief example follows this list).

- Experience in data modeling, partitioning strategies, and query performance tuning is required.
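
A brief, illustrative example of partition-aware query tuning in SparkSQL, against the hypothetical partitioned table from the ETL sketch above:

    # Illustrative sketch only: partition pruning in SparkSQL. Table and
    # column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Filtering on the partition column (event_date) lets the engine skip
    # whole partitions instead of scanning the full table.
    daily_counts = spark.sql("""
        SELECT user_id, COUNT(*) AS event_count
        FROM analytics.events_clean
        WHERE event_date BETWEEN '2025-08-01' AND '2025-08-07'
        GROUP BY user_id
    """)

    # The extended plan shows whether the partition filter was pushed down.
    daily_counts.explain(True)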


Cloud Integration (Bonus) :


- Exposure to cloud platforms such as Azure or AWS is a plus.

- Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous (a short sketch follows below).
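
As an illustration of how the same PySpark code extends to cloud storage: the sketch below reads from and writes to an S3-backed data lake. The bucket and prefixes are hypothetical, and it assumes the hadoop-aws (S3A) connector and credentials are configured.

    # Illustrative sketch only: the same PySpark APIs target cloud object
    # storage by switching the filesystem scheme. Bucket and prefixes are
    # hypothetical; the S3A connector and credentials are assumed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-etl").getOrCreate()

    raw = spark.read.parquet("s3a://example-data-lake/raw/events/")
    raw.write.mode("append").parquet("s3a://example-data-lake/curated/events/")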


Required Skills :


- Hands-on expertise with PySpark and distributed data processing

- Good working knowledge of SQL (HiveQL, SparkSQL)

- Experience with the Hadoop ecosystem : Hive, HDFS, Oozie, YARN

- Experience with data ingestion, transformation, and optimization techniques


Good to Have (Bonus Skills) :


- Familiarity with CI/CD pipelines and version control (Git)

- Experience with Airflow, Kafka, or other orchestration/streaming tools

- Exposure to containerization (Docker) and job scheduling tools

- Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)


What We're Looking For :


- 6-10 years of relevant experience in data engineering or backend development

- Strong problem-solving skills with a keen attention to detail

- Ability to work independently and within a collaborative team environment

- Passion for clean, maintainable code and scalable design

- Excellent communication and interpersonal skills


Why Join Us ?


- Work on real-time big data platforms at scale

- Be part of a fast-growing team solving complex data challenges

- Opportunities to grow into architectural or lead roles

- Hybrid work culture with flexibility and ownership

