About the Role:
We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.
This role is ideal for professionals with a solid foundation in object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools such as Hive, HDFS, Oozie, and YARN.
As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.
Key Responsibilities:
Data Engineering & Development:
- Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark.
- Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.
- Build and optimize Hive queries, manage data storage in HDFS, schedule workflows with Oozie, and manage cluster resources with YARN.
- Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
Object-Oriented Programming in Python:
- Implement clean, modular, and reusable Python code with a strong understanding of OOP principles.
- Debug, test, and optimize existing code and actively participate in peer reviews and design discussions.
Design & Architecture:
- Participate in design and architectural discussions related to big data platform enhancements.
- Apply software engineering principles such as modularity, reusability, and scalability.
- Write well-documented, maintainable, and testable code aligned with best practices and performance standards.
Database & Query Optimization:
- Work with SQL-based tools (Hive/Presto/SparkSQL) to write and optimize complex queries.
- Experience in data modeling, partitioning strategies, and query performance tuning is required.
Cloud Integration (Bonus):
- Exposure to cloud platforms such as Azure or AWS is a plus.
- Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous.
Required Skills:
- Hands-on expertise with PySpark and distributed data processing
- Good working knowledge of SQL (HiveQL, SparkSQL)
- Experience with the Hadoop ecosystem: Hive, HDFS, Oozie, and YARN
- Experience with data ingestion, transformation, and optimization techniques
Good to Have (Bonus Skills):
- Familiarity with CI/CD pipelines and version control (Git)
- Experience with Airflow, Kafka, or other orchestration/streaming tools
- Exposure to containerization (Docker) and job scheduling tools
- Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)
What We're Looking For:
- 6-10 years of relevant experience in data engineering or backend development
- Strong problem-solving skills with a keen attention to detail
- Ability to work independently and within a collaborative team environment
- Passion for clean, maintainable code and scalable design
- Excellent communication and interpersonal skills
Why Join Us?
- Work on real-time big data platforms at scale
- Be part of a fast-growing team solving complex data challenges
- Opportunities to grow into architectural or lead roles
- Hybrid work culture with flexibility and ownership
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1533336