About the Role:
We are actively hiring a Data Engineer (SDE2 level) with strong expertise in Core Python, PySpark, and Big Data technologies to join a high-performance data engineering team working on large-scale, real-time data platforms.
This role is ideal for professionals with a solid foundation in object-oriented programming (OOP) in Python, hands-on experience with distributed data processing using PySpark, and familiarity with Hadoop ecosystem tools such as Hive, HDFS, Oozie, and YARN.
As part of a dynamic and collaborative engineering team, you will build robust data pipelines, optimize big data workflows, and work on scalable solutions that support analytics, data science, and downstream applications.
Key Responsibilities:
Data Engineering & Development:
- Design, develop, and maintain scalable and efficient ETL pipelines using Core Python and PySpark.
- Work with structured and semi-structured data from various sources and design pipelines to process large datasets in batch and near real-time.
- Build and optimize Hive queries, manage data storage in HDFS, schedule workflows with Oozie, and manage cluster resources with YARN.
- Integrate various data sources and ensure clean, high-quality data availability for downstream systems (analytics, BI, ML models, etc.).
Object-Oriented Programming in Python:
- Implement clean, modular, and reusable Python code with a strong understanding of OOP principles.
- Debug, test, and optimize existing code and actively participate in peer reviews and design discussions.
Design & Architecture:
- Participate in design and architectural discussions related to big data platform enhancements.
- Apply software engineering principles such as modularity, reusability, and scalability.
- Write well-documented, maintainable, and testable code aligned with best practices and performance standards.
Database & Query Optimization:
- Work with SQL-based tools (Hive/Presto/SparkSQL) to write and optimize complex queries.
- Experience in data modeling, partitioning strategies, and query performance tuning is required.
Cloud Integration (Bonus):
- Exposure to cloud platforms such as Azure or AWS is a plus.
- Understanding of cloud storage, data lakes, and cloud-based ETL workflows is advantageous.
Required Skills:
- Hands-on expertise with PySpark and distributed data processing
- Good working knowledge of SQL (HiveQL, SparkSQL)
- Experience with the Hadoop ecosystem: Hive, HDFS, Oozie, and YARN
- Experience with data ingestion, transformation, and optimization techniques
Good to Have (Bonus Skills):
- Familiarity with CI/CD pipelines and version control (Git)
- Experience with Airflow, Kafka, or other orchestration/streaming tools
- Exposure to containerization (Docker) and job scheduling tools
- Cloud experience with AWS (S3, Glue, EMR) or Azure (ADF, Blob, Synapse)
What We're Looking For:
- 6-10 years of relevant experience in data engineering or backend development
- Strong problem-solving skills with a keen attention to detail
- Ability to work independently and within a collaborative team environment
- Passion for clean, maintainable code and scalable design
- Excellent communication and interpersonal skills
Why Join Us?
- Work on real-time big data platforms at scale
- Be part of a fast-growing team solving complex data challenges
- Opportunities to grow into architectural or lead roles
- Hybrid work culture with flexibility and ownership
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1533336