
Job Description

Description :

Experience : Level 3 (6-8 years)

Location : Hyderabad

Skills : Python, Spark, HDFS, MongoDB


About the Role :

We are seeking a highly skilled Data Engineer to join our team to design, build, and optimize scalable data pipelines and platforms.


The ideal candidate will have hands-on experience with Python, Spark, HDFS, and MongoDB, and a proven ability to work with large-scale datasets in a distributed environment.


Key Responsibilities :


- Design, develop, and maintain end-to-end data pipelines for batch and real-time processing.

- Work with Apache Spark to process and transform large datasets efficiently.

- Manage and optimize HDFS storage, ensuring data availability, reliability, and performance.

- Develop scripts and data orchestration workflows using Python.

- Build and maintain NoSQL data solutions using MongoDB, including data modeling and performance tuning.

- Collaborate with Data Scientists, Analysts, and Platform Engineering teams to deliver high-quality data solutions.

- Implement data quality, validation, and monitoring frameworks to ensure accuracy and consistency.

- Participate in design reviews, code reviews, and performance optimization initiatives.

- Contribute to the continuous improvement of data engineering standards and best practices.


Required Skills & Qualifications :

- Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.

- 3+ years of hands-on experience in Data Engineering or a related domain.

- Strong proficiency in Python programming for data processing and automation.

- Expertise in Apache Spark (PySpark preferred) for large-scale data processing.

- Solid experience with HDFS (Hadoop Distributed File System) and distributed data architecture.

- Hands-on experience with MongoDB, including schema design, queries, and performance optimization.

- Good understanding of ETL concepts, data warehousing, and data modeling.

- Proficiency in Linux/Unix environments and shell scripting.

- Experience with version control tools like Git.


Good to Have :


- Experience with workflow orchestration tools (Airflow, Luigi, Oozie, etc.).

- Knowledge of cloud platforms (AWS, Azure, GCP) and cloud-native data services.

- Exposure to CI/CD and DevOps practices for data engineering.

- Experience with streaming systems (Kafka, Flink, etc.).

