Posted on: 21/01/2026
Competencies :
- Strong analytical and problem-solving skills
- Ability to work independently and in a team-oriented environment
- Effective communication and collaboration skills
- Attention to detail and commitment to data accuracy
- Ability to handle high-volume data and tight delivery timelines
- Proactive mindset with a focus on continuous improvement
Job Description :
We are looking for an experienced IT Professional with strong expertise in Python, Apache Spark, Unix, and Hive to design, develop, and support large-scale data processing solutions. The role involves working on data pipelines, performance optimization, and analytics platforms in a distributed environment while collaborating with cross-functional teams.
Key Responsibilities :
- Design, develop, and maintain scalable data processing applications using Python and Apache Spark (a minimal sketch follows this list)
- Build and optimize ETL/ELT pipelines for large datasets
- Work extensively in Unix/Linux environments for scripting, automation, and job scheduling
- Develop and optimize Hive queries for data analysis and reporting
- Ensure data quality, reliability, and performance across data platforms
- Troubleshoot and resolve production issues related to data pipelines and batch processing
- Collaborate with data analysts, data scientists, and business teams to understand data requirements
- Participate in code reviews, performance tuning, and best practice implementation
- Document technical designs, workflows, and operational procedures
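For illustration only, a minimal sketch of the kind of batch pipeline described above, written with PySpark; the application name, input path, and all table and column names (analytics.daily_order_summary, order_ts, amount, region) are hypothetical placeholders:

    # Minimal PySpark batch ETL sketch; paths, tables, and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("daily_orders_etl")   # hypothetical job name
        .enableHiveSupport()           # allows reading/writing Hive tables
        .getOrCreate()
    )

    # Extract: read raw data from a hypothetical HDFS landing zone.
    raw = spark.read.parquet("/data/landing/orders/")

    # Transform: basic cleansing plus a daily aggregate.
    daily = (
        raw.filter(F.col("order_status") == "COMPLETE")
           .withColumn("order_date", F.to_date("order_ts"))
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("order_count"))
    )

    # Load: write into a partitioned Hive table for downstream reporting.
    (daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("analytics.daily_order_summary"))

    spark.stop()

Partitioning the output by order_date keeps later date-bounded queries cheap, which is the usual reason Hive tables in such pipelines are date-partitioned.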
Must Have Skills :
- Strong hands-on experience with Python for data processing and scripting
- Extensive experience with Apache Spark (Core, SQL, DataFrames)
- Solid knowledge of Unix/Linux commands, shell scripting, and job automation
- Strong experience with Hive, including query optimization and partitioning (see the sketch after this list)
- Experience working with large-scale data environments and distributed systems
- Good understanding of data structures, ETL concepts, and data warehousing principles
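As a rough illustration of the Hive query optimization and partitioning skills above, a sketch that filters on the partition column so the engine can prune partitions instead of scanning the full table; it reuses the hypothetical analytics.daily_order_summary table from the earlier sketch and runs the HiveQL through Spark SQL:

    # Partition-pruned Hive query via Spark SQL; table, columns, and date are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_partition_pruning_demo")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Filtering on the partition column (order_date) lets Hive/Spark
    # read only the matching partition directories.
    df = spark.sql("""
        SELECT region, SUM(total_amount) AS revenue
        FROM analytics.daily_order_summary
        WHERE order_date = DATE '2026-01-20'
        GROUP BY region
    """)

    df.show()
    spark.stop()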
Good to Have Skills :
- Experience with Spark Streaming or real-time data processing (see the streaming sketch after this list)
- Knowledge of HDFS, Kafka, Airflow, Oozie, or similar tools
- Exposure to cloud platforms (AWS / Azure / GCP)
- Experience with SQL performance tuning
- Familiarity with CI/CD pipelines and version control tools like Git
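For the real-time processing mentioned above, a minimal Spark Structured Streaming sketch that consumes from Kafka; it assumes the spark-sql-kafka connector is on the classpath, and the broker address, topic name, and checkpoint path are hypothetical:

    # Structured Streaming from Kafka; broker, topic, and checkpoint path are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_stream_demo").getOrCreate()

    # Read a stream of events from a hypothetical Kafka topic.
    events = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "orders")
             .load()
    )

    # Kafka delivers key/value as binary; cast the payload to string.
    decoded = events.select(F.col("value").cast("string").alias("raw_event"))

    # Write to the console sink for demonstration; a real job would
    # write to a durable sink with the same checkpointing.
    query = (
        decoded.writeStream
               .outputMode("append")
               .format("console")
               .option("checkpointLocation", "/tmp/checkpoints/orders_stream")
               .start()
    )

    query.awaitTermination()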
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1604112