Posted on: 21/01/2026
Competencies :
- Strong analytical and problem-solving skills
- Ability to work independently and in a team-oriented environment
- Effective communication and collaboration skills
- Attention to detail and commitment to data accuracy
- Ability to handle high-volume data and tight delivery timelines
- Proactive mindset with a focus on continuous improvement
Job Description :
We are looking for an experienced IT Professional with strong expertise in Python, Apache Spark, Unix, and Hive to design, develop, and support large-scale data processing solutions. The role involves working on data pipelines, performance optimization, and analytics platforms in a distributed environment while collaborating with cross-functional teams.
Key Responsibilities :
- Design, develop, and maintain scalable data processing applications using Python and Apache Spark (a minimal sketch follows this list)
- Build and optimize ETL/ELT pipelines for large datasets
- Work extensively in Unix/Linux environments for scripting, automation, and job scheduling
- Develop and optimize Hive queries for data analysis and reporting
- Ensure data quality, reliability, and performance across data platforms
- Troubleshoot and resolve production issues related to data pipelines and batch processing
- Collaborate with data analysts, data scientists, and business teams to understand data requirements
- Participate in code reviews, performance tuning, and best practice implementation
- Document technical designs, workflows, and operational procedures
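For illustration only, a minimal sketch of the kind of batch pipeline described above, written with PySpark; the application name, input path, and all table and column names (analytics.daily_order_summary, order_ts, amount, region) are hypothetical placeholders:

    # Minimal PySpark batch ETL sketch; paths, tables, and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("daily_orders_etl")   # hypothetical job name
        .enableHiveSupport()           # allows reading/writing Hive tables
        .getOrCreate()
    )

    # Extract: read raw data from a hypothetical HDFS landing zone.
    raw = spark.read.parquet("/data/landing/orders/")

    # Transform: basic cleansing plus a daily aggregate.
    daily = (
        raw.filter(F.col("order_status") == "COMPLETE")
           .withColumn("order_date", F.to_date("order_ts"))
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("order_count"))
    )

    # Load: write into a partitioned Hive table for downstream reporting.
    (daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("analytics.daily_order_summary"))

    spark.stop()

Partitioning the output by order_date keeps later date-bounded queries cheap, which is the usual reason Hive tables in such pipelines are date-partitioned.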
Must Have Skills :
- Strong hands-on experience with Python for data processing and scripting
- Extensive experience with Apache Spark (Core, SQL, DataFrames)
- Solid knowledge of Unix/Linux commands, shell scripting, and job automation
- Strong experience with Hive, including query optimization and partitioning (see the sketch after this list)
- Experience working with large-scale data environments and distributed systems
- Good understanding of data structures, ETL concepts, and data warehousing principles
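As a rough illustration of the Hive query optimization and partitioning skills above, a sketch that filters on the partition column so the engine can prune partitions instead of scanning the full table; it reuses the hypothetical analytics.daily_order_summary table from the earlier sketch and runs the HiveQL through Spark SQL:

    # Partition-pruned Hive query via Spark SQL; table, columns, and date are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_partition_pruning_demo")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Filtering on the partition column (order_date) lets Hive/Spark
    # read only the matching partition directories.
    df = spark.sql("""
        SELECT region, SUM(total_amount) AS revenue
        FROM analytics.daily_order_summary
        WHERE order_date = DATE '2026-01-20'
        GROUP BY region
    """)

    df.show()
    spark.stop()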
Good to Have Skills :
- Experience with Spark Streaming or real-time data processing (see the streaming sketch after this list)
- Knowledge of HDFS, Kafka, Airflow, Oozie, or similar tools
- Exposure to cloud platforms (AWS / Azure / GCP)
- Experience with SQL performance tuning
- Familiarity with CI/CD pipelines and version control tools like Git
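For the real-time processing mentioned above, a minimal Spark Structured Streaming sketch that consumes from Kafka; it assumes the spark-sql-kafka connector is on the classpath, and the broker address, topic name, and checkpoint path are hypothetical:

    # Structured Streaming from Kafka; broker, topic, and checkpoint path are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_stream_demo").getOrCreate()

    # Read a stream of events from a hypothetical Kafka topic.
    events = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "orders")
             .load()
    )

    # Kafka delivers key/value as binary; cast the payload to string.
    decoded = events.select(F.col("value").cast("string").alias("raw_event"))

    # Write to the console sink for demonstration; a real job would
    # write to a durable sink with the same checkpointing.
    query = (
        decoded.writeStream
               .outputMode("append")
               .format("console")
               .option("checkpointLocation", "/tmp/checkpoints/orders_stream")
               .start()
    )

    query.awaitTermination()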
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1604112