Posted on: 18/08/2025
Key Responsibilities:
- Design, develop, and maintain large-scale distributed data pipelines using Apache Spark and Scala (a brief sketch follows this list).
- Write clean, efficient, and maintainable Scala code adhering to industry best practices.
- Implement Spark Core, Spark SQL, and Spark Streaming modules for real-time and batch data processing.
- Collaborate with cross-functional teams to gather and understand data processing requirements.
- Optimize performance of complex queries and processing logic in Hadoop ecosystems.
- Develop and manage workflows using UNIX shell scripting, Hive, Sqoop, and Impala.
- Participate actively in Agile/Scrum ceremonies: daily stand-ups, sprint planning, retrospectives, etc.
- Provide production support and maintenance for existing applications, ensuring high availability and performance.
- Conduct root cause analysis and resolve data pipeline issues in collaboration with upstream/downstream teams.
- Stay up to date with the latest Big Data technologies.
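
For illustration only: a minimal sketch, in Scala, of the kind of Spark batch pipeline these responsibilities describe. The input path, column names, and output location are hypothetical and not part of this posting.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, sum, to_date}

    object DailyOrdersJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-orders-batch")
          .getOrCreate()

        // Read raw order events from a hypothetical HDFS location.
        val orders = spark.read.parquet("hdfs:///data/raw/orders")

        // Aggregate order amounts per customer per day.
        val daily = orders
          .withColumn("order_date", to_date(col("order_ts")))
          .groupBy(col("customer_id"), col("order_date"))
          .agg(sum(col("amount")).as("total_amount"))

        // Write curated output, partitioned by date so downstream
        // Hive/Impala queries can prune by partition.
        daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("hdfs:///data/curated/daily_orders")

        spark.stop()
      }
    }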
Mandatory Skills:
- Strong proficiency in Scala, with 8+ years of hands-on experience.
- Solid experience with Apache Spark for building distributed data processing applications.
- In-depth understanding of data structures, algorithms, and design patterns.
- Strong command of SQL and working knowledge of NoSQL databases (see the short Spark SQL example after this list).
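
Again purely illustrative: the same style of aggregation expressed through Spark SQL, the kind of query work the skills above call for. The table and column names are made up for the example.

    import org.apache.spark.sql.SparkSession

    object OrdersSqlExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("orders-sql-example")
          .getOrCreate()

        // Register a hypothetical Parquet dataset as a temporary view.
        spark.read.parquet("hdfs:///data/raw/orders")
          .createOrReplaceTempView("orders")

        // Plain SQL over the view; Spark SQL compiles this to the
        // same distributed execution plan as the DataFrame API.
        val topCustomers = spark.sql(
          """SELECT customer_id, SUM(amount) AS total_amount
            |FROM orders
            |GROUP BY customer_id
            |ORDER BY total_amount DESC
            |LIMIT 10""".stripMargin)

        topCustomers.show()
        spark.stop()
      }
    }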
Posted By: Priya C
Managing Director at LION and ELEPHANTS CONSULTANCY PVT LTD
Last Active: Not available (job posted through a third-party tool)
Posted in: Data Engineering
Functional Area: Big Data / Data Warehousing / ETL
Job Code: 1531333