Posted on: 27/04/2026
Role Overview:
As a Spark Software Engineer, you will be instrumental in designing, developing, and maintaining robust data pipelines and scalable data processing solutions for our clients. You will collaborate closely with data scientists, data engineers, and business stakeholders to understand their requirements and translate them into efficient, reliable data workflows. Your work will directly impact the quality, accessibility, and usability of data, enabling our clients to gain valuable insights and improve their business performance.
Key Responsibilities:
- Design and implement scalable and reliable data pipelines using Spark, Python, Java, and Scala to ingest, process, and transform large datasets for various analytical and reporting needs.
- Develop and maintain data quality checks and monitoring systems using tools like Hive and Airflow to ensure data accuracy and consistency across different stages of the data pipeline.
- Collaborate with data scientists and business analysts to understand data requirements and translate them into efficient and optimized Spark jobs.
- Optimize Spark applications for performance and scalability by tuning configurations, partitioning data effectively, and leveraging appropriate data storage formats.
- Contribute to the development of data engineering best practices and standards to ensure code quality, maintainability, and reusability across the team.
- Evaluate and integrate new data processing technologies like Flink to enhance our data engineering capabilities and address evolving business needs.
- Communicate effectively with stakeholders to provide updates on project progress, address technical challenges, and ensure alignment on project goals.
Required Skillset:
- Demonstrated ability to design, develop, and deploy scalable data pipelines using Spark, Python, Java, and Scala.
- Proven experience in data modeling, data warehousing, and data lake concepts.
- Strong understanding of data quality principles and experience implementing data quality checks and monitoring systems.
- Excellent communication and collaboration skills, with the ability to effectively communicate technical concepts to both technical and non-technical audiences.
- Ability to work independently and as part of a team in a fast-paced, dynamic environment.
- Experience with workflow orchestration tools like Airflow.
- Experience with distributed data warehousing systems like Hive.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1631708