Posted on: 21/11/2025
Description:
Role Overview:
We are looking for a highly skilled Data Software Engineer (L2) with strong expertise in Apache Spark, Python, and Databricks.
You will design and build large-scale data processing systems and ETL pipelines, with a focus on performance and reliability.
Key Responsibilities:
- Design and build distributed data processing systems using Spark and Hadoop
- Develop and optimize Spark applications for performance and scale
- Build ETL/ELT pipelines for ingestion and transformation of large datasets
- Develop and maintain real-time streaming applications using Spark Streaming / Storm
- Work on Kafka or RabbitMQ for message ingestion and event-driven workflows
- Develop and optimize workloads on AWS Databricks or Azure Databricks
- Perform cluster management, job scheduling, automation, and CI/CD enablement
- Integrate data from RDBMS, file systems, ERP sources, and NoSQL databases (HBase, Cassandra, MongoDB)
- Write high-quality code using Python and SQL with a focus on reusability and performance
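The ETL/ELT responsibilities above can be sketched in miniature. The following is a plain-Python stand-in (no Spark cluster required) that illustrates the extract/transform/load stages a candidate would implement at scale with Spark; the field names (`customer`, `amount`) and the aggregation step are hypothetical examples, not part of the role description.

```python
import csv
import io

def extract(raw_csv: str):
    """Extract: parse raw CSV rows into dicts (stand-in for a Spark read)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: drop malformed records and normalize fields."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip records with a missing or non-numeric amount
        out.append({"customer": row["customer"].strip().lower(),
                    "amount": round(amount, 2)})
    return out

def load(rows):
    """Load: aggregate per customer (stand-in for a warehouse write)."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

raw = "customer,amount\nAlice,10.5\nBOB ,4.25\nAlice,oops\nAlice,2.0\n"
result = load(transform(extract(raw)))
print(result)  # {'alice': 12.5, 'bob': 4.25}
```

In a real Spark pipeline the same three stages map onto a DataFrame read, a chain of transformations, and a write to the target store, with the filtering and aggregation distributed across the cluster.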
Required Skills:
- Apache Spark: expert level (core, SQL, streaming)
- Python: strong hands-on coding
- Hadoop ecosystem: HDFS, MapReduce, Sqoop
- Streaming: Spark Streaming / Storm
- Messaging: Kafka / RabbitMQ
- SQL: advanced
- Hive / Impala querying
- ETL pipeline design
- NoSQL: HBase / Cassandra / MongoDB
- Cloud: AWS or Azure Databricks
- Experience working in Agile
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1578708