Posted on: 25/12/2025
Description:
Role Overview:
We are hiring Data Software Engineers with strong expertise in Apache Spark, Python, and AWS/Azure Databricks.
The ideal candidates will have deep Big Data engineering experience, strong distributed systems knowledge, and the ability to work on complex end-to-end data platforms at scale.
Key Responsibilities:
Big Data Engineering:
- Design and build distributed data processing systems using Spark and Hadoop.
- Develop and optimize Spark applications, ensuring performance and scalability.
- Create and manage ETL/ELT pipelines for large-scale data ingestion and transformation.
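As a toy illustration of the extract-transform-load pattern behind these responsibilities (pure standard-library Python; in a real Spark job each stage would be a DataFrame read, transformation, and write, and all names here are made up):

```python
import csv
import io

# Toy ETL sketch: extract rows from a CSV source, transform them,
# and load the result into an in-memory "sink".

RAW_CSV = """id,amount,currency
1,10.50,usd
2,3.20,eur
3,7.00,usd
"""

def extract(source: str):
    """Yield raw records from a CSV string (stand-in for a data source)."""
    yield from csv.DictReader(io.StringIO(source))

def transform(rows):
    """Cast types and normalize currency codes."""
    for row in rows:
        yield {
            "id": int(row["id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        }

def load(rows):
    """Collect transformed rows (stand-in for writing to a table)."""
    return list(rows)

result = load(transform(extract(RAW_CSV)))
```

The same three-stage shape carries over to large-scale pipelines; Spark simply distributes each stage across a cluster.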
Streaming & Event Processing:
- Build and manage real-time streaming systems using Spark Streaming or Storm.
- Work with Kafka / RabbitMQ for event-driven ingestion and messaging patterns.
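As a library-free sketch of the windowed-aggregation idea at the heart of stream processing (the micro-batch model Spark Streaming uses; event names and the window size are illustrative, not from the posting):

```python
from collections import defaultdict

# Toy tumbling-window aggregation: assign timestamped events to
# fixed-size windows and count events per (window, key), mimicking
# a streaming groupBy-over-window at a conceptual level.

WINDOW_SECONDS = 10

events = [
    # (epoch_seconds, key)
    (0, "click"), (3, "click"), (7, "view"),
    (11, "click"), (14, "view"), (19, "view"),
]

def tumbling_counts(events, window):
    """Count events per (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window) * window
        counts[(window_start, key)] += 1
    return dict(counts)

counts = tumbling_counts(events, WINDOW_SECONDS)
```

In production the events would arrive from Kafka or RabbitMQ and the aggregation state would be managed by the streaming engine rather than a plain dict.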
Cloud & Databricks Engineering:
- Develop & optimize workloads on AWS Databricks or Azure Databricks.
- Perform cluster management, job scheduling, performance tuning, and automation.
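To give a flavor of the job-scheduling side of this work, a minimal Databricks job definition might look like the fragment below (field names follow the Databricks Jobs API; all values are placeholders, not taken from the posting):

```json
{
  "name": "nightly-etl",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 4
  },
  "notebook_task": {
    "notebook_path": "/Jobs/nightly_etl"
  },
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  }
}
```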
Data Integration & Storage:
- Integrate data from diverse sources: RDBMS (Oracle, SQL Server), ERP systems, and file systems.
- Work with query engines like Hive and Impala.
- Work with NoSQL stores: HBase, Cassandra, MongoDB.
Programming & Scripting:
- Strong hands-on coding in Python for data transformations and automation.
- Strong SQL skills for data validation, tuning, and complex queries.
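As an illustrative sketch of SQL-driven data validation (using `sqlite3` from the Python standard library; the table, columns, and rule are made up for the example):

```python
import sqlite3

# Toy data-validation check: load sample rows into an in-memory SQLite
# database and use SQL to find records violating a simple business rule
# (non-positive amounts). The same pattern applies to Hive/Impala checks.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.5), (2, -3.0), (3, 0.0), (4, 8.25)],
)

# Validation query: rows failing the amount > 0 rule.
bad_rows = conn.execute(
    "SELECT id, amount FROM orders WHERE amount <= 0 ORDER BY id"
).fetchall()
```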
Team Leadership (L4):
- Provide technical leadership and mentoring to junior engineers.
- Drive end-to-end solution design for Big Data platforms.
Ways of Working:
- Work in Agile teams, participate in sprint ceremonies and planning.
- Collaborate with engineering, data science, and product teams.
Required Skills & Expertise (Both L2 & L4):
- Apache Spark - Expert level (core, SQL, streaming)
- Python - Strong hands-on
- Distributed computing fundamentals
- Hadoop ecosystem: Hadoop v2, MapReduce, HDFS, Sqoop
- Streaming systems : Spark Streaming / Storm
- Messaging : Kafka or RabbitMQ
- SQL - Advanced (joins, stored procedures, query optimization)
- NoSQL: HBase, Cassandra, MongoDB
- ETL frameworks & data pipeline design
- Hive / Impala querying
- Performance tuning of Spark jobs
- AWS or Azure Databricks
- Experience working in Agile environments
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1594718