Posted on: 06/11/2025
Description:
Key Responsibilities:
- Design, develop, and optimize large-scale data pipelines using PySpark and Scala (a minimal PySpark sketch follows this list).
- Implement ETL processes on big data platforms such as Hadoop and Azure Data Lake.
- Work with Azure services such as Azure Databricks, Azure Data Factory, Azure Synapse Analytics, and Azure Blob Storage.
- Develop, test, and maintain data ingestion and transformation frameworks using Python and Spark.
- Collaborate with cross-functional teams to integrate data from multiple sources and ensure high data quality.
- Implement data governance, security, and performance tuning best practices.
- Troubleshoot and optimize data workflows for scalability and efficiency.
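To make the pipeline work concrete, here is a minimal PySpark batch ETL sketch: it reads raw CSV files landed in Azure Data Lake Storage, applies basic cleansing, and writes curated Parquet. The storage account, container, and column names (order_id, order_ts, amount) are illustrative assumptions, not details from this posting.

```python
# Minimal PySpark batch ETL sketch. The abfss paths, container, storage
# account, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw CSV files landed in Azure Data Lake Storage.
raw = (
    spark.read
    .option("header", True)
    .csv("abfss://raw@examplestorage.dfs.core.windows.net/orders/")
)

# Basic cleansing: type casting, deduplication, and a derived partition column.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write curated output partitioned by date for downstream consumers.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://curated@examplestorage.dfs.core.windows.net/orders/")
)
```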
Mandatory Skills:
- PySpark and Scala programming
- Hadoop ecosystem (HDFS, Hive, HBase, etc.)
- Python for data processing and automation
- Azure Cloud (Databricks, ADF, Synapse, ADLS)
- Strong understanding of ETL, data modeling, and data warehousing concepts (a star-schema load sketch follows this list)
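As an illustration of the warehousing concepts above, the following sketch loads a fact table in a star schema: it resolves a natural key against a customer dimension's surrogate key, mapping unmatched rows to a default "unknown" member. All table and column names are hypothetical assumptions.

```python
# Hypothetical star-schema load: enrich staged sales with the surrogate
# key from a customer dimension. Table and column names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fact-load").getOrCreate()

dim_customer = spark.table("warehouse.dim_customer")  # customer_sk, customer_id
stg_sales = spark.table("staging.sales")              # customer_id, sale_date, amount

# Resolve the natural key to the dimension's surrogate key; rows with no
# matching dimension member fall back to a default "unknown" key of -1,
# a common data warehousing practice.
fact_sales = (
    stg_sales.join(dim_customer, "customer_id", "left")
             .fillna({"customer_sk": -1})
             .select("customer_sk", "sale_date", "amount")
)

fact_sales.write.mode("append").saveAsTable("warehouse.fact_sales")
```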
Good to Have:
- Experience with Kafka or Event Hub for streaming data (see the streaming sketch after this list)
- Knowledge of SQL and NoSQL databases
- Familiarity with CI/CD pipelines and DevOps tools
- Exposure to Delta Lake and Lakehouse architecture (the sketch below lands the stream in a Delta table)
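The streaming sketch below combines two of these items: it consumes events from a Kafka topic with Spark Structured Streaming and appends them to a Delta Lake table. The broker address, topic name, and paths are placeholders, and a Delta-enabled runtime (Databricks, or Spark with the delta-spark package) is assumed.

```python
# Hedged sketch: Kafka -> Spark Structured Streaming -> Delta Lake.
# Broker, topic, and storage paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Read a stream of events from a Kafka topic (topic name is hypothetical).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(F.col("value").cast("string").alias("payload"),
            F.col("timestamp"))
)

# Append to a Delta table; the checkpoint location lets the stream
# recover its progress and avoid reprocessing on restart.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")
    .outputMode("append")
    .start("/mnt/delta/events")
)
query.awaitTermination()
```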
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1570496