hirist

PySpark Developer

SUNWARE TECHNOLOGIES PRIVATE LIMITED
Multiple Locations
5 - 15 Years

Posted on: 09/12/2025

Job Description


Role : PySpark Developer

Location : Chennai, Hyderabad, Kolkata

Experience : 5-15 years

Key Responsibilities :


- Design and build robust, scalable ETL/ELT pipelines using PySpark to ingest data from diverse sources (databases, logs, APIs, files).

- Transform and curate raw transactional and log data into analysis-ready datasets in the Data Hub and analytical data marts.

- Develop reusable and parameterized Spark jobs for batch and micro-batch processing.

- Optimize performance and scalability of PySpark jobs across large data volumes.

- Ensure data quality, consistency, lineage, and proper documentation across ingestion flows.

- Collaborate with Data Architects, Modelers, and Data Scientists to implement ingestion logic aligned with business needs.

- Work with cloud-based data platforms (e.g., AWS S3, Glue, EMR, Redshift) for data movement and storage.

- Support version control, CI/CD, and infrastructure-as-code where applicable.
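To illustrate the "reusable and parameterized Spark jobs" the responsibilities above describe, here is a minimal sketch of a date-parameterized batch ingest. All paths, the app name, and the curation step are hypothetical; the PySpark import is deferred into the job function so the path helper also works where Spark is not installed.

```python
import argparse
from datetime import date


def partition_path(base: str, run_date: date) -> str:
    """Build a date-partitioned output path (hypothetical layout)."""
    return f"{base}/year={run_date.year}/month={run_date.month:02d}/day={run_date.day:02d}"


def run_job(source: str, target: str, run_date: date) -> None:
    """Read raw JSON, apply a simple curation step, write Parquet.

    Requires PySpark; the import lives here so the helper above stays
    usable without a Spark installation.
    """
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ingest_job").getOrCreate()
    df = spark.read.json(source)
    curated = (
        df.dropDuplicates()                                  # example curation step
          .withColumn("ingest_date", F.lit(run_date.isoformat()))
    )
    curated.write.mode("overwrite").parquet(partition_path(target, run_date))
    spark.stop()


if __name__ == "__main__":
    # Parameterization via CLI arguments, so one job serves many runs.
    parser = argparse.ArgumentParser(description="Parameterized ingest job")
    parser.add_argument("--source", required=True)
    parser.add_argument("--target", required=True)
    parser.add_argument("--run-date", default=date.today().isoformat())
    args = parser.parse_args()
    run_job(args.source, args.target, date.fromisoformat(args.run_date))
```

The same pattern extends to micro-batch processing by invoking the job per time window from an orchestrator.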

Required Skills & Qualifications :


- 5+ years of experience in data engineering, with strong focus on PySpark/Spark for big data processing.

- Expertise in building data pipelines and ingestion frameworks from relational, semi-structured (JSON, XML), and unstructured sources (logs, PDFs).

- Proficiency in Python with strong knowledge of data processing libraries.

- Strong SQL skills for querying and validating data in platforms like Amazon Redshift, PostgreSQL, or similar.

- Experience with distributed computing frameworks (e.g., Spark on EMR, Databricks).

- Familiarity with workflow orchestration tools (e.g., AWS Step Functions).

- Solid understanding of data lake / data warehouse architectures and data modeling basics.
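The "SQL skills for querying and validating data" item above amounts to running assertion-style checks against a warehouse. A small sketch, using the standard-library sqlite3 module as a stand-in for Redshift/PostgreSQL (table and column names are illustrative):

```python
import sqlite3


def row_count_check(conn, table: str, min_rows: int) -> bool:
    """Data-quality check: the table must hold at least min_rows rows."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count >= min_rows


def null_check(conn, table: str, column: str) -> bool:
    """Data-quality check: the column must contain no NULLs."""
    (nulls,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()
    return nulls == 0


if __name__ == "__main__":
    # In-memory demo; against Redshift/PostgreSQL the same SQL would run
    # through a database driver instead of sqlite3.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])
    print(row_count_check(conn, "orders", 1))    # expect True
    print(null_check(conn, "orders", "amount"))  # expect True
```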

Preferred Qualifications :


- Experience with AWS data services: Glue, S3, Redshift, Lambda, CloudWatch, etc.

- Familiarity with Delta Lake or similar for large-scale data storage.

- Exposure to real-time streaming frameworks (e.g., Spark Structured Streaming, Kafka).

- Knowledge of data governance, lineage, and cataloging tools (e.g., AWS Glue Catalog, Apache Atlas).

- Understanding of DevOps/CI-CD pipelines for data projects using Git, Jenkins, or similar tools.

