
PySpark Developer

Recruitment Hub 365
Multiple Locations
5 - 10 Years

Posted on: 17/11/2025

Job Description

Job Title : PySpark Developer

Location : Chennai, Hyderabad, Kolkata

Work Mode : Work From Office (WFO), Monday - Friday (5 days)

Experience : 5+ Years in Backend Development

Notice Period : Immediate to 15 days

Must-Have Experience : Python, PySpark, Amazon Redshift, PostgreSQL

About the Role :



We are looking for an experienced PySpark Developer with strong data engineering capabilities to design, develop, and optimize scalable data pipelines for large-scale data processing. The ideal candidate must possess in-depth knowledge of PySpark, SQL, and cloud-based data ecosystems, along with strong problem-solving skills and the ability to work with cross-functional teams.

Roles & Responsibilities :



- Design and develop robust, scalable ETL/ELT pipelines using PySpark to process data from various sources such as databases, APIs, logs, and files.


- Transform raw data into analysis-ready datasets for data hubs and analytical data marts.


- Build reusable, parameterized Spark jobs for batch and micro-batch processing.


- Optimize PySpark job performance to handle large and complex datasets efficiently.


- Ensure data quality, consistency, and lineage, and maintain thorough documentation across all ingestion workflows.


- Collaborate with Data Architects, Data Modelers, and Data Scientists to implement ingestion logic aligned with business requirements.


- Work with AWS-based data platforms (S3, Glue, EMR, Redshift) for data movement and storage.


- Support version control, CI/CD processes, and infrastructure-as-code practices as required.

Must-Have Skills :


- 5+ years of data engineering experience, with a strong focus on PySpark/Spark.



- Proven experience building data pipelines and ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).


- Strong knowledge of Python and related data processing libraries.


- Advanced SQL proficiency (Amazon Redshift, PostgreSQL or similar).


- Hands-on expertise with distributed computing frameworks such as Spark on EMR or Databricks.


- Familiarity with workflow orchestration tools like AWS Step Functions or similar.


- Good understanding of data lake and data warehouse architectures, including fundamental data modeling concepts.

Good-to-Have Skills :


- Experience with AWS data services : Glue, S3, Redshift, Lambda, CloudWatch.


- Exposure to Delta Lake or similar large-scale storage technologies.


- Experience with real-time streaming tools such as Spark Structured Streaming or Kafka.


- Understanding of data governance, lineage, and cataloging tools (AWS Glue Catalog, Apache Atlas).


- Knowledge of DevOps/CI-CD pipelines using Git and Jenkins.

