Posted on: 09/12/2025
Description :
Role : PySpark Developer
Location : Chennai, Hyderabad, Kolkata
Experience : 5-15 years
Key Responsibilities :
- Transform and curate raw transactional and log data into analysis-ready datasets in the Data Hub and analytical data marts.
- Develop reusable and parameterized Spark jobs for batch and micro-batch processing.
- Optimize performance and scalability of PySpark jobs across large data volumes.
- Ensure data quality, consistency, lineage, and proper documentation across ingestion flows.
- Collaborate with Data Architects, Modelers, and Data Scientists to implement ingestion logic aligned with business needs.
- Work with cloud-based data platforms (e.g., AWS S3, Glue, EMR, Redshift) for data movement and storage.
- Support version control, CI/CD, and infrastructure-as-code where applicable.
Required Skills & Qualifications :
- 5+ years of experience in data engineering, with strong focus on PySpark/Spark for big data processing.
- Expertise in building data pipelines and ingestion frameworks from relational, semi-structured (JSON, XML), and unstructured sources (logs, PDFs).
- Proficiency in Python with strong knowledge of data processing libraries.
- Strong SQL skills for querying and validating data in platforms like Amazon Redshift, PostgreSQL, or similar.
- Experience with distributed computing frameworks (e.g., Spark on EMR, Databricks).
- Familiarity with workflow orchestration tools (e.g., AWS Step Functions or similar).
- Solid understanding of data lake / data warehouse architectures and data modeling basics.
Preferred Qualifications :
- Familiarity with Delta Lake or similar for large-scale data storage.
- Exposure to real-time streaming frameworks (e.g., Spark Structured Streaming, Kafka).
- Knowledge of data governance, lineage, and cataloging tools (e.g., AWS Glue Catalog, Apache Atlas).
- Understanding of DevOps/CI-CD pipelines for data projects using Git, Jenkins, or similar tools.
Posted in : Data Analytics & BI
Functional Area : Data Mining / Analysis
Job Code : 1586651