Posted on: 23/06/2025
About the Role:
We are seeking a highly skilled ETL Engineer with deep expertise in AWS cloud services and PySpark scripting to design, develop, and maintain scalable data pipelines. The ideal candidate will play a key role in implementing cloud-based data solutions, ensuring efficient processing of large volumes of structured and semi-structured data, and supporting data lake and data governance initiatives.
Key Responsibilities:
- Develop, deploy, and maintain robust ETL pipelines leveraging PySpark on AWS platforms.
- Work independently as an individual contributor, managing end-to-end development of data workflows.
- Design and implement scalable data ingestion, transformation, and processing solutions using AWS services such as S3, Lambda, SNS, and Step Functions.
- Optimize PySpark scripts and AWS resources for maximum performance and cost-efficiency.
- Collaborate closely with cross-functional teams including data architects, business analysts, and IT stakeholders to understand data requirements and deliver data integration solutions.
- Build and maintain data lakes and configure Delta tables to support efficient data storage and access.
- Implement best practices for metadata management, data lineage, and data governance to ensure data quality and compliance.
- Monitor and troubleshoot data pipelines, proactively identifying bottlenecks and failures.
- Utilize Python libraries such as NumPy and Pandas for data processing and analysis tasks as part of pipeline development.
- Work with orchestration tools such as Amazon Managed Workflows for Apache Airflow (MWAA) to schedule and automate workflows.
- Participate in cost and resource optimization initiatives to reduce cloud infrastructure expenses while maintaining performance.
- Document technical design, data flow, and processes for future reference and knowledge sharing.
Mandatory Skills:
- 8+ years in ETL development with a strong focus on AWS cloud ecosystem.
- 4+ years of hands-on experience with PySpark scripting for large-scale data transformations.
- Strong expertise in AWS services relevant to data processing: S3, Lambda, SNS, and Step Functions.
- Proficiency in Python programming, including libraries such as NumPy, Pandas, and scripting for automation and data manipulation.
- Solid understanding of data modeling, pipeline design, and data integration frameworks.
- Experience working independently and driving deliverables as an individual contributor.
Posted by: Siddhi Karle
Posted in: Data Analytics & BI
Functional Area: Data Mining / Analysis
Job Code: 1500640