Job Description

About the Role:

We are seeking a highly skilled ETL Engineer with deep expertise in AWS cloud services and PySpark scripting to design, develop, and maintain scalable data pipelines. The ideal candidate will play a key role in implementing cloud-based data solutions, ensuring efficient processing of large volumes of structured and semi-structured data, and supporting data lake and data governance initiatives.

Key Responsibilities:

- Develop, deploy, and maintain robust ETL pipelines leveraging PySpark on AWS platforms.

- Work independently as an individual contributor, managing end-to-end development of data workflows.

- Design and implement scalable data ingestion, transformation, and processing solutions using AWS services such as S3, Lambda, SNS, and Step Functions.

- Optimize PySpark scripts and AWS resources for maximum performance and cost-efficiency.

- Collaborate closely with cross-functional teams including data architects, business analysts, and IT stakeholders to understand data requirements and deliver data integration solutions.

- Build and maintain data lakes and configure Delta tables to support efficient data storage and access.

- Implement best practices for metadata management, data lineage, and data governance to ensure data quality and compliance.

- Monitor and troubleshoot data pipelines, proactively identifying bottlenecks and failures.

- Use Python libraries such as NumPy and Pandas for data processing and analysis tasks as part of pipeline development.

- Work with orchestration tools such as Amazon Managed Workflows for Apache Airflow (MWAA) to schedule and automate workflows.

- Participate in cost and resource optimization initiatives to reduce cloud infrastructure expenses while maintaining performance.

- Document technical design, data flow, and processes for future reference and knowledge sharing.

Mandatory Skills:

- 8+ years of ETL development experience with a strong focus on the AWS cloud ecosystem.

- 4+ years of hands-on experience with PySpark scripting for large-scale data transformations.

- Strong expertise in AWS services relevant to data processing: S3, Lambda, SNS, and Step Functions.

- Proficiency in Python programming, including libraries such as NumPy and Pandas, and scripting for automation and data manipulation.

- Solid understanding of data modeling, pipeline design, and data integration frameworks.

- Experience working independently and driving deliverables as an individual contributor.
