Description:
We are seeking a highly skilled AWS Certified Data Engineer (Associate) with strong expertise in Python, PySpark, SQL, and AWS Cloud services. The ideal candidate will be responsible for building scalable data pipelines, designing robust data architectures, and enabling high-quality, model-ready datasets for analytics and machine learning use cases.
This role requires deep hands-on experience in ETL/ELT pipeline development, cloud-based data processing, database design, and infrastructure automation within the AWS ecosystem.
Key Responsibilities:
1. Data Ingestion & Processing:
- Ingest raw data from multiple sources including network logs, customer systems, APIs, and third-party platforms.
- Design, develop, and maintain scalable ETL/ELT pipelines using Python and PySpark.
- Transform and curate large datasets into clean, structured, and analytics-ready formats.
- Optimize data processing workflows for performance and cost efficiency on AWS.
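To illustrate the kind of ingest-and-curate transform this section describes, here is a minimal sketch in plain Python (in practice this would run as a PySpark job; the field names `ts`, `src_ip`, and `bytes` are hypothetical examples of a network-log schema, not taken from this posting):

```python
from datetime import datetime
from typing import Optional

def clean_log_record(raw: dict) -> Optional[dict]:
    """Normalize one raw network-log record into an analytics-ready row.

    Malformed records (bad timestamp, non-numeric or negative byte count)
    are dropped by returning None.
    """
    try:
        ts = datetime.fromisoformat(raw["ts"])
        size = int(raw["bytes"])
    except (KeyError, ValueError):
        return None  # drop records missing fields or with unparsable values
    if size < 0:
        return None  # drop physically impossible byte counts
    return {
        "event_date": ts.date().isoformat(),
        "src_ip": raw.get("src_ip", "unknown").strip(),
        "bytes": size,
    }

# Curate a small batch: keep only records that pass validation.
raw_events = [
    {"ts": "2024-05-01T10:00:00", "src_ip": " 10.0.0.1 ", "bytes": "512"},
    {"ts": "not-a-date", "src_ip": "10.0.0.2", "bytes": "7"},
]
curated = [row for event in raw_events if (row := clean_log_record(event))]
```

The same per-record logic would typically be expressed as PySpark DataFrame transformations so it scales across a cluster.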
2. Feature Engineering & Metadata Management:
- Build and automate feature pipelines in collaboration with Data Scientists.
- Develop reusable data transformation frameworks.
- Maintain feature stores and metadata repositories to ensure data consistency and explainability.
- Implement data validation and quality checks across pipelines.
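The data validation and quality checks mentioned above can be sketched as a small reusable helper (a minimal illustration; the column names in the usage example are hypothetical):

```python
def run_quality_checks(rows, required_cols, non_null_cols):
    """Run simple quality checks on a batch of rows.

    Returns a list of human-readable failure messages; an empty
    list means the batch passed.
    """
    failures = []
    for i, row in enumerate(rows):
        # Schema check: every required column must be present.
        missing = required_cols - row.keys()
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")
        # Completeness check: designated columns must not be null/empty.
        for col in non_null_cols:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: null value in '{col}'")
    return failures

rows = [{"id": 1, "feature": 0.5}, {"id": None, "feature": 0.1}]
issues = run_quality_checks(rows, required_cols={"id", "feature"}, non_null_cols={"id"})
```

In a production pipeline, checks like these would gate each stage so bad batches are quarantined rather than propagated downstream.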
3. Model Integration & Data Outputs:
- Design mechanisms to store, version, and serve model outputs.
- Ensure full data lineage from source ingestion to business reporting layers.
- Support integration of ML model predictions into downstream applications.
- Maintain data governance and documentation standards.
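One simple way to store, version, and trace model outputs, as described above, is to key each batch of predictions by a content hash and attach lineage metadata. A minimal sketch (the model name and S3-style source path are hypothetical placeholders):

```python
import hashlib
import json
from datetime import datetime, timezone

def store_model_output(store: dict, model_name: str, predictions: list, source_dataset: str) -> str:
    """Store predictions under a content-derived version key with lineage metadata."""
    # Deterministic version id: hash of the serialized predictions.
    payload = json.dumps(predictions, sort_keys=True).encode()
    version = hashlib.sha256(payload).hexdigest()[:12]
    store[(model_name, version)] = {
        "predictions": predictions,
        "lineage": {
            "source_dataset": source_dataset,   # where the inputs came from
            "stored_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    return version

registry = {}
version = store_model_output(registry, "churn_model", [0.12, 0.87],
                             "s3://example-bucket/curated/2024-05-01")
```

Content-derived versions make reruns on identical inputs idempotent, and the lineage record ties every served prediction back to its source ingestion.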
4. Infrastructure & Automation:
- Implement Infrastructure-as-Code (IaC) using tools like Terraform or CloudFormation.
- Deploy and manage data services on AWS Cloud.
- Automate job scheduling, monitoring, and alerting for ETL workflows.
- Ensure high availability, scalability, and security of data systems.
Required Skills & Technical Expertise:
Programming & Data Processing:
- Strong proficiency in Python
- Hands-on experience with PySpark for distributed data processing
- Advanced SQL query writing and performance optimization
- Experience in building scalable ETL/ELT pipelines
Cloud & AWS Services:
- AWS Certified Data Engineer - Associate certification (preferred)
- Strong knowledge of AWS services such as S3, EMR, Glue, Redshift, Lambda, RDS, and Kinesis
- Experience managing cloud-native data architectures
Database & Architecture:
- Experience in relational and NoSQL database design
- Data modeling (Star schema, Snowflake schema)
- Query optimization and indexing strategies
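The star-schema data modeling listed above can be demonstrated end to end with an in-memory database. A minimal sketch using Python's built-in sqlite3 module (the sales fact table and date dimension are hypothetical examples, not from this posting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per calendar date, with descriptive attributes.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")
# Fact table: numeric measures plus a foreign key into the dimension.
cur.execute("""CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
)""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240501, "2024-05-01", "2024-05"),
                 (20240502, "2024-05-02", "2024-05")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240501, 100.0), (2, 20240501, 50.0), (3, 20240502, 25.0)])

# Typical star-schema query: aggregate fact measures grouped by a dimension attribute.
cur.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month
""")
monthly = cur.fetchall()
```

The same pattern scales to warehouse engines like Redshift, where the fact table is large and dimensions stay small and denormalized.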
Other Skills:
- Strong problem-solving and analytical skills
- Understanding of CI/CD pipelines
- Experience with version control (Git)
- Knowledge of data governance and lineage tools
Preferred Qualifications:
- Experience working with large-scale distributed data systems
- Exposure to feature store architecture
- Experience supporting Machine Learning pipelines
- Familiarity with containerization (Docker, Kubernetes) is a plus
Posted by
Sweety Kumari
Senior Talent Acquisition Specialist at R Systems International Ltd.
Last Active: 28 Apr 2026
Posted in
Data Engineering
Functional Area
Data Engineering
Job Code
1617449