Description:
We are seeking a highly skilled AWS Certified Data Engineer (Associate) with strong expertise in Python, PySpark, SQL, and AWS Cloud services. The ideal candidate will be responsible for building scalable data pipelines, designing robust data architectures, and enabling high-quality, model-ready datasets for analytics and machine learning use cases.
This role requires deep hands-on experience in ETL/ELT pipeline development, cloud-based data processing, database design, and infrastructure automation within the AWS ecosystem.
Key Responsibilities:
1. Data Ingestion & Processing:
- Ingest raw data from multiple sources including network logs, customer systems, APIs, and third-party platforms.
- Design, develop, and maintain scalable ETL/ELT pipelines using Python and PySpark.
- Transform and curate large datasets into clean, structured, and analytics-ready formats.
- Optimize data processing workflows for performance and cost efficiency on AWS.
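To illustrate the kind of ingest-and-curate transform this section describes, here is a minimal sketch in plain Python (in practice this would run as a PySpark job; the field names `ts`, `src_ip`, and `bytes` are hypothetical examples of a network-log schema, not taken from this posting):

```python
from datetime import datetime
from typing import Optional

def clean_log_record(raw: dict) -> Optional[dict]:
    """Normalize one raw network-log record into an analytics-ready row.

    Malformed records (bad timestamp, non-numeric or negative byte count)
    are dropped by returning None.
    """
    try:
        ts = datetime.fromisoformat(raw["ts"])
        size = int(raw["bytes"])
    except (KeyError, ValueError):
        return None  # drop records missing fields or with unparsable values
    if size < 0:
        return None  # drop physically impossible byte counts
    return {
        "event_date": ts.date().isoformat(),
        "src_ip": raw.get("src_ip", "unknown").strip(),
        "bytes": size,
    }

# Curate a small batch: keep only records that pass validation.
raw_events = [
    {"ts": "2024-05-01T10:00:00", "src_ip": " 10.0.0.1 ", "bytes": "512"},
    {"ts": "not-a-date", "src_ip": "10.0.0.2", "bytes": "7"},
]
curated = [row for event in raw_events if (row := clean_log_record(event))]
```

The same per-record logic would typically be expressed as PySpark DataFrame transformations so it scales across a cluster.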
2. Feature Engineering & Metadata Management:
- Build and automate feature pipelines in collaboration with Data Scientists.
- Develop reusable data transformation frameworks.
- Maintain feature stores and metadata repositories to ensure data consistency and explainability.
- Implement data validation and quality checks across pipelines.
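The data validation and quality checks mentioned above can be sketched as a small reusable helper (a minimal illustration; the column names in the usage example are hypothetical):

```python
def run_quality_checks(rows, required_cols, non_null_cols):
    """Run simple quality checks on a batch of rows.

    Returns a list of human-readable failure messages; an empty
    list means the batch passed.
    """
    failures = []
    for i, row in enumerate(rows):
        # Schema check: every required column must be present.
        missing = required_cols - row.keys()
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")
        # Completeness check: designated columns must not be null/empty.
        for col in non_null_cols:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: null value in '{col}'")
    return failures

rows = [{"id": 1, "feature": 0.5}, {"id": None, "feature": 0.1}]
issues = run_quality_checks(rows, required_cols={"id", "feature"}, non_null_cols={"id"})
```

In a production pipeline, checks like these would gate each stage so bad batches are quarantined rather than propagated downstream.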
3. Model Integration & Data Outputs:
- Design mechanisms to store, version, and serve model outputs.
- Ensure full data lineage from source ingestion to business reporting layers.
- Support integration of ML model predictions into downstream applications.
- Maintain data governance and documentation standards.
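One simple way to store, version, and trace model outputs, as described above, is to key each batch of predictions by a content hash and attach lineage metadata. A minimal sketch (the model name and S3-style source path are hypothetical placeholders):

```python
import hashlib
import json
from datetime import datetime, timezone

def store_model_output(store: dict, model_name: str, predictions: list, source_dataset: str) -> str:
    """Store predictions under a content-derived version key with lineage metadata."""
    # Deterministic version id: hash of the serialized predictions.
    payload = json.dumps(predictions, sort_keys=True).encode()
    version = hashlib.sha256(payload).hexdigest()[:12]
    store[(model_name, version)] = {
        "predictions": predictions,
        "lineage": {
            "source_dataset": source_dataset,   # where the inputs came from
            "stored_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    return version

registry = {}
version = store_model_output(registry, "churn_model", [0.12, 0.87],
                             "s3://example-bucket/curated/2024-05-01")
```

Content-derived versions make reruns on identical inputs idempotent, and the lineage record ties every served prediction back to its source ingestion.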
4. Infrastructure & Automation:
- Implement Infrastructure-as-Code (IaC) using tools like Terraform or CloudFormation.
- Deploy and manage data services on AWS Cloud.
- Automate job scheduling, monitoring, and alerting for ETL workflows.
- Ensure high availability, scalability, and security of data systems.
Required Skills & Technical Expertise:
Programming & Data Processing:
- Strong proficiency in Python
- Hands-on experience with PySpark for distributed data processing
- Advanced SQL query writing and performance optimization
- Experience in building scalable ETL/ELT pipelines
Cloud & AWS Services:
- AWS Certified Data Engineer - Associate certification (preferred)
- Strong knowledge of AWS services such as S3, EMR, Glue, Redshift, Lambda, RDS, and Kinesis
- Experience managing cloud-native data architectures
Database & Architecture:
- Experience in relational and NoSQL database design
- Data modeling (Star schema, Snowflake schema)
- Query optimization and indexing strategies
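The star-schema data modeling listed above can be demonstrated end to end with an in-memory database. A minimal sketch using Python's built-in sqlite3 module (the sales fact table and date dimension are hypothetical examples, not from this posting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per calendar date, with descriptive attributes.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")
# Fact table: numeric measures plus a foreign key into the dimension.
cur.execute("""CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
)""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240501, "2024-05-01", "2024-05"),
                 (20240502, "2024-05-02", "2024-05")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240501, 100.0), (2, 20240501, 50.0), (3, 20240502, 25.0)])

# Typical star-schema query: aggregate fact measures grouped by a dimension attribute.
cur.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month
""")
monthly = cur.fetchall()
```

The same pattern scales to warehouse engines like Redshift, where the fact table is large and dimensions stay small and denormalized.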
Other Skills:
- Strong problem-solving and analytical skills
- Understanding of CI/CD pipelines
- Experience with version control (Git)
- Knowledge of data governance and lineage tools
Preferred Qualifications:
- Experience working with large-scale distributed data systems
- Exposure to feature store architecture
- Experience supporting Machine Learning pipelines
- Familiarity with containerization (Docker, Kubernetes) is a plus
Posted by
Sweety Kumari
Senior Talent Acquisition Specialist at R Systems International Ltd.
Last Active: 28 Apr 2026
Posted in
Data Engineering
Functional Area
Data Engineering
Job Code
1617449