Posted on: 03/12/2025
Job Summary :
The ideal candidate will have experience developing and orchestrating cloud-native ETL pipelines using Python and AWS services (Redshift, EC2, S3, Glue, SSM), including robust logging, error handling, and automation of data migration tasks.
They will also bring a strong background in AWS data architecture and a deep understanding of its core services.
This role involves designing, implementing, and testing robust data pipelines, optimizing Redshift database performance, and ensuring the scalability and reliability of our data infrastructure.
Key Responsibilities Include, But Are Not Limited To :
- Design, develop, and orchestrate cloud-native ETL pipelines using Python and AWS services (Redshift, EC2, S3, Glue, SSM); see the sketch after this list.
- Migrate data from legacy systems (e.g., DB2) to Amazon Redshift, ensuring data quality, integrity, and completeness.
- Optimize Redshift database performance through effective use of sort keys, distribution keys, and query tuning.
- Implement robust logging, monitoring, and error handling in accordance with architecture standards.
- Automate data migration, transformation, and validation processes to support scalable and reliable data workflows.
- Collaborate with cross-functional teams to gather requirements and deliver data solutions that meet business needs.
- Maintain and enhance data infrastructure to ensure high availability, scalability, and security.
- Troubleshoot and resolve issues related to data pipelines, Redshift performance, and AWS infrastructure.
- Document data flows, pipeline designs, and operational procedures for ongoing support and knowledge sharing.
- Stay current with AWS best practices and emerging technologies to continuously improve data engineering processes.
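The load step of such a pipeline might look roughly like the sketch below, which copies a staged Parquet extract from S3 into Redshift through the Redshift Data API (boto3), with basic logging and error handling. The cluster, database, table, bucket, and IAM role names are hypothetical placeholders, not details taken from this posting.

"""Minimal sketch: load a Parquet extract from S3 into Redshift.

All identifiers (cluster, database, table, bucket, IAM role) are
hypothetical placeholders; adapt them to the target environment.
"""
import logging
import time

import boto3

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("etl.load")

client = boto3.client("redshift-data")

COPY_SQL = """
    COPY analytics.orders
    FROM 's3://example-etl-bucket/exports/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

def run_statement(sql: str) -> None:
    """Submit a statement through the Redshift Data API and wait for it."""
    resp = client.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=sql,
    )
    statement_id = resp["Id"]
    logger.info("Submitted statement %s", statement_id)

    # Poll until the statement reaches a terminal state.
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(5)

    if status != "FINISHED":
        logger.error("Statement %s ended with %s: %s",
                     statement_id, status, desc.get("Error"))
        raise RuntimeError(f"COPY failed: {desc.get('Error')}")
    logger.info("Statement %s finished", statement_id)

if __name__ == "__main__":
    run_statement(COPY_SQL)

In practice a step like this would sit inside an orchestrated workflow (for example a Glue job or Step Functions state) rather than run standalone, and credentials would come from IAM roles rather than embedded values.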
Qualifications :
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field.
- 3+ years of hands-on experience with Amazon Redshift, including table design, data loading, and query optimization.
- Strong experience with the AWS ecosystem: S3, EC2, IAM, Glue, SSM, and related services.
- Strong proficiency with Git for version control and collaboration on a shared codebase.
- Proficient in Python for ETL, automation, and orchestration (including boto3, PySpark).
- Solid background in SQL, with expertise in writing, analyzing, and tuning complex queries for large datasets.
- Experience migrating data from legacy systems (e.g., DB2) to Redshift or other cloud data warehouses (see the sketch after this list).
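As one illustration of the last point, migrations from DB2 typically stage legacy tables on S3 as Parquet before a Redshift COPY. Below is a hedged PySpark sketch of that extraction step; the JDBC URL, credentials, table name, partition column, and S3 path are all assumptions, and the IBM DB2 JDBC driver must be available on the Spark classpath.

"""Minimal sketch: stage a legacy DB2 table to S3 as Parquet with PySpark.

Connection details, table names, and the S3 path are hypothetical.
"""
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3-staging").getOrCreate()

# Read the source table from DB2 over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://legacy-db2-host:50000/SALESDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "********")  # prefer Secrets Manager / SSM in practice
    .load()
)

# Stage as Parquet on S3, partitioned for a downstream Redshift COPY.
(
    orders.write.mode("overwrite")
    .partitionBy("ORDER_YEAR")
    .parquet("s3://example-etl-bucket/exports/orders/")
)

spark.stop()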
Desired Skills :
- Deep understanding of Redshift architecture (columnar storage, MPP, node types, concurrency scaling).
- Expertise in designing tables with optimal sort keys, distribution keys, and column encoding.
- Ability to use EXPLAIN plans to diagnose and resolve query performance issues.
- Experience with Redshift-specific maintenance (VACUUM, ANALYZE, WLM configuration); see the sketch after this list.
- Experience automating performance monitoring, alerting, and cost optimization in Redshift.
- Strong troubleshooting skills for slow queries, data skew, and resource contention.
- Experience with CI/CD for data pipelines and infrastructure-as-code (Terraform) is a plus.
- Knowledge of serverless architecture and services (e.g., AWS Lambda).
- Excellent communication skills for collaborating with cross-functional teams and documenting solutions.
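To ground the tuning-related points above, here is a rough Python sketch (using the open-source redshift_connector driver) of a table defined with explicit distribution and sort keys, routine VACUUM/ANALYZE maintenance, and an EXPLAIN check on a representative query. The connection parameters, schema, and key choices are illustrative assumptions, not prescriptions.

"""Minimal sketch: Redshift table design and routine maintenance.

Connection parameters, schema, and key choices are hypothetical examples.
"""
import redshift_connector

# Distribution key on the common join column, sort key on the usual filter column.
CREATE_ORDERS = """
    CREATE TABLE IF NOT EXISTS analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        order_date  DATE,
        amount      DECIMAL(12, 2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)
    SORTKEY (order_date);
"""

MAINTENANCE = [
    "VACUUM FULL analytics.orders;",  # reclaim space and re-sort rows
    "ANALYZE analytics.orders;",      # refresh planner statistics
]

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="etl_user",
    password="********",  # prefer Secrets Manager / SSM in practice
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cursor = conn.cursor()

cursor.execute(CREATE_ORDERS)
for statement in MAINTENANCE:
    cursor.execute(statement)

# Inspect the plan of a representative query to spot redistribution or full scans.
cursor.execute("EXPLAIN SELECT customer_id, SUM(amount) "
               "FROM analytics.orders "
               "WHERE order_date >= '2025-01-01' "
               "GROUP BY customer_id;")
for (line,) in cursor.fetchall():
    print(line)

conn.close()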
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1584354