Posted on: 03/12/2025
Description:
- Design and manage cloud-native ML platforms supporting training, inference, and model lifecycle automation.
- Build ML/ETL pipelines using Apache Airflow / AWS MWAA and distributed data workflows using Apache Spark (EMR/Glue).
- Containerize and deploy ML workloads using Docker, EKS, ECS/Fargate, and Lambda.
- Develop CI/CT/CD pipelines integrating model validation, automated training, testing, and deployment.
- Implement ML observability (model drift, data drift, performance monitoring, and alerting) using CloudWatch, Grafana, and Prometheus.
- Ensure data governance, versioning, metadata tracking, reproducibility, and secure data pipelines.
- Collaborate with data scientists to productionize notebooks, experiments, and model deployments.
Ideal Candidate:
- 8+ years in MLOps/DevOps with strong ML pipeline experience.
- Strong hands-on experience with AWS:
  1. Compute/Orchestration: EKS, ECS, EC2, Lambda
  2. Data: EMR, Glue, S3, Redshift, RDS, Athena, Kinesis
  3. Workflow: MWAA/Airflow, Step Functions
  4. Monitoring: CloudWatch, OpenSearch, Grafana
- Strong Python skills and familiarity with ML frameworks (TensorFlow/PyTorch/scikit-learn).
- Expertise with Docker, Kubernetes, Git, CI/CD tools (GitHub Actions/Jenkins).
- Strong Linux, scripting, and troubleshooting skills.
- Experience enabling reproducible ML environments using JupyterHub and containerized development workflows.
Education:
- Master's degree in Computer Science, Machine Learning, Data Engineering, or a related field.
Posted in: DevOps / SRE
Functional Area: ML / DL Engineering
Job Code: 1584564