Posted on: 09/12/2025
Description :
We are seeking a seasoned MLOps Engineer to design, automate, and manage the complete machine learning lifecycle.
The ideal candidate will have deep expertise in ML engineering, containerization, CI/CD pipelines, model deployment, and cloud-based ML workflows.
This role requires strong DevOps knowledge, monitoring expertise, and experience in building scalable ML infrastructure.
Key Responsibilities :
- Design, develop, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.
- Implement ML-specific testing, validation, and automated model evaluation workflows.
- Containerize ML applications using Docker and manage orchestration with Kubernetes.
- Implement model and data versioning best practices using tools like DVC and MLflow.
- Work extensively with AWS ML services including SageMaker, ECR, EKS, EC2, and Lambda.
- Build and maintain CI/CD pipelines using GitLab, GitFlow, and Artifactory.
- Ensure robust monitoring and observability using Prometheus, Grafana, and the ELK stack.
- Enable distributed training setups and optimize ML workflows for scalability and performance.
- Collaborate with data scientists and ML engineers to streamline production workflows.
- Ensure production ML systems are secure, reliable, and high-performing.
Required Skills & Qualifications :
- Python programming : 5+ years of experience, with a strong understanding of ML frameworks.
- MLOps pipelines : Hands-on experience with testing, validation, and pipeline automation.
- Containerization & orchestration : Expertise in Docker, Kubernetes, and microservices.
- Versioning tools : Experience with DVC, MLflow, Git, or similar tools.
- Cloud ML services : 2-3 years of experience with the AWS ML stack (SageMaker, ECR, EKS, EC2, Lambda).
- DevOps practices : Proficiency with GitLab, GitFlow, Artifactory, and CI/CD pipelines.
- Monitoring & observability : Hands-on experience with Prometheus, Grafana, and the ELK stack.
- Distributed training : Knowledge of multi-GPU/multi-node setups using Horovod, Ray, or equivalent.
- Strong problem-solving skills and experience building scalable, production-grade ML systems.
Posted in : DevOps / SRE
Functional Area : ML / DL Engineering
Job Code : 1587234