hirist

MLOps Engineer - CI/CD Pipeline

foundit
Multiple Locations
5 - 15 Years

Posted on: 09/12/2025

Job Description

Description :


We are seeking a seasoned MLOps Engineer to design, automate, and manage the complete machine learning lifecycle.


The ideal candidate will have deep expertise in ML engineering, containerization, CI/CD pipelines, model deployment, and cloud-based ML workflows.


This role requires strong DevOps knowledge, monitoring expertise, and experience in building scalable ML infrastructure.


Key Responsibilities :


- Design, develop, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.


- Implement ML-specific testing, validation, and automated model evaluation workflows.


- Containerize ML applications using Docker and manage orchestration with Kubernetes.


- Implement model and data versioning best practices using tools like DVC and MLflow.


- Work extensively with AWS services for ML workloads, including SageMaker, ECR, EKS, EC2, and Lambda.


- Build and maintain CI/CD pipelines using GitLab and Artifactory, following the GitFlow branching model.


- Ensure robust monitoring and observability using Prometheus, Grafana, and the ELK stack.


- Enable distributed training setups and optimize ML workflows for scalability and performance.


- Collaborate with data scientists and ML engineers to streamline production workflows.


- Ensure production ML systems are secure, reliable, and high-performing.


Required Skills & Qualifications :


- Python programming : 5+ years of experience, with a strong understanding of ML frameworks.


- MLOps pipelines : Hands-on experience with testing, validation, and pipeline automation.


- Containerization & orchestration : Expertise in Docker, Kubernetes, and microservices.


- Versioning tools : Experience with DVC, MLflow, Git, or similar tools.


- Cloud ML services : 2-3 years of experience with the AWS ML stack (SageMaker, ECR, EKS, EC2, Lambda).


- DevOps practices : Proficiency with GitLab, Artifactory, the GitFlow branching model, and CI/CD pipelines.


- Monitoring & observability : Hands-on experience with Prometheus, Grafana, and the ELK stack.


- Distributed training : Knowledge of multi-GPU/multi-node setups using Horovod, Ray, or equivalent.


- Strong problem-solving skills and experience building scalable, production-grade ML systems.

