Posted on: 30/11/2025
Job Description : ML Ops Engineer
About the Job :
We are looking for an experienced ML Ops Engineer who can seamlessly bridge machine learning development with scalable production deployment.
The ideal candidate should bring strong expertise in CI/CD, containerization, cloud platforms, and ML model lifecycle management, with hands-on experience deploying models into production using modern MLOps frameworks and cloud-native tools such as AWS EKS, SageMaker, Lambda, and related services.
This role requires a deep understanding of ML engineering workflows, model versioning, data pipelines, and the division of responsibilities between ML Ops and core Machine Learning functions.
Key Responsibilities :
1. Model Lifecycle Management :
- Onboard new ML models into the existing ML platform with standardized training automation and deployment workflows.
- Design and manage model retraining, model updates, and model versioning strategies.
- Collaborate with Data Scientists to convert research prototypes into production-ready models.
- Ensure models comply with performance, governance, and monitoring standards.
2. CI/CD & Deployment Operations :
- Build, maintain, and optimize CI/CD pipelines for Search and ML services using modern DevOps tooling.
- Manage containerized model deployments using Docker, Kubernetes, AWS EKS, or equivalent platforms.
- Run deployment sign-off processes, ensuring daily release validations, version audits, and production readiness.
- Automate infrastructure provisioning and ML workflow orchestration using tools like Terraform, Airflow, or Kubeflow.
3. Big Data Pipeline Management :
- Maintain and enhance training data pipelines, ensuring efficiency, reliability, and cost optimization across big data ecosystems.
- Work with distributed processing tools such as Spark, Hive, EMR, or Glue for training dataset preparation.
- Monitor and optimize data flows to support continuous model improvement and retraining.
4. Cloud Technologies & ML Tooling :
- Utilize AWS cloud services such as SageMaker (training, inference, pipelines), EKS, Lambda, S3, CloudWatch, ECR, and Step Functions.
- Implement model deployment best practices: A/B testing, blue/green deployments, and shadow deployments.
- Apply model monitoring for drift, performance degradation, and anomaly detection.
5. Cross-Functional Collaboration :
- Work closely with ML engineers, data scientists, DevOps engineers, and product teams.
- Provide guidance on the distribution of responsibilities between machine learning development and operationalization layers.
- Offer insights on scaling ML systems and optimizing cloud costs.
Required Skills & Experience :
- 3 to 8 years of experience in ML Ops, DevOps for ML, or Machine Learning engineering.
- Strong hands-on experience with CI/CD, GitOps workflows, and automated deployment pipelines.
- Expertise in containerization (Docker) and orchestration tools like Kubernetes, AWS EKS, or similar.
- Experience deploying ML models into production environments (real-time or batch).
- Proficiency in at least one programming language : Python, Go, or Java.
- Strong understanding of cloud platforms (preferably AWS) and related ML tools such as SageMaker.
- Experience with model tracking, versioning, and lifecycle tools (MLflow, SageMaker Model Registry, DVC).
- Familiarity with big data technologies : Spark, Hadoop, EMR, Glue, Hive, etc.
- Experience creating scalable and resilient ML pipelines.
Preferred Qualifications :
- Experience with feature stores, model observability tools, or real-time ML serving.
- Understanding of Responsible AI, model governance, and compliance frameworks.
- Experience with A/B testing or model experimentation platforms.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1582669