hirist

Job Description

Job Description : ML Ops Engineer


About the Job :


We are looking for an experienced ML Ops Engineer who can seamlessly bridge machine learning development with scalable production deployment.


The ideal candidate should bring strong expertise in CI/CD, containerization, cloud platforms, and ML model lifecycle management, with hands-on experience deploying models into production using modern MLOps frameworks and cloud-native tools such as AWS EKS, SageMaker, Lambda, and related services.


This role requires a deep understanding of ML engineering workflows, model versioning, data pipelines, and the division of responsibilities between ML Ops and core Machine Learning functions.


Key Responsibilities :


1. Model Lifecycle Management :


- Onboard new ML models into the existing ML platform with standardized training automation and deployment workflows.


- Design and manage model retraining, model updates, and model versioning strategies.


- Collaborate with Data Scientists to convert research prototypes into production-ready models.


- Ensure models comply with performance, governance, and monitoring standards.


2. CI/CD & Deployment Operations :


- Build, maintain, and optimize CI/CD pipelines for Search and ML services using modern DevOps tooling.


- Manage containerized model deployments using Docker, Kubernetes, AWS EKS, or equivalent platforms.


- Run deployment sign-off processes, covering daily release validations, version audits, and production-readiness checks.


- Automate infrastructure provisioning and ML workflow orchestration using tools like Terraform, Airflow, or Kubeflow.


3. Big Data Pipeline Management :


- Maintain and enhance training data pipelines, ensuring efficiency, reliability, and cost optimization across big data ecosystems.


- Work with distributed processing tools such as Spark, Hive, EMR, or Glue for training dataset preparation.


- Monitor and optimize data flows to support continuous model improvement and retraining.


4. Cloud Technologies & ML Tooling :


- Utilize AWS cloud services such as SageMaker (training, inference, pipelines), EKS, Lambda, S3, CloudWatch, ECR, and Step Functions.


- Implement model deployment best practices: A/B testing, blue/green deployments, and shadow deployments.


- Apply model monitoring for drift, performance degradation, and anomaly detection.


5. Cross-Functional Collaboration :


- Work closely with ML engineers, data scientists, DevOps engineers, and product teams.


- Provide guidance on the distribution of responsibilities between machine learning development and operationalization layers.


- Offer insights on scaling ML systems and optimizing cloud costs.


Required Skills & Experience :


- 3 to 8 years of experience in ML Ops, DevOps for ML, or Machine Learning engineering.


- Strong hands-on experience with CI/CD, GitOps workflows, and automated deployment pipelines.


- Expertise in containerization (Docker) and orchestration tools like Kubernetes, AWS EKS, or similar.


- Experience deploying ML models into production environments (real-time or batch).


- Proficiency in at least one programming language : Python, Go, or Java.


- Strong understanding of cloud platforms (preferably AWS) and related ML tools such as SageMaker.


- Experience with model tracking, versioning, and lifecycle tools (MLflow, SageMaker Model Registry, DVC).


- Familiarity with big data technologies : Spark, Hadoop, EMR, Glue, Hive, etc.


- Experience creating scalable and resilient ML pipelines.


Preferred Qualifications :


- Experience with feature stores, model observability tools, or real-time ML serving.


- Understanding of Responsible AI, model governance, and compliance frameworks.


- Experience with A/B testing or model experimentation platforms.

