Posted on: 05/08/2025
We're Hiring : MLOps Engineer
Location : Pune
Experience : 7+ Years
Notice Period : Immediate to Short-Notice Joiners Preferred
Role Type : Full-Time | On-site / Hybrid (based on company policy)
About the Role :
We are seeking a highly skilled MLOps Engineer to support our rapidly growing AI/ML initiatives, including Generative AI platforms, agentic AI systems, and large-scale model deployments. This role blends advanced DevOps practices with modern AI infrastructure, requiring deep experience in managing cloud-native environments, ML model pipelines, and distributed compute systems.
You'll work alongside AI researchers, data scientists, and software engineers to streamline model development and deployment across cloud platforms, with a strong emphasis on reliability, scalability, and automation.
Key Responsibilities :
- Design, build, and maintain CI/CD pipelines for machine learning model training and deployment workflows (including RAG and LLM systems).
- Manage and optimize Kubernetes clusters with GPU-based compute for distributed ML workloads (see the brief sketch after this list).
- Automate infrastructure provisioning and management using Terraform on GCP (preferred), AWS, or Azure.
- Deploy and manage components such as vector databases, feature stores, and observability tools (e.g., Prometheus, Grafana, ELK).
- Ensure security, resilience, and high availability of all AI workloads and underlying infrastructure.
- Collaborate with AI/ML engineers, data scientists, and platform teams to integrate and deploy end-to-end AI solutions.
- Enable agentic AI workflows with tools like LangChain, LangGraph, CrewAI, and others.
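As a small, illustrative sketch of the kind of day-to-day task described above, the snippet below reports allocatable GPUs per cluster node. It assumes the official `kubernetes` Python client, a reachable kubeconfig, and the NVIDIA device plugin (which exposes the `nvidia.com/gpu` resource); the function name is made up for the example.
```python
# Illustrative sketch: report allocatable GPUs per node in a cluster.
# Assumes the official `kubernetes` Python client, a valid kubeconfig,
# and the NVIDIA device plugin exposing the "nvidia.com/gpu" resource.
from kubernetes import client, config


def gpu_allocatable_by_node() -> dict[str, int]:
    """Return a mapping of node name -> allocatable GPU count."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
    v1 = client.CoreV1Api()
    report = {}
    for node in v1.list_node().items:
        gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
        report[node.metadata.name] = int(gpus)
    return report


if __name__ == "__main__":
    for name, count in gpu_allocatable_by_node().items():
        print(f"{name}: {count} allocatable GPU(s)")
```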
Required Skills & Experience :
- 7+ years in DevOps, MLOps, or Infrastructure Engineering roles.
- 4+ years of experience in cloud-native development, including at least 2 years in AI/ML-related environments.
- Proficient in Python and scripting languages like Bash.
- Hands-on experience with CI/CD tools such as Jenkins, Harness, GitHub Actions, or ArgoCD.
- Deep understanding of Kubernetes, especially GPU orchestration and autoscaling.
- Experience managing infrastructure on GCP (preferred), AWS, or Azure.
- Strong skills in Terraform for Infrastructure as Code (IaC).
- Sound knowledge of monitoring, logging, and security best practices in ML production environments.
Nice-to-Have (Bonus) Skills :
- Familiarity with MLOps platforms like MLflow, Kubeflow, SageMaker, or Vertex AI.
- Experience with RAG (Retrieval-Augmented Generation) pipelines and prompt engineering.
- Exposure to model fine-tuning, drift detection, and rollback strategies.
- Working knowledge of containerization tools like Docker and container security scanning.
Posted in : DevOps / SRE
Functional Area : ML / DL Engineering
Job Code : 1524958