Job Description

Position : Lead MLOps Engineer

Experience : 9+ Years


Location : Mumbai, Mahape


About the Role :


We are looking for a Lead MLOps Engineer who will lead and support the end-to-end deployment, monitoring, and optimization of machine learning and data-driven applications across cloud platforms. You will collaborate with data scientists, engineers, and business stakeholders to ensure scalable, secure, and highly available ML systems.


Key Responsibilities :


- Be a hands-on contributor capable of independently designing and developing complete MLOps solutions from scratch.

- Lead end-to-end ML pipeline development, deployment, and monitoring across GCP and Azure.

- Build and maintain CI/CD pipelines using tools like ArgoCD, Git, and Docker.

- Automate and optimize ML model training, validation, deployment, and scaling using Kubernetes, Kubeflow, or similar orchestration platforms.

- Develop data processing workflows using Python and PySpark on distributed systems.

- Implement observability using tools like Grafana, New Relic, and cloud-native monitoring solutions.

- Collaborate with Data Scientists to transition research into production-grade solutions.

- Guide and mentor junior engineers, enforce coding standards, and conduct code reviews.

- Demonstrate business understanding to align ML pipelines with product goals.

- Manage infrastructure as code (IaC) for reproducibility and scalability.

- Contribute to AI and RAG-related development across various GPU and AI platforms; prior exposure is required.


Required Skills & Experience :


- 9+ years of hands-on experience in MLOps roles.

- Strong proficiency in Python and PySpark with clean and scalable code practices.

- Expertise in GCP and Azure cloud platforms including compute, storage, and networking components.

- Proven experience in deploying and managing containerized applications using Docker and Kubernetes.

- Hands-on experience with CI/CD tools, preferably ArgoCD, GitHub Actions, or GitLab CI.

- Experience in monitoring, logging, and alerting using tools such as Grafana, New Relic, Prometheus, or similar.

- Understanding of ML model lifecycle, versioning, and performance monitoring.

- Experience with MLflow for model versioning.

- Ability to create REST APIs using FastAPI, Flask, or Django.


- Strong problem-solving, communication, and stakeholder management skills.

- Experience mentoring teams and driving end-to-end project execution.

- Exposure to Vertex AI Pipelines on GCP, or a similar service on other clouds, is a plus.

