Posted on: 24/02/2026
Job Description :
- As an MLOps Engineer II, you will play a key role in designing, building, and operating a scalable, reliable machine learning platform and production inference systems.
- You will work closely with Data Scientists and Platform teams to operationalize end-to-end ML workflows on AWS, ensuring models move seamlessly from experimentation to production and monitoring.
- In this role, you are expected to operate with a high degree of ownership, contribute to architectural decisions, and mentor junior engineers and interns.
- You will also contribute to advanced initiatives such as Agentic AI systems and MCP servers, helping the team adopt emerging AI infrastructure patterns while maintaining strong MLOps fundamentals.
Key Responsibilities :
- Design, build, deploy, and maintain production-grade ML pipelines and workflows using AWS and Python, with a focus on reliability, scalability, and observability.
- Own and enhance the MLOps platform that automates the full ML model lifecycle from data annotation and training to inference, monitoring, and feedback loops.
- Collaborate closely with Data Scientists to productionize models, including packaging, versioning, deployment strategies, and performance optimization.
- Contribute to Agentic AI initiatives, including evaluation and deployment of MCP servers and related infrastructure components.
- Implement monitoring, logging, alerting, and CI/CD best practices for ML systems to ensure production stability and rapid issue resolution.
- Troubleshoot complex pipeline, infrastructure, and inference issues, performing root cause analysis and driving long-term fixes.
- Stay current with evolving MLOps practices, cloud-native ML tooling, and emerging AI infrastructure trends, and proactively introduce improvements.
- Participate in design reviews, technical discussions, and planning meetings; clearly communicate progress, risks, and trade-offs to stakeholders.
- Mentor interns and junior engineers through technical guidance, code reviews, and sharing of best practices.
Required Experience :
- 3-6 years of hands-on experience building and operating ML or data platforms, with a strong focus on MLOps or ML infrastructure.
- Strong practical experience with AWS services such as SageMaker, S3, EC2, Batch, Lambda, and IAM, as well as monitoring tools.
- Proficiency in Python for building ML pipelines, automation, and infrastructure tooling.
- Solid understanding of the ML lifecycle, including training, evaluation, deployment, inference, and model monitoring.
- Experience with containerization (Docker) and familiarity with orchestration frameworks (e.g., Kubernetes or managed equivalents).
- Strong problem-solving skills and the ability to independently drive tasks in a fast-paced, evolving environment.
- Effective communication skills and experience collaborating across Data Science and Engineering teams.
Preferred Experience :
- Experience designing or operating end-to-end MLOps platforms supporting multiple models, teams, or use cases.
- Familiarity with CI/CD systems and Git-based workflows.
- Hands-on experience with ML inference systems (real-time or batch), including performance tuning and cost optimization.
- Exposure to or active work in Agentic AI, GenAI infrastructure, or MCP servers.
- Demonstrated ability to mentor junior engineers and raise overall team engineering quality.
- Strong aptitude for evaluating and adopting new technologies as AI and MLOps ecosystems evolve.
Posted in: DevOps / SRE
Functional Area: ML / DL / AI Research
Job Code: 1615580