
MLOps Engineer

Procallisto solution
Multiple Locations
5 - 8 Years

Posted on: 06/10/2025

Job Description

1. Monitoring:

- Continuously track Domino platform uptime, resource utilization, and health.

- Monitor availability, latency, and throughput of real-time and batch endpoints.

- Maintain Grafana dashboards for platform and deployment metrics. Ensure incidents are logged properly for audit and troubleshooting.

- Monitor resource utilization (CPU, GPU, memory, network traffic), computational performance, and model aging.

- Track changes to dependencies, such as data versions or software upgrades.

2. Operations:

- Use ServiceNow for incident logging. Act as first responder for platform-related incidents. Perform triage, root cause analysis (RCA), and resolution for outages or performance issues. Document the RCA and drive preventive measures.

- Coordinate with Domino Data Lab for platform support.

- Deploy and maintain ML models in production environments.

- Onboard Domino users as per SOPs.

3. MLOps-Related Development:

- Design, build, and maintain automated ML pipelines that incorporate CI/CD workflows and rapid deployment of models (Dev - Staging - Prod), as well as continuous training (CT) and experimentation.

- Build and maintain shared tools/utilities to accelerate model development.

- Build automation for audit trails, model lineage, and approval workflows. Integrate model monitoring into deployment workflows.

- Collaborate with data scientists and engineers to ensure a smooth handoff from model development to production.

- Basic knowledge of ServiceNow for ticket management.

The job is for:

Women candidates preferred
May work from home
For women joining back the workforce