Machine Learning Engineer - Data Modeling

Posted on: 29/01/2026

Job Description

Role Overview :


We are looking for a Machine Learning Engineer with strong MLOps, platform monitoring, and production ML deployment experience. This role focuses on ensuring high availability, performance, and reliability of ML platforms and models deployed using Domino Data Lab, while also building automated ML pipelines and operational tooling.

The ideal candidate will work closely with Data Scientists, ML Engineers, Platform Teams, and Vendors to ensure seamless model lifecycle management - from development to production and continuous monitoring.

Key Responsibilities :

1. Platform Monitoring & Reliability :


- Continuously monitor Domino Data Lab platform uptime, health, and availability

- Track performance metrics for real-time and batch ML endpoints, including :

i. Availability

ii. Latency

iii. Throughput

- Maintain and enhance Grafana dashboards for :

i. Platform metrics


ii. Model deployment metrics


iii. Resource utilization (CPU, GPU, memory, network)

- Monitor computational performance, model drift, and model aging

- Track changes in dependencies such as :

i. Data versions


ii. Feature sets

iii. Software and library upgrades


- Ensure proper incident logging, auditability, and observability

2. ML Platform Operations & Incident Management :


- Act as first responder for ML platform-related incidents

- Log and manage incidents using ServiceNow

- Perform :

i. Incident triage

ii. Root Cause Analysis (RCA)

iii. Resolution and preventive actions

- Document post-incident reports and drive long-term stability improvements

- Coordinate with Domino Data Lab support teams for platform-level issues

- Deploy, manage, and maintain ML models in production environments

- Handle Domino user onboarding and access management as per SOPs

3. MLOps & Engineering Development :


- Design, build, and maintain end-to-end automated ML pipelines

- Implement CI/CD workflows for ML models :

i. Dev → Staging → Production

ii. Automated testing and validation

- Enable Continuous Training (CT) and experimentation frameworks

- Build shared MLOps tools, libraries, and utilities to accelerate model development

- Implement automation for :

i. Model lineage

ii. Audit trails

iii. Approval and governance workflows

- Integrate model monitoring and alerting into deployment pipelines

- Collaborate closely with the following teams to ensure a smooth handoff from experimentation to production :

i. Data Scientists

ii. ML Engineers

iii. Platform & Infra teams

Required Skills & Experience :

- Strong experience as a Machine Learning Engineer / MLOps Engineer

- Hands-on experience with Domino Data Lab (mandatory or strongly preferred)

- Experience deploying and managing ML models in production

- Strong understanding of ML lifecycle management

- Experience with :

i. CI/CD pipelines for ML

ii. Automated training and retraining workflows

- Proficiency in monitoring tools such as Grafana

- Experience with incident management tools (ServiceNow preferred)

- Strong understanding of compute resource optimization (CPU, GPU, memory)

Technical Skills :

- MLOps & ML Platforms : Domino Data Lab

- Monitoring & Observability : Grafana

- CI/CD : Jenkins / GitLab CI / similar

- Cloud & Infra : Containers, Kubernetes (preferred)

- Programming : Python (mandatory)

- Version Control : Git

- Incident Management : ServiceNow

Nice to Have :

- Experience with model drift detection and performance monitoring

- Exposure to governance, audit, and compliance frameworks

- Experience in regulated industries (Banking, Pharma, Healthcare)

- Knowledge of data versioning tools and feature stores

Soft Skills :

- Strong troubleshooting and problem-solving skills

- Ability to work in high-availability production environments

- Excellent documentation and communication skills

- Strong collaboration mindset across engineering and data teams

