hirist

Job Description


About the Role:

We are hiring a Kubernetes AI/ML Ops Observability Engineer to build, monitor, and optimize AI infrastructure. This role combines expertise in Kubernetes, observability stacks, and LLM application tooling such as LangFuse (tracing and observability), LangServe (serving), and LangGraph (workflow orchestration).

Key Responsibilities:

- Manage Kubernetes-based AI/ML infrastructure, ensuring reliability and scalability.

- Implement observability solutions using Prometheus, Grafana, the ELK Stack, and distributed tracing tools.

- Monitor system health and automate alerting and on-call escalation using Datadog and PagerDuty.

- Support deployment and monitoring of AI/ML pipelines integrated with LangServe, LangGraph, and LangFuse.

- Develop automation scripts using Python and manage infrastructure via Terraform.

- Collaborate with DevOps, ML engineers, and data teams to maintain system uptime and performance.

Required Skills:

- Strong expertise in Kubernetes, Linux, and Python scripting.

- Experience with observability tools (Prometheus, Grafana, ELK, Datadog, PagerDuty).

- Familiarity with LangServe, LangGraph, LangFuse, and modern MLOps ecosystems.

- Hands-on experience with DevOps practices, distributed tracing, and performance monitoring of distributed systems.

