hirist

Monitoring Engineer - Site Reliability

Posted on: 19/09/2025

Job Description

Job Title: LLM System Monitor - Site Reliability Engineer (SRE).

Location: Bangalore, India (Hybrid - Onsite 3 Days/Week).

Type: Full-Time (Insight Global at Cisco).

Required Skills & Experience:

- 3+ years of experience monitoring and responding to incidents in a globally deployed web application.

- Strong experience with microservices architecture on Kubernetes.

- Deep understanding of observability tools and operational metrics (Grafana, Prometheus, P99, etc.).

- Familiarity with AWS services or any major cloud provider.

- Excellent communication and customer service skills - must be able to clearly articulate status and updates to technical and non-technical stakeholders.

- Ability to ramp up quickly, take ownership, and work independently in a fast-paced environment.

Key Responsibilities:

- Monitor Grafana dashboards and observability tools to detect failures and performance issues.

- Act as the primary SRE for incident response, opening incident reports from automated alerts or joining active incident channels.

- Serve as the main point of contact during incidents, delivering frequent updates to customers and incident commanders.

- Interpret operational metrics, such as quantiles (including P99) and Prometheus data, to assess system health.

- Track and manage permutations of a globally deployed microservices architecture running on Kubernetes.

- Collaborate with engineering and support teams to resolve issues quickly and efficiently.

- Maintain strong communication and customer service throughout incident lifecycles.

- Utilize foundational knowledge of AWS or other cloud platforms to support infrastructure monitoring.

- Ramp up quickly on existing systems and processes.
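For readers less familiar with the quantile metrics mentioned above: a P99 value is the level below which 99% of observations fall, so it surfaces slow tail requests that an average hides. The sketch below is illustrative only. It assumes raw latency samples and uses the nearest-rank method. Prometheus typically estimates quantiles from histogram buckets instead, which can give different numbers. The function name and the sample data are hypothetical.

```python
import math


def percentile_nearest_rank(samples, q):
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank method.

    Hypothetical helper for illustration; not part of any monitoring tool's API.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least q% of samples at or below it.
    # Multiplying before dividing keeps the rank exact for integer inputs.
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [450, 900]

print(percentile_nearest_rank(latencies_ms, 50))  # 20
print(percentile_nearest_rank(latencies_ms, 99))  # 450
```

The mean of these samples is about 33 ms, which looks healthy. The P99 of 450 ms shows that some users still see much slower responses.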

Why Join?

- Work with cutting-edge LLM infrastructure at Cisco.

- Full-time opportunity with Insight Global.

- Hybrid flexibility - onsite in Bangalore 3 days/week.

- Immediate interviews and onboarding.

- Competitive compensation.
