Posted on: 23/02/2026
Job Title : Senior Site Reliability Engineer (SRE)
Location : Hyderabad / Ahmedabad
Employment Type : Full-Time
Work Model : Hybrid (3 days a week in the office)
Job Overview :
1. 6 - 10 years of SRE or infrastructure engineering experience in cloud-native environments.
2. Mandatory :
- Cloud : GCP (GKE, Load Balancing, VPN, IAM)
- Observability : Prometheus, Grafana, ELK, Datadog
- Containers & Orchestration : Kubernetes, Docker
- Incident Management : On-call, RCA, SLIs/SLOs
- IaC : Terraform, Helm
- Incident Tools : PagerDuty, OpsGenie
3. Nice to Have :
- Service Mesh, API Gateway
- GCP Spanner, MongoDB (basic)
Scope :
- Reduce mean time to recovery (MTTR) and increase service availability
- Own the incident management and RCA processes
Roles and Responsibilities :
- Lead incident management for critical production issues, driving root cause analysis (RCA) and postmortems.
- Create and maintain runbooks and standard operating procedures for high-availability services.
- Design and implement observability frameworks using ELK, Prometheus, and Grafana; drive telemetry adoption.
- Coordinate cross-functional war-room sessions during major incidents and maintain response logs.
- Develop and improve automated system recovery, alert suppression, and escalation logic.
- Use GCP services such as GKE, Cloud Monitoring, and Cloud Armor to improve performance and strengthen security posture.
- Collaborate with DevOps and Infrastructure teams to build highly available and scalable systems.
- Analyze performance metrics and conduct regular reliability reviews with engineering leads.
- Participate in capacity planning, failover testing, and resilience architecture reviews.
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1615041