Posted on: 19/11/2025
Note: If shortlisted, you will be invited to the initial rounds on 6th December 2025 (Saturday) in Bangalore.
Description :
- Act as Incident Commander for Sev1/Sev2 events; run bridges/war rooms, drive parallel workstreams, and ensure clear decision logs and ownership.
- Assess business impact, prioritize recovery actions, and coordinate across Dev/SRE, Platform, Security, and Vendor teams.
- Issue timely internal/external communications (initial, updates, RCA/PIR) and maintain executive-ready status dashboards during incidents.
- Define and maintain Incident, Change, and Problem workflows, SOPs, and runbooks; ensure ITIL alignment and continuous improvement.
- Partner with Change Management to review risk, quality gates, and change freezes; reduce repeat incidents and change-related failures.
- Oversee access management during incidents (break-glass, least privilege) and conduct post-event access reviews.
- Own weekly/monthly reporting (KPIs/SLAs/SLOs, trend analysis, recurring faults) and drive corrective actions with owners and deadlines.
- Manage stakeholders and customers with calm, credible updates, action plans, and clear expectations.
- Capture lessons learned, update knowledge articles/runbooks, and coach teams on best practices.
Required Qualifications :
- Bachelor's degree in IT, Computer Science, or related field.
- 6+ years in Incident Management within 24x7 global operations.
- ITIL Foundation certification (required); ITIL Intermediate/Managing Professional is a plus.
- Proven experience leading major incidents with multi-team coordination and executive communication.
- Excellent written and verbal communication; able to articulate complex issues to technical and non-technical audiences.
- Strong analysis, prioritization, and decision-making under pressure.
Must Have Technical Knowledge :
- Kubernetes / AKS operations fundamentals.
- Windows & Linux operational basics.
- ServiceNow (Incident/Change/Problem) for ITSM processes.
- PagerDuty for on-call management and escalation policies.
- Salesforce for customer case and communication workflows.
Key Skills & Competencies :
- Strong stakeholder management; ability to influence without authority.
- Structured and detail-oriented, with excellent facilitation/bridge leadership.
- Comfortable working across time zones in a fast-moving environment.
KPIs Owned :
- MTTA/MTTR; incident volume and severity distribution.
- SLA/SLO adherence; repeat incident rate; problem backlog burn-down.
- Change failure rate linked to incidents; time to RCA/PIR closure.
- Timeliness and quality of communications; runbook coverage and freshness.
Desirable :
- Deeper Azure/Kubernetes/SRE experience (scaling, resiliency, observability).
- Advanced ITIL certifications (Change, Problem, Service Operations).
- Familiarity with monitoring/observability stacks (Prometheus/Grafana, Azure Monitor).
Additional Information:
- Weekend and on-call support may be required on a rotational basis.
- Coordination with customers and teams across multiple regions/time zones.
Posted in
DevOps / SRE
Functional Area
IT Management / IT Support
Job Code
1577561