Job Description

Key Responsibilities :


- Drive high levels of service stability and availability by establishing Site Reliability Engineering as a practice across IPE.

- Grow partnerships with Product Engineering owners and drive initiatives that benefit the team in line with SRE principles.

- Be available 24/7 as an escalation point for the operational teams.

- Reduce MTTR and service impact.

- Address technical debt across IPE to remove risk

- Reduce recovery time on incidents

- Assist in major incidents owned by IPE.

- Validate service communications from a technical perspective during major incidents.

- Drive standard process and continual improvement for incident recovery, problem management, service resilience and availability

- Apply ITSM best practices to evaluate and update existing practices, such as creating knowledge articles, runbooks, and process documents.

- Responsible for the IPE Technical Recovery and Problem Management response, ensuring coordination across Technology Teams for complex, IPE-owned issues.


- Accountable for technical decisions and communications on service recovery during live incidents.

- Reduce recovery time on incidents and act as the main contact point for Major Incidents.

- Collaborate with stakeholders to meet business objectives in Group IT initiatives by utilising in-depth knowledge of operations, processes and applications, and contribute towards

- Identify trends and possible opportunities for Service Improvement Programs (cross-domain/divisional), gain support and sponsorship, then track and drive those programs through to conclusion, providing regular service updates on progress.

- Provide oversight and governance of key resilience requirements for applications within IPE, and address technical debt across IPE to remove risk.

Minimum Requirements :


- Bachelor's degree or equivalent experience in an IT-related discipline preferred.

- Technical knowledge of SRE focus areas, such as observability implementations with Datadog, capacity management, etc.

- Outstanding communication and influencing skills.

- Experience of industry best-practice processes and ability to drive approach and process changes.

- Initiative-taking, focused, and resilient, with a cheerful outlook.

- Good negotiation and influencing skills, with the ability to overcome resistance and reach consensus and compromise to attain the required objective.

- Demonstrated ability to manage time-critical incident and recovery (crisis) situations, including communication and liaison with internal stakeholders.

- ITIL Foundation certificate is a must.

- Extensive experience with monitoring tools (e.g. Datadog, ITRS).
