Posted on: 09/12/2025
Description:
As a Senior Site Reliability Engineer, you will own and evolve the reliability, scalability, and performance of our global infrastructure.
You'll drive automation, build resilient distributed systems, define SRE maturity, and act as a force multiplier across engineering. This role blends hands-on architecture, operational excellence, and leadership.
Responsibilities:
- Design scalable, distributed systems supporting high availability and near-zero downtime.
- Build observability frameworks across cloud environments.
- Act as the point person during outages and lead root-cause analysis, post-mortems, and long-term fixes.
- Define and enforce SLA/SLO/SLI frameworks.
- Build IaC and automation pipelines (Terraform, Ansible, Jenkins/GitHub Actions).
- Eliminate manual ops and champion platform-level automation.
- Lead capacity planning, load testing, and system optimisation.
- Improve cloud efficiency and optimise cost without compromising reliability.
- Build secure deployment models, DR strategies, and backup frameworks.
- Ensure compliance with internal and external audit requirements.
- Partner with backend, platform, DevOps, and product teams.
- Mentor junior SREs and elevate reliability culture across engineering.
Requirements:
- 6+ years of experience in SRE/DevOps managing large-scale production systems.
- Expertise in GCP (AWS/Azure acceptable).
- Strong scripting experience (Python / Go / Bash).
- Deep understanding of Docker, Kubernetes, and Helm.
- Hands-on with CI/CD: Jenkins, GitHub Actions, ArgoCD.
- Strong exposure to observability tools (Prometheus, Grafana, Datadog, New Relic).
- Solid fundamentals in networking, load balancing, DNS, and caching.
- Strong automation mindset with expertise in IaC (Terraform, Ansible).
- Excellent debugging skills, operational rigour, and an ownership mindset.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1587322