Posted on: 19/03/2026
Description :
Role : Site Reliability Engineer (VLT/Evo Project)
Mission : To bridge the gap between development and operations by engineering systems that are self-healing, observable, and secure by design.
Key Responsibilities :
- Performance Engineering : Analyze VLT/Evo system bottlenecks. It's not just about measuring speed; it's about optimizing the kernel, network, or application stack to improve it.
- Service Level Management : Define, implement, and defend SLIs and SLOs. You will be the guardian of the Error Budget, helping the team decide when to push features vs. when to focus on stability.
- Observability Architecture : Design a holistic monitoring strategy using Prometheus/Loki/ELK to move from reactive alerting to predictive signals.
- Toil Reduction : Identify repetitive manual tasks and eliminate them through code. If you have to do it twice, automate it.
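To make the Error Budget responsibility concrete, here is a minimal sketch of the arithmetic behind a "push features vs. focus on stability" decision. The function names, the request-based SLI, and the 99.9% target are illustrative assumptions, not part of the VLT/Evo project's actual tooling.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for a window.

    Assumes a request-based availability SLI (hypothetical choice).
    1.0 means the budget is untouched, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        # No traffic in the window, so nothing has been spent.
        return 1.0
    # The budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def should_freeze_features(remaining: float, threshold: float = 0.0) -> bool:
    """Suggest prioritizing stability once the remaining budget hits the threshold."""
    return remaining <= threshold


if __name__ == "__main__":
    # Example: a 99.9% SLO over one million requests tolerates about 1,000 failures.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"budget remaining: {remaining:.2f}")
    print(f"freeze features: {should_freeze_features(remaining)}")
```

In practice the request counts would come from the monitoring stack (for example, Prometheus counters), and the freeze threshold is a team policy rather than a fixed rule.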
Technical Requirements :
- Observability Stack : Deep expertise in Prometheus, Grafana, and ELK/Loki.
- Automation : Professional-grade Python or Go (Go is increasingly common for SRE tooling) and robust Bash scripting.
- Infrastructure : Experience with Kubernetes or specialized Data Center orchestration.
Cultural Fit :
- A Blameless Post-mortem philosophy. You view every outage as a free lesson in system architecture.
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1621860