hirist

Lead Software Engineer - Site Reliability

TRANSREACH TALENT LLC
10 - 13 Years
Chennai

Posted on: 19/03/2026

Job Description

Description :

Role : Site Reliability Engineer (VLT/Evo Project)

Mission : To bridge the gap between development and operations by engineering systems that are self-healing, observable, and secure by design.


Key Responsibilities :

- Performance Engineering : Analyze VLT/Evo system bottlenecks. It's not just about measuring speed; it's about tuning the kernel, network, or application stack to remove those bottlenecks.

- Service Level Management : Define, implement, and defend SLIs and SLOs. You will be the guardian of the Error Budget, helping the team decide when to ship new features and when to focus on stability.

- Observability Architecture : Design a holistic monitoring strategy using Prometheus/Loki/ELK to move from reactive alerting to predictive signals.
- Toil Reduction : Identify repetitive manual tasks and eliminate them through code. If you have to do it twice, automate it.
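
To illustrate the error-budget trade-off described under Service Level Management, here is a minimal sketch. It assumes a request-based SLI (successful requests divided by total requests) measured over a fixed window. The 99.9% target, the request counts, the function names, and the 25% freeze threshold are all hypothetical and are not taken from the VLT/Evo project.

```go
package main

import "fmt"

// errorBudget returns how many failed requests the SLO tolerates in a window:
// (1 - SLO target) * total requests. A 99.9% target allows 0.1% of traffic to fail.
func errorBudget(slo float64, total int64) float64 {
	return (1 - slo) * float64(total)
}

// budgetRemaining returns the unspent fraction of the error budget.
// 1.0 means nothing has been spent. A negative value means the SLO was breached.
// A 100% SLO has no budget at all, so it is reported as fully spent (0).
func budgetRemaining(slo float64, total, failed int64) float64 {
	budget := errorBudget(slo, total)
	if budget <= 0 {
		return 0
	}
	return (budget - float64(failed)) / budget
}

// shouldFreezeFeatures is a hypothetical release policy: pause feature
// launches and prioritize stability once less than `threshold` of the
// budget remains.
func shouldFreezeFeatures(slo float64, total, failed int64, threshold float64) bool {
	return budgetRemaining(slo, total, failed) < threshold
}

func main() {
	const slo = 0.999 // hypothetical 99.9% availability target
	total, failed := int64(1_000_000), int64(600)

	fmt.Printf("error budget: %.0f failed requests\n", errorBudget(slo, total))
	fmt.Printf("budget remaining: %.2f\n", budgetRemaining(slo, total, failed))
	fmt.Printf("freeze features: %v\n", shouldFreezeFeatures(slo, total, failed, 0.25))
}
```

With these numbers, 600 failures consume 60% of a 1,000-request budget, so the sketch keeps shipping. At 900 failures only 10% of the budget would remain, and the policy would recommend a freeze.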


Technical Requirements :

- Observability Stack : Deep expertise in Prometheus, Grafana, and ELK/Loki.

- Automation : Professional-grade Python or Go (Go is increasingly the SRE standard) and robust Bash scripting.

- Infrastructure : Experience with Kubernetes or specialized data center orchestration.


Cultural Fit :

- A blameless post-mortem philosophy: you view every outage as a free lesson in system architecture.

