
Principal/Chief Site Reliability Engineer - Observability Services

Collabera
Bangalore
15 - 16 Years

Posted on: 30/07/2025

Job Description :

As a Principal/Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems.

You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof infrastructure.


Responsibilities :


- Design and implement scalable, highly available infrastructure and automation solutions.


- Drive adoption of SRE principles, SLAs, SLOs, and error budgets across teams.

- Proactively identify, debug, and resolve complex system reliability issues.

- Build tooling for observability, alerting, and performance monitoring.

- Collaborate with developers and architects on cloud-native design and service resilience.

- Conduct failure analysis, system audits, and root cause investigations.

- Contribute to strategic infrastructure decisions and reliability roadmaps.

- Lead through influence by mentoring engineers and providing technical direction across teams.

- Work across multiple platforms and large-scale distributed systems.


Key Requirements :

- Experience: 15+ years in technology, with at least 5 years in Site Reliability Engineering.

- Development Background: Strong hands-on experience in C/C++, Java, Go, or Python.

- Proven experience as a hands-on Individual Contributor (not a managerial role).

- Proficiency in scripting, system programming, and multi-platform architecture.

- Deep knowledge of:

a. Linux/Unix OS fundamentals.

b. Networking (DNS, TCP/IP, etc.).

c. Cloud platforms (preferably AWS).

d. Observability and Monitoring Tools.

e. CI/CD and Infrastructure as Code.

- Strong exposure to SRE concepts: reliability, automation, on-call best practices, etc.

- Experience in system design, performance tuning, and troubleshooting of large-scale systems.

