Senior Site Reliability Engineer - DevOps & Production Support

Xohani Solutions Pvt. Ltd.
8 - 10 Years
Bangalore

Posted on: 01/04/2026

Job Description

Description :

- Using broad full-stack knowledge and experience for proactive incident prevention by baselining against expected service performance, improving processes from lessons learned, and using data analytics to identify problem areas and operational gaps.

- Leading product line agile teams in troubleshooting and resolving system problems, including analyzing application and critical system performance.

- Serving as a technical resource during critical and major incidents supporting multiple technologies.

- Facilitating SRE technical assessments, identifying gaps, and providing recommendations to product teams on SRE maturity journey plans based on Chevron's SRE framework.

- Finding opportunities to avoid future issues by improving logging and creating automated resolutions based on triggers. Developing automation scripts for repetitive tasks to eliminate toil/operations support activities.

- Overseeing production environments by monitoring availability and maintaining a holistic view of system health.

- Measuring and optimizing system performance, continuously seeking innovation and improvement to meet customer needs.

- Aligning, collaborating, and building relationships with peers, company leadership, subject matter experts, and users to enhance knowledge of end-to-end DevOps/Site Reliability Engineering best practices.

- Collaborating with SRE Community of Practice thought leaders to define SRE capabilities and best practices and integrate the capability framework throughout the organization.

Required Qualifications :

- Hands-on experience as an IT professional with knowledge of full-stack infrastructure and experience troubleshooting incidents and production issues.

- Working knowledge of several technology disciplines required for the full end-to-end service operations stack : network administration & security (Cisco/Juniper), identity & access management (Active Directory, Azure AD, SAML, OpenID Federation, certificates, and keys), cybersecurity, on-prem & cloud architecture, Windows & Linux OS, performance monitoring & management, troubleshooting (application & database), change management, API integration, and automation (Ansible, PowerShell, KQL, or Shell scripting).

- Strong communication (written/verbal) and facilitation skills, with a solid ability to convey business and technical information to a diverse audience.

- Strong analytical and problem-solving skills, with the persistence to work through difficult problems.

Preferred Qualifications :

- Basic understanding of the software development lifecycle and software engineering best practices, including code management (Git/GitHub) and CI/CD pipelines.

- Experience working in an agile team (Scrum/Kanban) is considered a plus.

