Posted on: 28/01/2026
Job Role: SRE with Azure
Experience: 6-10 Years
Location: Hybrid (Bengaluru, Hyderabad)
Andersen is hiring a Site Reliability Engineer with Microsoft Azure expertise in India to drive the reliability and performance of large-scale digital insurance platforms by enhancing integrations, optimizing cloud systems, and ensuring stable, high-quality service delivery.
The customer is a well-established global organization providing financial protection and risk-management services across various markets. With a diverse portfolio and teams operating in multiple regions, the company supports businesses and individuals through reliable, scalable solutions.
The project focuses on enhancing large-scale digital platforms, improving cloud performance, optimizing integrations, and modernizing systems to support efficient service delivery and ongoing expansion.
Responsibilities:
- Ensuring high availability, performance, and scalability of cloud infrastructure through proactive monitoring, automation, and continuous improvement.
- Designing and maintaining resilient Azure-based infrastructure using IaC (Terraform).
- Implementing end-to-end observability with telemetry, Critical User Journey (CUJ)-level metrics, dashboards, alerts, and real-time performance insights.
- Monitoring Critical User Journeys with product and business teams to maintain a reliable user experience.
- Conducting load testing, capacity planning, and performance tuning to prepare systems for traffic growth and spikes.
- Managing SLIs, SLOs, SLAs, and error budgets across critical services.
- Implementing next-generation cloud reliability and fault-tolerance solutions, including disaster recovery improvements.
- Identifying risks and preventing service disruptions through proactive reliability engineering.
- Automating deployments, scaling, failover, and remediation to reduce manual toil and operational bottlenecks.
- Leading incident response, participating in on-call rotations, conducting root cause analysis, and delivering blameless post-mortems.
- Creating and maintaining runbooks, documentation, and operational guidelines.
- Collaborating with engineering and global teams on reliability best practices; mentoring junior SREs and supporting SRE hiring.
Must-haves:
- 6+ years of experience as an SRE in cloud and infrastructure teams.
- 5+ years of extensive experience with Microsoft Azure cloud services and infrastructure management.
- Strong technical background with solid knowledge of software development principles, application production support, SDLC best practices, and Agile methodology.
- Hands-on SRE experience with a strong understanding of SLOs, SLIs, error budgets, incident management, and conducting blameless post-mortems.
- Strong ability to analyze and understand application architectures and identify areas for improvement.
- Experience working with monitoring, logging, and observability tools to assess and improve application performance.
- Proficiency in scripting and automation tools, including Python, Bash, and Terraform, to reduce toil and enhance operational efficiency.
- Strong incident response and troubleshooting skills with the ability to perform effective root cause analysis.
- Excellent communication and collaboration skills for working with cross-functional teams and clearly explaining technical concepts.
- Ability to coach and mentor team members in SRE practices and foster a culture of reliability.
- Practical experience applying Agile development practices and working in Agile teams.
- Proactive mindset focused on continuous improvement to increase system reliability and performance.
- English at Intermediate+ level or higher.
Nice-to-haves:
- Additional certifications in cloud computing, DevOps, or SRE practices.
- Microsoft Azure certifications such as Azure Administrator, Azure DevOps Engineer, or Azure Solutions Architect.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1606831