Posted on: 23/07/2025
We are looking for a passionate Site Reliability Engineer (SRE) with a strong application support background, a developer's mindset, and a keen eye for performance and reliability.
You will play a crucial role in enhancing system performance, stability, and observability while automating IT operations and reducing toil.
Key Responsibilities:
- Define and implement SLAs, SLOs, and SLIs, and enforce error budgets to improve application reliability.
- Monitor and optimize application performance and infrastructure metrics proactively.
- Configure and maintain observability tools to improve system monitoring, alerting, and logging.
- Analyze system architecture, identify risks, and develop mitigation strategies.
- Collaborate with engineering teams for system design reviews, capacity planning, and performance tuning.
- Conduct blameless postmortems for critical incidents and use learnings to prevent recurrence.
- Provide primary operational support for critical applications and manage incident resolution.
- Develop automated solutions to reduce manual effort, implement self-healing mechanisms, and enforce resiliency patterns (e.g., circuit breaker, bulkhead).
- Apply analytics to historical incident and usage data to predict and prevent future failures.
Required Skills & Capabilities:
- 2-3 years of experience in Site Reliability Engineering or Application Support roles.
- Hands-on experience in building dashboards and alerts using Splunk and AppDynamics.
- Solid understanding of microservices architecture and distributed systems.
- Minimum of 2 years of experience developing web-based applications (preferably in Java, Spring Boot).
- Strong understanding of monitoring, observability, and system reliability principles.
- Basic hands-on experience in SQL and database interaction.
- Experience in incident management, root cause analysis, and capacity planning.
Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (B.Tech / M.Tech).
- Familiarity with DevOps tools, CI/CD pipelines, and cloud infrastructure (AWS, Azure, or GCP) is a plus.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1518523