Posted on: 18/08/2025
Role Summary:
We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong focus on observability. The ideal candidate will have 5-8 years of experience in implementing and managing monitoring, logging, and alerting systems. This role requires expertise in the Kubernetes stack, as well as a solid foundation in coding and Infrastructure as Code to ensure the reliability and health of our systems.
Key Responsibilities:
- Observability Implementation: Design and implement comprehensive observability solutions, including monitoring, logging, and alerting.
- Kubernetes Stack Management: Work extensively with the Kubernetes stack and related tools such as Prometheus, Loki, Grafana, and Alertmanager to ensure system performance and reliability.
- Coding & Automation: Apply proficiency in Python and Go to solve complex problems, automate tasks, and contribute to the development of tools and systems.
- Infrastructure & CI/CD: Utilize Infrastructure as Code and manage CI/CD pipelines to ensure continuous and reliable deployments.
- Troubleshooting: Apply strong troubleshooting and problem-solving skills to diagnose and resolve issues efficiently and proactively.
Required Skills:
- Observability: Expertise in all aspects of observability, including monitoring, logging, and alerting.
- Kubernetes Stack: Deep knowledge and hands-on experience with Prometheus, Loki, Grafana, and Alertmanager.
- Programming: Strong coding skills in Python and Go, sufficient to solve technical challenges.
- DevOps: Experience with CI/CD pipelines and Infrastructure as Code (IaC).
- Problem-Solving: Strong troubleshooting and problem-solving abilities.
- Cloud: Experience with AWS is mandatory.
Nice to Have Skills:
- Incident Management: Familiarity with PagerDuty.
- Integrations: Experience with the Zoom Developer Platform.
Education & Experience:
Education: A Bachelor's degree in Computer Science, Information Technology, or a related field is preferred.
Experience: 5-8 years of experience in a Site Reliability or DevOps engineering role, with a focus on observability.
Posted By
Pavithara M
Talent Acquisition Executive at Teamware Solutions ( A division of Quantum Leap Co
Last Active: 29 Aug 2025
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1531328