
Site Reliability Engineer - Observability Services

Teamware Solutions ( A division of Quantum Leap Co
Multiple Locations
5 - 8 Years
Rating: 4.1 (747+ Reviews)

Posted on: 18/08/2025

Job Description

Role Summary :

We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong focus on observability. The ideal candidate will have 5-8 years of experience in implementing and managing monitoring, logging, and alerting systems. This role requires expertise in the Kubernetes stack, as well as a solid foundation in coding and Infrastructure as Code to ensure the reliability and health of our systems.

Key Responsibilities :

- Observability Implementation: Design and implement comprehensive observability solutions, including monitoring, logging, and alerting.

- Kubernetes Stack Management: Work extensively with the Kubernetes stack and related tools such as Prometheus, Loki, Grafana, and Alertmanager to ensure system performance and reliability.

- Coding & Automation: Apply proficiency in Python and Go to solve complex problems, automate tasks, and contribute to the development of tools and systems.

- Infrastructure & CI/CD: Utilize Infrastructure as Code and manage CI/CD pipelines to ensure continuous and reliable deployments.

- Troubleshooting: Apply strong troubleshooting and problem-solving skills to diagnose and resolve issues efficiently and proactively.

Required Skills :

- Observability: Expertise in all aspects of observability, including Monitoring, Logging, and Alerting.

- Kubernetes Stack: Deep knowledge of and hands-on experience with Prometheus, Loki, Grafana, and Alertmanager.

- Programming: Strong coding skills in Python and Go, sufficient to solve technical challenges.

- DevOps: Experience with CI/CD pipelines and Infrastructure as Code (IaC).

- Problem-Solving: Strong troubleshooting and problem-solving abilities.

- Cloud: Experience with AWS is mandatory.

Nice to Have Skills :

- Incident Management: Familiarity with PagerDuty.

- Integrations: Experience with the Zoom Developer Platform.

Education & Experience :

Education: A Bachelor's degree in Computer Science, Information Technology, or a related field is preferred.

Experience: 5-8 years of experience in a Site Reliability or DevOps engineering role, with a focus on observability.

