Posted on: 20/08/2025
We are looking for a dedicated Site Reliability Engineer (SRE) - Cloud Ops to join our team. In this role, you will play a key part in ensuring the stability and scalability of our cloud infrastructure. You will be responsible for monitoring, troubleshooting, and resolving infrastructure and application alerts, managing pipelines, and addressing environment-related issues in a dynamic 24/7 operational environment.
Responsibilities:
- Infrastructure Monitoring and Alert Response: Proactively monitor infrastructure and application alerts, and resolve them promptly to maintain uptime and performance.
- Shift-Based Operations: Work in a 24/7 environment with flexible availability for rotational shifts.
- Cloud Environment Management: Manage and resolve environment-related issues, focusing on stability and efficiency.
- Pipeline Management: Oversee CI/CD pipelines and ensure smooth deployment of updates and releases.
- Operational Tasks: Execute day-to-day operational activities, including incident management and change management, while maintaining operational excellence.
- Tool Management: Use tools such as Kubernetes, PagerDuty, and Google Cloud Platform (GCP) to support operational activities.
Requirements:
- B.E./B.Tech graduate with 2+ years of experience in Site Reliability, Cloud Ops, Monitoring, and Alerting.
- Expertise: In-depth knowledge of monitoring tools (Prometheus, Grafana, ELK) and alerting systems, and the ability to resolve related issues promptly.
- Kubernetes: Hands-on experience with Kubernetes for orchestration and container management.
- PagerDuty: Proficiency in setting up and managing alerting systems.
- Cloud Fundamentals: Basic understanding of Google Cloud Platform (GCP) services and operations.
- Incident Management: Strong problem-solving skills and experience in handling critical incidents under pressure.
- DevOps Processes: Basic knowledge of CI/CD pipelines, automation, and infrastructure-as-code practices.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1532171