Posted on: 13/10/2025
Description :
Position : DevOps SRE Manager
About the Role :
We are looking for a seasoned DevOps SRE Manager to work with GCP as the primary cloud platform and AWS as the secondary. The ideal candidate will lead a DevOps team managing GCP-based infrastructure and a CloudOps/SRE team ensuring 24x7 uptime for critical services.
This role requires a strong technical background in DevOps & SRE, leadership and team management skills, and the ability to own customer relationships while ensuring seamless cloud operations.
The candidate should have hands-on expertise with Terraform, Kubernetes (GKE), Prometheus, and Grafana, along with working knowledge of AWS. They will play a crucial role in managing customer expectations, ensuring timely project deliveries, and driving operational excellence.
Key Responsibilities :
1. DevOps Management (GCP-Focused Infrastructure) :
- Own and oversee DevOps operations in a GCP environment using Terraform, Kubernetes (GKE), Prometheus, and Grafana.
- Ensure timely execution of DevOps tasks while optimizing infrastructure automation.
- Drive CI/CD pipeline enhancements and cloud security best practices.
- Enhance monitoring, logging, and alerting capabilities to improve system reliability.
- Optimize cloud costs, scalability, and security for long-term efficiency.
2. CloudOps / SRE Management (24x7 Support) :
- Manage and guide a 24x7 CloudOps/SRE team responsible for uptime and incident response.
- Create and maintain rosters to ensure continuous 24x7 support coverage.
- Oversee incident management, root cause analysis (RCA), and SLA compliance.
- Implement observability best practices using Grafana, Prometheus, and Opsgenie.
- Reduce manual intervention by promoting automation and self-healing infrastructure.
3. Leadership & Team Management :
- Build and maintain strong customer relationships, ensuring clear and transparent communication.
- Lead and mentor a cross-functional team of DevOps and CloudOps/SRE engineers.
- Drive team productivity, conduct performance reviews, and support professional growth.
- Drive continuous improvement through feedback, training, and best practices.
4. AWS (Good to Have) :
- Maintain basic to intermediate AWS knowledge (IAM, EC2, EKS, S3, Lambda, CloudFormation).
- Assist in AWS networking, security, and infrastructure optimization when required.
- Provide support for AWS-based workloads where integration with GCP exists.
Technical Stack Expertise Required :
Primary (GCP-Focused DevOps & CloudOps) :
- Cloud Platform : Google Cloud Platform (GCP) as primary, AWS as secondary
- Infrastructure as Code (IaC) : Terraform
- Containerization & Orchestration : Kubernetes (GKE)
- CI/CD & Automation : Jenkins, GitOps, Ansible
- Monitoring & Observability : Prometheus, Grafana
- Incident & Alerting Tools : Opsgenie
- Big Data & Streaming Technologies : Kafka, Airflow, Druid
- AWS Services : IAM, EC2, S3, Lambda, CloudFormation, CloudWatch
Required Skills & Qualifications :
- B.Tech/B.E. graduate with 10-15 years of experience in DevOps, CloudOps, or SRE roles.
- Prior experience handling 24x7 operations and multi-cloud environments.
- Proven experience in managing DevOps & CloudOps/SRE teams, ensuring smooth operations.
- Hands-on expertise with GCP infrastructure, Terraform, Kubernetes (GKE), and CI/CD pipelines.
- Experience in incident management, RCA, monitoring, and alerting tools (Prometheus, Grafana, Opsgenie).
- Strong understanding of reliability engineering, automation, and cloud security best practices.
Nice to have :
- Experience with Kafka, Airflow, and Druid in large-scale environments.
- Certifications : GCP Professional DevOps Engineer, AWS Solutions Architect, or Kubernetes certifications.
- Working knowledge of AWS cloud services to assist in hybrid-cloud scenarios.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1559318