Job Description

Position : DevOps SRE Manager

About the Role :


We are looking for a seasoned DevOps SRE Manager experienced with GCP as the primary cloud platform and AWS as the secondary one. The ideal candidate will lead two teams: a DevOps team managing GCP-based infrastructure and a CloudOps/SRE team ensuring 24x7 uptime for critical services.

This role requires a strong technical background in DevOps & SRE, leadership and team management skills, and the ability to own customer relationships while ensuring seamless cloud operations.

The candidate should have hands-on expertise with Terraform, Kubernetes (GKE), Prometheus, and Grafana, along with working knowledge of AWS. They will play a crucial role in managing customer expectations, ensuring timely project deliveries, and driving operational excellence.

Key Responsibilities :


1. DevOps Management (GCP-Focused Infrastructure) :

- Own and oversee DevOps operations in a GCP environment using Terraform, Kubernetes (GKE), Prometheus, and Grafana.

- Ensure timely execution of DevOps tasks while optimizing infrastructure automation.

- Drive CI/CD pipeline enhancements and cloud security best practices.

- Enhance monitoring, logging, and alerting capabilities to improve system reliability.

- Optimize cloud costs, scalability, and security for long-term efficiency.

2. CloudOps / SRE Management (24x7 Support) :

- Manage and guide a 24x7 CloudOps/SRE team responsible for uptime and incident response.

- Create and maintain rosters to ensure continuous 24x7 support coverage.

- Oversee incident management, Root Cause Analysis (RCA), and SLA compliance.

- Implement observability best practices using Grafana, Prometheus, and Opsgenie.

- Reduce manual intervention by promoting automation and self-healing infrastructure.

3. Leadership & Team Management :

- Build and maintain strong customer relationships, ensuring clear and transparent communication.

- Lead and mentor a cross-functional team of DevOps and CloudOps/SRE engineers.

- Ensure team productivity, conduct performance reviews, and support professional growth.

- Drive continuous improvement through feedback, training, and best practices.

4. AWS (Good to Have) :

- Maintain basic to intermediate AWS knowledge (IAM, EC2, EKS, S3, Lambda, CloudFormation).

- Assist in AWS networking, security, and infrastructure optimization when required.

- Provide support for AWS-based workloads where integration with GCP exists.

Technical Stack Expertise Required :


Primary (GCP-Focused DevOps & CloudOps) :

- Cloud Platform : Google Cloud Platform (GCP) as primary, AWS as secondary

- Infrastructure as Code (IaC) : Terraform

- Containerization & Orchestration : Kubernetes (GKE)

- CI/CD & Automation : Jenkins, GitOps, Ansible

- Monitoring & Observability : Prometheus, Grafana

- Incident & Alerting Tools : Opsgenie

- Big Data & Streaming Technologies : Kafka, Airflow, Druid

- AWS Services : IAM, EC2, S3, Lambda, CloudFormation, CloudWatch

Required Skills & Qualifications :

- B.Tech/B.E. graduate with 10-15 years of experience in DevOps, CloudOps, or SRE roles.

- Prior experience in handling 24x7 operations and multi-cloud environments.

- Proven experience in managing DevOps & CloudOps/SRE teams, ensuring smooth operations.

- Hands-on expertise with GCP infrastructure, Terraform, Kubernetes (GKE), and CI/CD pipelines.

- Experience in incident management, RCA, monitoring, and alerting tools (Prometheus, Grafana, Opsgenie).

- Strong understanding of reliability engineering, automation, and cloud security best practices.

Nice to have :

- Experience with Kafka, Airflow, and Druid in large-scale environments.

- Certifications : GCP Professional DevOps Engineer, AWS Solutions Architect, or Kubernetes certifications.

- Working knowledge of AWS cloud services, sufficient to assist in hybrid-cloud scenarios.

