Site Reliability Lead - Cloud Services

Vikash Technologies
7 - 12 Years

Posted on: 25/12/2025

Job Description

Hiring for SRE Lead

Experience : 7 - 12 years

Work Location : Mumbai (Kurla West) - Work from office (WFO)

Skills :

- Cloud & Infrastructure : Proficiency in cloud platforms (AWS, Azure, or GCP), containerization (Kubernetes/Docker), and Infrastructure as Code (Terraform, Ansible, or Puppet).

- Coding/Scripting : Strong programming or scripting skills in at least one language (e.g., Python, Go, Java) for automation and tooling development.

- System Knowledge : Deep understanding of Linux/Unix fundamentals, networking concepts, and distributed systems.

Job Summary :

We are seeking an experienced Site Reliability Engineering (SRE) Lead to drive the reliability, scalability, and performance of our cloud-native platforms. The ideal candidate will combine strong software engineering skills with deep systems knowledge and hands-on experience in cloud infrastructure, automation, and modern DevOps practices. You will lead SRE initiatives and work closely with development and operations teams to ensure highly available and resilient systems.

Key Responsibilities :

- Lead and implement SRE practices to improve system reliability, availability, and performance.

- Design, build, and manage scalable infrastructure on AWS, Azure, or GCP.

- Implement and maintain containerized environments using Docker and Kubernetes.

- Develop and manage Infrastructure as Code (IaC) using Terraform, Ansible, or Puppet.

- Build automation tools and scripts to reduce manual operational work.

- Define and monitor SLIs, SLOs, and SLAs to ensure service reliability.

- Lead incident management, root cause analysis (RCA), and post-incident reviews.

- Collaborate with development teams to improve deployment, monitoring, and release processes.

- Ensure security, compliance, and best practices across infrastructure and operations.

- Mentor and guide SRE and DevOps team members.

Required Skills & Qualifications :

Cloud & Infrastructure :

- Strong hands-on experience with cloud platforms : AWS, Azure, or GCP.

- Extensive experience with containerization and orchestration using Docker and Kubernetes.

- Proven experience with Infrastructure as Code tools such as Terraform, Ansible, or Puppet.

Programming & Automation :

- Strong programming or scripting skills in at least one language such as Python, Go, or Java.

- Experience building automation frameworks and internal tooling.

Systems & Networking :

- Deep understanding of Linux/Unix systems, networking fundamentals, and system internals.

- Strong knowledge of distributed systems, scalability, and fault tolerance.

Preferred Skills :

- Experience with CI/CD pipelines and DevOps tooling.

- Hands-on experience with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK, etc.).

- Knowledge of cloud security best practices.

