Site Reliability Engineering Lead - Cloud Infrastructure

Arting Digital
7 - 9 Years
Chennai

Posted on: 19/03/2026

Job Description

Description :

Posting title : Site Reliability Engineering Lead

Experience : 7+ Years

Location : Chennai

Work mode : On-site

Primary skills : Terraform, Ansible, AWS services, Docker, Kubernetes, CI/CD, Datadog/CloudWatch/Prometheus, Python/Bash, SRE/DevOps

Qualification : B.Tech / B.E. in Computer Science or MCA / M.Tech

Role Overview :

We are looking for an experienced Senior Site Reliability Engineer (SRE) / DevOps Lead to design, build, and maintain highly scalable, reliable, and secure cloud infrastructure.

Key Responsibilities :

- Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS.

- Lead and mentor a team of engineers, fostering best practices in SRE and DevOps.

- Build and manage Infrastructure-as-Code (IaC) using tools like Terraform, AWS CDK, or CloudFormation.

- Develop and maintain CI/CD pipelines using tools such as Jenkins, GitHub Actions, or GitLab CI.

- Implement containerization and orchestration solutions using Docker, Kubernetes, ECS, or EKS.

- Establish monitoring, alerting, and observability frameworks using tools like Datadog, Prometheus, Grafana, ELK, or CloudWatch.

- Drive incident management, root cause analysis (RCA), and continuous improvement of system reliability.

- Design and implement disaster recovery (DR) and high-availability (HA) strategies.

- Optimize cloud costs using FinOps practices and cost monitoring tools.

- Collaborate with cross-functional teams to improve system performance, scalability, and security.

- Automate infrastructure, deployments, and operational workflows.

- Implement security best practices, including IAM, networking, and compliance standards.

- Lead platform-wide automation and reliability initiatives.

Required Skills & Qualifications :

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

- 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.

- At least 2 years of experience in a leadership, mentoring, or team management role.

- Strong hands-on experience with AWS services (EC2, S3, RDS, IAM, VPC, Lambda).

- Expertise in Infrastructure-as-Code (Terraform, AWS CDK, or CloudFormation).

- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).

- Proficiency in containerization and orchestration (Docker, Kubernetes, ECS/EKS).

- Strong experience with monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch).

- Solid scripting/programming skills (Python, Bash, or Go).

- Strong understanding of networking, cloud security, and identity/access management.

- Experience in designing high-availability and disaster recovery systems.

Preferred Qualifications :

- AWS or equivalent cloud certifications (Solutions Architect, DevOps Engineer).

- Experience with AIOps, serverless architectures, and event-driven systems.

- Familiarity with FinOps and cloud cost optimization frameworks.

- Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).

- Exposure to Atlassian tools (Jira, Confluence, Bitbucket).

- Experience working with SQL and NoSQL databases.

- Proven experience leading cross-functional reliability or automation initiatives.

