Posted on: 19/03/2026
Description :
Posting title : Site Reliability Engineering Lead
Experience : 7+ Years
Location : Chennai
Work mode : On-site
Primary skills : Terraform, Ansible, AWS services, Docker, Kubernetes, CI/CD, Datadog/CloudWatch/Prometheus, Python/Bash, SRE/DevOps
Qualification : B.Tech / B.E. in Computer Science or MCA / M.Tech
Role Overview :
We are looking for an experienced Senior Site Reliability Engineer (SRE) / DevOps Lead to design, build, and maintain highly scalable, reliable, and secure cloud infrastructure.
Key Responsibilities :
- Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS.
- Lead and mentor a team of engineers, fostering best practices in SRE and DevOps.
- Build and manage Infrastructure-as-Code (IaC) using tools like Terraform, AWS CDK, or CloudFormation.
- Develop and maintain CI/CD pipelines using tools such as Jenkins, GitHub Actions, or GitLab CI.
- Implement containerization and orchestration solutions using Docker, Kubernetes, ECS, or EKS.
- Establish monitoring, alerting, and observability frameworks using tools like Datadog, Prometheus, Grafana, ELK, or CloudWatch.
- Drive incident management, root cause analysis (RCA), and continuous improvement of system reliability.
- Design and implement disaster recovery (DR) and high-availability (HA) strategies.
- Optimize cloud costs using FinOps practices and cost monitoring tools.
- Collaborate with cross-functional teams to improve system performance, scalability, and security.
- Automate infrastructure, deployments, and operational workflows.
- Implement security best practices, including IAM, networking, and compliance standards.
- Lead platform-wide automation and reliability initiatives.
Required Skills & Qualifications :
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
- At least 2 years of experience in a leadership, mentoring, or team management role.
- Strong hands-on experience with AWS services (EC2, S3, RDS, IAM, VPC, Lambda).
- Expertise in Infrastructure-as-Code (Terraform, AWS CDK, or CloudFormation).
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
- Proficiency in containerization and orchestration (Docker, Kubernetes, ECS/EKS).
- Strong experience with monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch).
- Solid scripting/programming skills (Python, Bash, or Go).
- Strong understanding of networking, cloud security, and identity/access management.
- Experience in designing high-availability and disaster recovery systems.
Preferred Qualifications :
- AWS or equivalent cloud certifications (Solutions Architect, DevOps Engineer).
- Experience with AIOps, serverless architectures, and event-driven systems.
- Familiarity with FinOps and cloud cost optimization frameworks.
- Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).
- Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
- Experience working with SQL and NoSQL databases.
- Proven experience leading cross-functional reliability or automation initiatives.
Posted by
HR
HR Associate at Arting Digital
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1621776