hirist

Site Reliability Engineer - Configuration Management

Workassist
Chennai
10 - 15 Years
Rating : 4.8 (21+ Reviews)

Posted on: 30/01/2026

Job Description

Role : Site Reliability Engineer (SRE)

Experience : 10-15 Years

Job Summary :

The Site Reliability Engineer (SRE) will play a critical role in ensuring the reliability, scalability, and performance of Citizens Bank's enterprise systems and cloud environments.

The ideal candidate brings deep technical expertise across multi-cloud platforms, automation, observability, and incident management, and drives reliability engineering practices and operational excellence in a complex financial services environment.

Key Responsibilities :

- Manage and support cloud-based solutions across AWS, Azure, GCP, and other IaaS/PaaS/SaaS/CDN environments.

- Design, implement, and maintain reliable, scalable, and secure infrastructure, ensuring high availability and performance.

- Collaborate with DevOps and security teams to implement DevSecOps workflows using Git, Jenkins, Docker, and Kubernetes (EKS/AKS).

- Automate infrastructure and configuration management using Terraform, Ansible, and scripting languages like Python, Bash, or PowerShell.

- Analyze traffic flows, system logs, and application events to troubleshoot issues and identify interdependencies across systems.

- Utilize monitoring and observability tools such as DataDog, Splunk, and CloudWatch for proactive system health management.

- Implement on-call support processes, develop and maintain runbook documentation, and work toward full automation of repetitive tasks.

- Collaborate with other SREs to build resilient systems and promote Site Reliability Engineering best practices across the enterprise.

- Handle critical application outages, perform root cause analysis, and drive incident resolution and preventive measures.

- Work within an Agile environment, partnering with cross-functional teams to continuously improve performance and reliability.

Technical Skills Required :

- Cloud Platforms : AWS, Azure, GCP

- DevOps/DevSecOps Tools : Jenkins, Git, Docker, Kubernetes (EKS, AKS)

- Infrastructure as Code (IaC) : Terraform, Ansible

- Monitoring & Logging : DataDog, Splunk, CloudWatch

- Scripting : Python, Bash, PowerShell

- Networking : TCP/IP, DNS, HTTP, Load Balancing, Routing

- OS Environments : Linux, Windows Server

- Familiarity with AMI builds, patching, and rehydration processes

Core Competencies :

- Strong analytical and troubleshooting skills

- Proven ability to drive incident response and post-incident reviews

- Excellent communication and stakeholder management

- Ability to collaborate in global, distributed teams

- Focus on automation, resilience, and continuous improvement

