Posted on: 02/12/2025
Description :
Role : Site Reliability Engineer (SRE)
Experience : 10 - 15 Years
Job Summary :
The Site Reliability Engineer (SRE) will play a critical role in ensuring the reliability, scalability, and performance of Citizens Bank's enterprise systems and cloud environments.
The ideal candidate brings deep technical expertise across multi-cloud platforms, automation, observability, and incident management, and will drive reliability engineering practices and operational excellence in a complex financial services environment.
Key Responsibilities :
- Manage and support cloud-based solutions across AWS, Azure, GCP, and other IaaS/PaaS/SaaS/CDN environments.
- Design, implement, and maintain reliable, scalable, and secure infrastructure, ensuring high availability and performance.
- Collaborate with DevOps and security teams to implement DevSecOps workflows using Git, Jenkins, Docker, and Kubernetes (EKS/AKS).
- Automate infrastructure and configuration management using Terraform, Ansible, and scripting languages like Python, Bash, or PowerShell.
- Analyze traffic flows, system logs, and application events to troubleshoot issues and identify interdependencies across systems.
- Utilize monitoring and observability tools such as DataDog, Splunk, and CloudWatch for proactive system health management.
- Implement on-call support processes, develop and maintain runbook documentation, and work toward full automation of repetitive tasks.
- Collaborate with other SREs to build resilient systems and promote Site Reliability Engineering best practices across the enterprise.
- Handle critical application outages, perform root cause analysis, and drive incident resolution and preventive measures.
- Work within an Agile environment, partnering with cross-functional teams to continuously improve performance and reliability.
Technical Skills Required :
- Cloud Platforms : AWS, Azure, GCP
- DevOps/DevSecOps Tools : Jenkins, Git, Docker, Kubernetes (EKS, AKS)
- Infrastructure as Code (IaC) : Terraform, Ansible
- Monitoring & Logging : DataDog, Splunk, CloudWatch
- Scripting : Python, Bash, PowerShell
- Networking : TCP/IP, DNS, HTTP, Load Balancing, Routing
- OS Environments : Linux, Windows Server
- Familiarity with AMI builds, patching, and rehydration processes
Core Competencies :
- Strong analytical and troubleshooting skills
- Proven ability to drive incident response and post-incident reviews
- Excellent communication and stakeholder management
- Ability to collaborate in global, distributed teams
- Focus on automation, resilience, and continuous improvement
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1583600