Allegion - Senior Site Reliability Engineer - CI/CD Pipeline

Posted on: 06/10/2025

Job Description

- Design, implement, and maintain highly available and scalable infrastructure systems, ensuring maximum uptime and performance.

- Collaborate with software engineering teams to build and deploy applications using best practices in reliability, scalability, and security.

- Develop and implement automation tools and frameworks to streamline operational processes, reduce manual intervention, and improve efficiency.

- Monitor and analyse system performance, identify bottlenecks, and implement solutions to optimize performance and scalability.

- Implement and maintain effective monitoring, alerting, and logging systems to proactively identify and resolve issues before they impact users.

- Hands-on experience building automated CI/CD pipelines using GitHub Actions, Jenkins, GitLab, or an equivalent platform.

- Strong skills in automating workflows and solutions using Python, Go, or shell scripting.

- Lead incident response and root cause analysis efforts, driving continuous improvement and preventing future incidents.

- Collaborate with cross-functional teams to define and enforce best practices, standards, and guidelines for system reliability and performance.

- Participate in on-call rotations and respond to incidents, ensuring timely resolution and minimal user impact so that SLAs are met.

- Plan and devise Disaster Recovery (DR) strategies and implement DR plans.

- Mentor and provide guidance to junior team members, fostering a culture of learning and growth.

- Run the production environment by monitoring availability and taking a holistic view of system health.

- Build software and systems to manage platform infrastructure and applications.

- Improve reliability, quality, and time-to-market of our suite of software solutions.

- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.

- Provide primary operational support and engineering for multiple large-scale distributed software applications.

Required Knowledge, Skills And Abilities :

- Proven experience as a Site Reliability Engineer or similar role, with a focus on designing and maintaining highly available and scalable systems.

- Strong programming and scripting skills (Python, Bash, etc.) to automate operational tasks and develop tooling.

- Experience with cloud platforms (AWS) and containerization technologies (Docker, EKS).

- Proficient in configuration management tools like Ansible and infrastructure-as-code frameworks such as Terraform and CloudFormation.

- Experience with monitoring and logging tools (Prometheus, Grafana, Loki, Sentry.io, CloudWatch, etc.) for proactive system monitoring and troubleshooting.

- Ability to program (Structured and OOP) using one or more high-level languages, such as Java and JavaScript.

- Solid understanding of networking principles, protocols, and security best practices.

- Strong problem-solving skills and the ability to work effectively in a fast-paced, dynamic environment.

- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams.

- Experience with distributed storage technologies such as NFS and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).

- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

- Experience in Agile methodologies.

- Strong skills in software design and design patterns.

- Experience with architecture patterns such as client-server and serverless computing.

- Effective written, verbal and presentation skills with the ability to clearly articulate ideas and concepts.

- Self-directed and able to direct others.

Desired Skills & Abilities :

- Experience with setting up performance/load test environments.

- Familiarity with SOC2 audit processes.

Required Education And/or Experience :

- BE/B Tech/M Tech/MCA/MSc in Computer Science Engineering.

- 7 to 11 years of experience in Software Application Development/CloudOps/SRE.

