
Angel One - Site Reliability Engineer - Monitoring Tools

Posted on: 07/08/2025

Job Description

Job Title : SRE2

Location : Bengaluru, Karnataka

What you will do :

- Design, write, and build tools to improve service reliability, latency, availability, and scalability.


- Drive reliability and availability, starting with metrics and measurements


- Enable scaling by providing tools, developing training, and/or improving processes


- Build tools and automation to prevent recurrence of problems in mission-critical products/services.


- Engage with the development team throughout the life cycle to help develop software for reliability and scale, so that minimal refactoring or change is needed later.


- Dynamically manage the SRE team's workload, and drive and deliver on multiple priorities simultaneously


- Provide thought leadership in architecture, design, and product features, and give feedback on products built on a variety of platforms


- Design, code, test, and deliver software to automate manual operational work


- Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure incidents are permanently resolved



- Identify application patterns and analytics that support better service level objectives (SLOs)

- Design self-healing and resiliency patterns


- Design automated software and product upgrades, change management, and release management solutions


- Coach or manage teams as applicable


- Participate in the 24x7 support coverage as needed


- Should be self-motivated and willing to work with minimal supervision

Who you are :

- Bachelor's degree or equivalent experience in a software engineering discipline


- 5 to 7 years of experience.


- Software development experience in at least one of the following programming languages is a must: Python, Go.


- Expertise in designing, coding, testing, and delivering software in at least one technology stack


- Experience in distributed computing.


- Strong experience in designing and building highly available, high-volume messaging infrastructure with Apache Kafka on AWS and on-premises (e.g., stretch clusters, active/active or active/passive setups) using MirrorMaker or other replication tools.


- Good experience with Schema Registry, Kafka connectors (source and sink), and KSQL, including setup and administration of Kafka brokers, ZooKeeper, topics, and connectors.


- Strong experience with Redis Enterprise, including cluster setup, administration, reliability, and observability.


- Strong experience in setting up monitoring and management tooling.


- Working knowledge of monitoring and management tools, and of data growth management.

- Experience with DevOps tools and workflows: Jenkins, Ansible, Git, and CI/CD


- Proficiency in one or more technology domains; possibly a cross-domain expert able to solve complex, mission-critical problems within a business unit or across the firm


- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)


- Excellent debugging and troubleshooting skills.


- Experience with infrastructure provisioning tools like Terraform or Ansible.


- Hands-on experience deploying and operating applications using Amazon Web Services (AWS) IaaS and PaaS offerings.
