Posted on: 07/08/2025
Job Title : SRE2
Location : Bengaluru, Karnataka
What you will do :
- Design, write and build tools to improve reliability, latency, availability and scalability.
- Engender reliability and availability starting with metrics and measurements
- Enable scaling by providing tools, developing training and/or augmenting processes
- Build tools and automation to prevent recurrence of problems in mission-critical products and services.
- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.
- Dynamically manage the SRE team's workload, and drive and deliver on multiple priorities simultaneously
- Provide thought leadership in architecture, design and product features, and give feedback on products built on a variety of platforms
- Design, code, test, and deliver software to automate manual operational work
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Identify application patterns and analytics in support of better service level objectives
- Design self-healing and resiliency patterns
- Design automated software and product upgrades, change management, and release management solutions
- Coach or manage teams as applicable
- Participate in the 24x7 support coverage as needed
- Be self-motivated and willing to work with minimal supervision
Who you are :
- Bachelor's degree or equivalent experience in a software engineering discipline
- 5 to 7 years of experience.
- Software development experience in at least one of the following programming languages is a must: Python, Go.
- Expertise in designing, coding, testing, and delivering software in at least one technology stack
- Experience in distributed computing.
- Strong experience in designing and building highly available, high-volume messaging infrastructure with Apache Kafka on AWS and on-premises (e.g. stretch cluster, active/active or active/passive) using MirrorMaker or other replication tools.
- Good experience with Schema Registry, Kafka connectors (source and sink) and KSQL, including setup and administration of Kafka brokers, ZooKeeper, topics and connectors.
- Strong experience with Enterprise Redis: cluster setup, administration, reliability and observability.
- Strong experience in setting up monitoring and management tooling.
- Working knowledge of monitoring and management tools, and of data growth management.
- Experience with DevOps tools: Jenkins, Ansible, Git workflows and CI/CD
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Excellent debugging and troubleshooting skills.
- Experience with infrastructure provisioning tools like Terraform or Ansible.
- Hands-on experience deploying and operating applications using IaaS and PaaS offerings on AWS.
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1525357