Job Title : SRE2

Location : Bengaluru, Karnataka

What you will do :

- Design, write and build tools to improve reliability, latency, availability and scalability.

- Engender reliability and availability starting with metrics and measurements

- Enable scaling by providing tools, developing training and/or augmenting processes

- Build tools and automation to prevent recurrence of problems in mission-critical products and services.

- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes.

- Dynamically manage the SRE team's workload, and drive and deliver on multiple priorities simultaneously

- Provide thought leadership in architecture, design and product features, and give feedback on products built on a variety of platforms

- Design, code, test, and deliver software to automate manual operational work

- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents

- Identify application patterns and analytics in support of better service level objectives

- Design self-healing and resiliency patterns

- Design automated software and product upgrades, change management, and release management solutions

- Coach or manage teams as applicable

- Participate in the 24x7 support coverage as needed

- Be self-motivated and able to work with minimal supervision

Who you are :

- Bachelor's degree or equivalent experience in a software engineering discipline

- 5 to 7 years of experience.

- Software development experience in one or more of the following programming languages is a must: Python, Go

- Expertise in designing, coding, testing, and delivering software in at least one technology stack

- Experience in distributed computing.

- Strong experience in designing and building highly available, high-volume messaging infrastructure with Apache Kafka on AWS and on-premises (e.g. stretch cluster, active/active or active/passive) using MirrorMaker or other replication tools.

- Good experience with Schema Registry, Kafka connectors (source and sink) and KSQL, including hands-on setup and administration of Kafka brokers, ZooKeeper, topics and connectors.

- Strong experience with Enterprise Redis, including cluster setup, administration, reliability and observability.

- Strong experience in setting up monitoring and management tooling.

- Working knowledge of monitoring and management tools and of data growth management.

- Experience with DevOps tools: Jenkins, Ansible, Git workflows and CI/CD

- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm

- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)

- Excellent debugging and troubleshooting skills.

- Experience with infrastructure provisioning tools like Terraform or Ansible.

- Hands-on experience deploying and operating applications using IaaS and PaaS offerings on AWS.