
Site Reliability Engineer

HORIZONTAL INTEGRATION India Pvt Ltd
5 - 10 Years
Bangalore

Posted on: 11/02/2026

Job Description


Job Role : Site Reliability Engineer

Work mode : Hybrid

Location : Bangalore

Interview mode : F2F (face-to-face); immediate joiners preferred

We are looking for a passionate and driven Site Reliability Engineer to help build, operate, and scale highly reliable, low-latency production systems.

This role offers the opportunity to work alongside senior engineers in a fast-paced, high-availability environment, focusing on observability, automation, reliability engineering, and cloud transformation initiatives.

If you enjoy operating production systems, solving complex infrastructure problems, and reducing toil through automation, this role is for you.

Key Responsibilities :


- Collaborate with engineering and product teams to ensure reliable and scalable system design


- Lead technical discussions and propose implementation strategies for reliability improvements


- Participate in incident response and on-call rotation


- Take ownership of minor production incidents and contribute to post-incident reviews


- Perform infrastructure-level application support :

a. Connectivity troubleshooting (port checks, firewall rules, VLAN checks)

b. Load balancer troubleshooting

c. Certificate management (renewals, CA creation, certificate deployment)


- Develop automation scripts using Python, Bash, or Perl


- Build multi-threaded automation scripts for scheduling and orchestration of applications


- API management : create and invoke APIs, implement health checks


- Identify operational toil and eliminate it through automation


- Contribute to Disaster Recovery (DR) and resiliency testing initiatives


- Support migration of applications to Google Cloud Platform (GCP)


- Provision and deprovision GCE instances and GKE clusters


- Build and manage :

a. Dockerized environments

b. Jenkins pipelines for CI/CD deployments

c. Ansible playbooks for parallel automation workflows


- Mentor L1 and L2 SRE team members


- Contribute reliability improvement ideas to product backlogs

Required Skills & Experience :


- Strong experience with Linux-based systems


- Hands-on experience with GCP (GCE, GKE) or other cloud platforms


- Strong scripting/programming skills (Python, Bash; multi-threading knowledge preferred)

- Good understanding of :

a. Application architectures

b. Messaging protocols

c. Distributed systems concepts


- Knowledge of networking fundamentals :

a. TCP/UDP/IP

b. HTTP/HTTPS

c. Load balancing


- Experience with CI/CD tools like Jenkins

- Hands-on experience with :

a. Docker

b. Kubernetes

c. Ansible


- Strong troubleshooting and analytical skills


- Experience handling production incidents


- Excellent communication and stakeholder collaboration skills

Good to Have :


- Experience with monitoring and observability tools :

a. OpenTelemetry

b. Splunk

c. Prometheus

d. Grafana


- Experience in high-availability or low-latency systems


- Exposure to financial/trading systems


- Experience working in Agile environments

