
Site Reliability Engineer III - System Architecture

HyreSnap
Bangalore
6 - 9 Years
Rating: 4.5 (16+ reviews)

Posted on: 19/09/2025

Job Description

Responsibilities:


- Architect and lead the design of scalable, reliable infrastructure solutions.


- Implement strategies for high availability, scalability, and low-latency performance.


- Define service-level objectives (SLOs) and service-level indicators (SLIs) to track performance and reliability.


- Drive incident management by identifying root causes and providing long-term solutions.


- Mentor junior engineers and foster a collaborative, learning-focused environment.


- Design advanced monitoring and alerting systems for proactive system management.


- Architect and optimize network topologies (hybrid cloud, multi-cloud, and on-prem) to support ultra-low-latency trading and compliance-driven workloads.


- Configure and manage cloud and on-prem networking components (VPCs, Shared VPCs, Private Service Connect, Cloud NAT, and Global Load Balancers) for secure and compliant transaction flows.


- Implement secure connectivity solutions (VPNs, Interconnect, Direct Connect, and service meshes) to meet fintech regulatory requirements and standards.


- Develop and maintain DNS, load-balancing, and traffic-routing strategies to ensure millisecond-level latency for real-time transactions.


- Evolve Infrastructure as Code (IaC) practices and principles to automate infrastructure provisioning.


- Collaborate on reliability roadmaps, performance benchmarks, and disaster recovery plans tailored for low-latency and high-throughput workloads.


- Manage Kubernetes clusters at scale, integrating service meshes like Istio or Linkerd.


- Implement chaos engineering principles to strengthen system resilience.


- Influence technical direction, reliability culture, and organizational strategies.


Requirements:

- 6-9 years of experience in SRE, DevOps, or system architecture roles with large-scale production systems.


- Extensive experience managing and scaling high-traffic, low-latency fintech systems, ensuring reliability, compliance, and secure transaction processing.


- Proven expertise in the networking stack, with hands-on experience in BGP, OSPF, DNS, HTTP(S), TCP/IP, MPLS, and VPN protocols.


- Advanced knowledge of GCP networking (VPC design, Shared VPC, Private Service Connect, Global Load Balancers, Cloud DNS, Cloud NAT, Network Intelligence Center, and Service Mesh).


- Strong background in managing complex multi-cloud environments (AWS, GCP, Azure) with a focus on secure and compliant architectures in regulated industries.


- Hands-on expertise in Terraform and Infrastructure as Code (IaC) for repeatable, automated deployments.


- Expertise in Kubernetes, container orchestration, and microservices, with production experience in regulated fintech environments.


- Advanced programming and scripting skills in Python, Go, or Java, applied to automation, risk reduction, and financial system resilience.


- Proficiency with monitoring and logging tools (Prometheus, Mimir, Grafana, Loki) to ensure real-time visibility into trading, payments, and transaction flows.


- Strong understanding of networking, load balancing, and DNS management across multi-cloud and hybrid infrastructures.


- Experience implementing end-to-end observability solutions (metrics, logs, and traces) to monitor and optimize transaction throughput while meeting latency SLAs.


- Leadership skills with experience mentoring teams, fostering a culture of reliability, and partnering with cross-functional stakeholders in product teams.


- Strong communication, critical thinking, and incident management abilities, especially in high-stakes production incidents involving customer transactions.


- Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.


