- Architect and optimize network topologies (hybrid cloud, multi-cloud, and on-prem) to support ultra-low-latency trading and compliance-driven workloads.

- Configure and manage cloud and on-prem networking components (VPCs, Shared VPCs, Private Service Connect, Cloud NAT, and Global Load Balancers) for secure and compliant transaction flows.

- Implement secure connectivity solutions (VPNs, Interconnect, Direct Connect, and service meshes) to meet fintech regulatory requirements and standards.

- Develop and maintain DNS, load-balancing, and traffic-routing strategies to ensure millisecond-level latency for real-time transactions.

- Evolve Infrastructure as Code (IaC) practices and principles to automate infrastructure provisioning.

- Collaborate on reliability roadmaps, performance benchmarks, and disaster recovery plans tailored for low-latency and high-throughput workloads.

- Manage Kubernetes clusters at scale, integrating service meshes such as Istio or Linkerd.

- Implement chaos engineering principles to strengthen system resilience.

- Influence technical direction, reliability culture, and organizational strategies.

Requirements:

- 6-9 years of experience in SRE, DevOps, or system architecture roles with large-scale production systems.

- Extensive experience managing and scaling high-traffic, low-latency fintech systems, ensuring reliability, compliance, and secure transaction processing.

- Proven expertise in the networking stack, with hands-on experience in BGP, OSPF, DNS, HTTP(S), TCP/IP, MPLS, and VPN protocols.

- Advanced knowledge of GCP networking (VPC design, Shared VPC, Private Service Connect, Global Load Balancers, Cloud DNS, Cloud NAT, Network Intelligence Center, and Service Mesh).

- Strong background in managing complex multi-cloud environments (AWS, GCP, Azure) with a focus on secure and compliant architectures in regulated industries.

- Hands-on expertise in Terraform and Infrastructure as Code (IaC) for repeatable, automated deployments.

- Expertise in Kubernetes, container orchestration, and microservices, with production experience in regulated fintech environments.

- Advanced programming and scripting skills in Python, Go, or Java, applied to automation, risk reduction, and financial system resilience.

- Proficiency with monitoring and logging tools (Prometheus, Mimir, Grafana, Loki) to ensure real-time visibility into trading, payments, and transaction flows.

- Strong understanding of networking, load balancing, and DNS management across multi-cloud and hybrid infrastructures.

- Experience implementing end-to-end observability solutions (metrics, logs, and traces) to monitor and optimize transaction throughput while meeting latency SLAs.

- Leadership skills, with experience mentoring teams, fostering a culture of reliability, and partnering with cross-functional stakeholders, including product teams.

- Strong communication, critical thinking, and incident management abilities, especially in high-stakes production incidents involving customer transactions.

- Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.