Number of openings : 7

Role Overview :

You will architect and operate globally distributed infrastructure that powers high-traffic applications serving millions of users. The role demands deep expertise in cloud platforms, distributed systems, the Kubernetes ecosystem, high-performance networking, and reliability engineering.

You will design platforms that allow engineering teams to deploy and scale services rapidly while meeting strict SLO-driven reliability standards. This includes building infrastructure that scales automatically, recovers from failures on its own, and operates with minimal human intervention.

Key Responsibilities :

- Architect multi-region, high-availability cloud infrastructure on AWS, GCP, or Azure

- Build scalable Kubernetes-based platforms supporting large microservice ecosystems

- Implement Infrastructure as Code using Terraform or Pulumi to automate provisioning

- Design GitOps-driven CI/CD systems enabling safe, zero-downtime deployments

- Build advanced observability platforms using Prometheus, OpenTelemetry, and Grafana

- Implement service mesh architectures using Istio and Envoy for secure service networking

- Define and enforce SLOs, SLIs, and error budgets to ensure reliability at scale

- Implement self-healing infrastructure, autoscaling systems, and chaos engineering practices

- Optimize infrastructure for low latency, high throughput, and cost efficiency

Core Technology Stack :

- Linux, Kubernetes, Docker, Terraform, Pulumi, AWS, GCP, Azure, Go, Python, eBPF, Istio, Envoy, Helm, ArgoCD, GitOps, Prometheus, OpenTelemetry, Grafana, HashiCorp Vault, OPA, distributed systems architecture, multi-region infrastructure, platform engineering, SRE practices, service networking, and edge infrastructure.