Posted on: 15/01/2026
Location : Gurgaon-Hybrid.
About the Company :
Our systems are built for high throughput, low latency, and 24/7 availability at scale.
Role Overview :
You will work closely with Engineering, SRE, and Security teams to ensure reliability, performance, and fast deployments.
Key Responsibilities :
- Manage compute, networking, storage, and security for high-traffic systems.
- Optimize infrastructure for low latency and high availability.
CI/CD & Automation :
- Automate deployments, rollbacks, and environment provisioning.
- Improve developer productivity through tooling and automation.
Reliability & Observability :
- Set up monitoring, logging, and alerting (Prometheus, Grafana, ELK, Datadog, etc.).
- Lead incident management and perform root cause analysis (RCA).
Scalability & Performance :
- Implement load balancing, autoscaling, and failover strategies.
Security & Compliance :
- Ensure secure deployments and access controls.
- Support compliance and data protection requirements.
Required Skills & Qualifications :
- Strong experience with Linux, Bash, and general scripting.
- Expertise in Docker and Kubernetes.
- Experience with Terraform, CloudFormation, or other IaC tools.
- Hands-on experience with cloud platforms (AWS/GCP/Azure).
- Strong understanding of networking, DNS, and load balancing.
- Experience managing high-scale distributed systems.
Good to Have :
- Exposure to Kafka, RabbitMQ, and Redis.
- Experience with Erlang, Go, or Java backend platforms.
- Knowledge of zero-downtime deployments.
- Familiarity with SRE practices (SLIs, SLOs, error budgets).
Key Systems You'll Work With :
- Messaging servers & real-time gateways.
- Databases & caches.
- CI/CD pipelines.
- Monitoring & alerting platforms.
- Security and access management.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1601844