Posted on: 23/01/2026
Description :
Riverbed, the leader in AI observability, helps organizations optimize their users' experiences by leveraging AI automation for the prevention, identification, and resolution of IT issues. With over 20 years of experience in data collection, AI, and machine learning, Riverbed's open and AI-powered observability platform and solutions optimize digital experiences and greatly improve IT efficiency.
Riverbed also offers industry-leading Acceleration solutions that provide fast, agile, secure acceleration of any app, over any network, to users anywhere. Together with our thousands of market-leading customers globally, including 95% of the FORTUNE 100, we are empowering next-generation digital experiences.
Position : Lead SRE Engineer
Location : Bangalore
Join Riverbed Technology and be part of shaping the future of digital experience management!
At Riverbed Technology, we are on a mission to help the world's leading enterprises deliver superior digital experiences. Our Digital Experience Management (DEM) solutions provide deep visibility, AI-driven insights, and performance optimization across complex, global infrastructures.
We are expanding our Site Reliability Engineering (SRE) team and looking for an experienced SRE Lead in India to drive reliability, scalability, and operational excellence across our production environments. This is a unique opportunity to join a global company, lead technical initiatives, mentor engineers across Israel, the US, and beyond, and be instrumental in keeping Riverbed's SaaS solutions reliable and trusted by customers worldwide.
What You Will Do :
- Lead incident response and resolution - coordinate investigations during critical production incidents, drive root cause analysis, and ensure rapid resolution
- Architect and implement reliability solutions - design and deploy infrastructure improvements, automation frameworks, and observability systems to prevent issues proactively
- Own production stability initiatives - drive strategic projects that improve system resilience, reduce MTTR, and optimize infrastructure performance
- Mentor and guide SRE team members - provide technical leadership, conduct code/design reviews, and develop team capabilities
- Lead post-incident reviews and blameless postmortems - facilitate learning, document findings, and drive continuous improvement in incident response playbooks
- Collaborate with DevOps and Engineering leadership - partner with cross-functional teams to influence architectural decisions and reliability standards
- Establish and track SLIs/SLOs/SLAs - define reliability metrics, implement monitoring strategies, and drive data-driven operational improvements
- Participate in and help coordinate the global on-call rotation - ensure continuous coverage and mentor team members on escalation procedures
What Makes You An Ideal Candidate :
- 4+ years of hands-on experience with AWS - expert-level knowledge of EC2, ECS, EKS, RDS, S3, VPC, Load Balancing, CloudFormation, and multi-account strategies
- Strong leadership and mentorship experience - proven track record of leading technical initiatives and developing engineering talent
- Expert-level proficiency in Linux systems administration and performance tuning
- Advanced experience with infrastructure-as-code - Terraform and Ansible in production environments at scale
- Deep expertise in container orchestration - Kubernetes (K8S) and ECS, including cluster management, scaling strategies, and troubleshooting
- Strong CI/CD pipeline design and implementation experience (Jenkins, GitLab CI, or similar)
- Advanced knowledge of observability stacks - CloudWatch, Prometheus, Grafana, ELK/EFK, Datadog, or equivalent platforms
- Expert networking skills - DNS, load balancing, TLS/SSL, VPNs, service mesh architectures, and complex connectivity troubleshooting
- Automation and scripting proficiency - Python, Bash, or Go for building tools and automation frameworks
- Excellent communication and technical documentation skills - able to clearly articulate complex technical concepts to both technical and non-technical stakeholders
- Experience with DORA metrics and SRE best practices - understanding of error budgets, toil reduction, and reliability engineering principles
Nice to Have :
- Background in security and compliance (SOC2, ISO, FedRAMP)
- Contributions to open-source SRE/DevOps projects
- Experience with multi-region, high-availability architectures
- Knowledge of FinOps and cloud cost optimization at scale
- Familiarity with GitOps practices (ArgoCD, Flux)
Posted by
Girish Valecha
Lead - HR and Learning Operations at Riverbed Technology India Private Limited
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1605209