Posted on: 28/10/2025
About the Role :
We are looking for a highly experienced Senior Staff Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will bring deep technical expertise in DevOps, automation, and large-scale distributed systems, with a strong understanding of cloud operations and CI/CD frameworks. Experience in the telecom domain will be an added advantage.
Key Responsibilities :
- Design, build, and maintain scalable, reliable, and secure cloud infrastructure.
- Develop automation solutions to improve system performance, availability, and operational efficiency.
- Implement, manage, and optimize CI/CD pipelines and deployment strategies.
- Monitor, troubleshoot, and resolve complex production issues across distributed systems.
- Collaborate closely with development and operations teams to drive site reliability best practices.
- Develop observability frameworks and tools to ensure visibility into system health and performance.
- Lead incident management, post-mortems, and continuous improvement initiatives.
- Mentor junior engineers and promote a culture of automation and reliability engineering.
Requirements :
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
- 8+ years of experience in DevOps or Site Reliability Engineering (SRE) roles.
- Proven expertise in managing and scaling large distributed systems.
- Strong background in cloud operations (AWS preferred), automation, and CI/CD.
Technical Skills :
- Cloud: AWS (EKS, EC2, RDS, IAM, VPC, Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).
- Infrastructure as Code & CI/CD Tooling: Terraform, Jenkins, Git.
- Scripting: Python, Bash.
- Monitoring & Observability: Grafana, Elastic Stack, Prometheus.
- Containerization & Orchestration: Kubernetes, Docker, microservices reliability.
- Strong understanding of Linux administration and networking fundamentals.
Preferred Certifications :
- AWS Certified DevOps Engineer / Solutions Architect Associate (preferred).
- Terraform Associate or Certified Kubernetes Administrator (CKA) (a plus).
- SRE Foundation or Google SRE Certification (desirable).
Why Join Us :
- Work on cutting-edge distributed systems and cloud infrastructure.
- Opportunity to lead high-impact initiatives in a fast-paced, technology-driven environment.
- Collaborate with cross-functional teams in a culture that values innovation, ownership, and growth.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1566339