Posted on: 20/08/2025
Job Overview:
The Senior Site Reliability Engineer (Sr. SRE) will lead the implementation and management of the observability stack across cloud infrastructure, ensuring reliability, scalability, performance, and cost-efficiency. The role spans Kubernetes, AWS, automation, incident response, and platform reliability.
Key Responsibilities:
- Build and maintain monitoring, logging, and alerting solutions.
- Lead incident response and drive post-mortem best practices.
- Design & test disaster recovery strategies.
- Collaborate with dev teams to define SLAs.
- Optimize cloud infra (AWS) for cost and performance.
- Automate deployments, scaling & recovery using Terraform, GitLab CI/CD, Kubernetes.
- Handle on-call support.
Required Skills & Experience:
- 4+ years of experience in SRE/DevOps roles.
- Proficiency in Shell, Chef, Ansible, Python.
- Strong AWS services experience (EC2, EKS, RDS, CloudWatch, Cognito, etc.).
- Kubernetes administration in production.
- IaC: Terraform / CloudFormation.
- Observability tools: Prometheus, Grafana, ELK, tracing systems.
- PostgreSQL (including replication).
- Networking, load balancing, security best practices.
- CI/CD pipelines & GitOps workflows.
- Ability to handle high-pressure incidents.
- Exposure to Splunk, Datadog, or Dynatrace is a plus.
Preferred:
- AWS Certified Solutions Architect / DevOps Engineer.
- Certified Kubernetes Administrator (CKA).
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1532947