Posted on: 11/02/2026
Role Summary :
Reliability & Architecture :
- Define and enforce SLOs, SLIs, error budgets, and operational KPIs
- Design and review resilience patterns: circuit breakers, retries, rate limiting, graceful degradation
- Drive chaos engineering, fault-injection, and disaster-recovery readiness
Hands-on Engineering :
- Platform automation
- Observability integrations
- Review microservice architecture with engineering teams to eliminate single points of failure
Cloud & DevOps Leadership :
- Drive Kubernetes best practices (resource tuning, HPA, pod disruption budgets)
- Improve CI/CD pipelines for reliability, speed, and safety
Incident & Operations :
- Establish blameless postmortem culture
- Reduce MTTR through automation and better observability
- Participate in escalation/on-call strategy (not firefighting 24/7)
People & Process :
- Mentor SRE DevOps and SRE Full-Stack engineers
- Define operational standards, runbooks, and SRE practices
- Work closely with product, security, and engineering leaders
Required Skills & Experience :
- Strong hands-on experience with AWS (EKS, EC2, RDS, IAM, CloudWatch, ALB)
- Kubernetes & Docker
- Microservices architectures
- Strong programming background in Java and/or Node.js
- Deep understanding of distributed systems, production debugging, and capacity planning
- Experience in fintech or regulated environments is a strong plus
Nice to Have :
- Security & compliance exposure (PCI DSS, SOC 2, ISO)
- Prior experience building or scaling SRE teams
Posted by: Methusa Maharachakan, Senior Executive Human Resources at VENZO TECHNOLOGIES PRIVATE LIMITED
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1611907