Posted on: 20/01/2026
Description :
Role : SRE Team Lead
Experience : 8 - 12 years
Domain : Fintech | Microservices | Cloud-native
Role Summary :
We are looking for a hands-on SRE Team Lead to own the reliability, scalability, and operational excellence of a cloud-native fintech platform built on microservices. This role combines technical leadership, architecture ownership, and deep hands-on execution.
You will lead a small SRE team while remaining actively involved in design, coding, incident response, and reliability engineering.
Key Responsibilities :
Reliability & Architecture :
- Own platform availability, latency, scalability, and resilience across environments
- Define and enforce SLOs, SLIs, error budgets, and operational KPIs
- Design and review resilience patterns : circuit breakers, retries, rate limiting, graceful degradation
- Drive chaos engineering, fault-injection, and disaster-recovery readiness
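For candidates less familiar with the term, the error-budget arithmetic behind the SLO work above can be sketched roughly as follows. This is a minimal illustration only: the function names, the 30-day window, and the request-based budget are assumptions made for the example, not a description of the platform's actual tooling.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (hypothetical target).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    Returns a negative number when the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests spends about a quarter of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

In practice a team would typically derive these numbers from its monitoring stack and use the remaining budget to decide when to slow releases in favour of reliability work.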
Hands-on Engineering :
- Actively contribute code (Java / Node.js) for :
1. Reliability tooling
2. Platform automation
3. Observability integrations
- Review microservice architecture with engineering teams to eliminate single points of failure
Cloud & DevOps Leadership :
- Own AWS architecture (VPCs, IAM, EKS, RDS, ALB/NLB, autoscaling)
- Drive Kubernetes best practices (resource tuning, HPA, pod disruption budgets)
- Improve CI/CD pipelines for reliability, speed, and safety
Incident & Operations :
- Lead production incident response, root cause analysis (RCA), and postmortems
- Establish a blameless postmortem culture
- Reduce MTTR through automation and better observability
- Participate in escalation/on-call strategy (not firefighting 24/7)
People & Process :
- Mentor SRE DevOps and SRE Full-Stack engineers
- Define operational standards, runbooks, and SRE practices
- Work closely with product, security, and engineering leaders
Required Skills & Experience :
- 8+ years of experience in SRE / Platform / DevOps engineering
- Strong hands-on experience with :
1. AWS (EKS, EC2, RDS, IAM, CloudWatch, ALB)
2. Kubernetes & Docker
3. Microservices architectures
- Strong programming background in Java and/or Node.js
- Deep understanding of :
1. Distributed systems
2. Production debugging
3. Capacity planning
- Experience in fintech or regulated environments is a strong plus
Nice to Have :
- Experience with chaos engineering tools
- Security & compliance exposure (PCI-DSS, SOC 2, ISO)
- Prior experience building or scaling SRE teams
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1603618