Posted on: 13/10/2025
Description :
Responsibilities :
- System Reliability : Achieve 99.95%+ uptime across all critical user-facing services.
- Performance : Maintain < 200ms P95 latency for all user-critical operations at 10x current scale.
- Incident Reduction : Implement monitoring and automation that reduces production incidents by 80%.
- Team Enablement : Create platform abstractions that let product teams ship features 3x faster.
- Cost Optimisation : Scale infrastructure costs sub-linearly with user growth through intelligent architecture.
Requirements :
- 6+ years building and scaling distributed systems handling 5K concurrent requests in production.
- Expertise in distributed consensus, eventual consistency, and CAP theorem trade-offs in production environments.
- Hands-on experience with microservices decomposition from monolithic architectures under live traffic.
- Production experience with service mesh technologies and container orchestration at scale.
Infrastructure and Performance Expertise :
- Experience in distributed caching strategies.
- Production experience with message queue architectures handling > 1M messages/day.
- Deep understanding of database sharding, replication, and distributed transaction management.
- Hands-on experience with CDN optimisation and edge computing for global/multi-region deployments.
- Proven ability to optimise systems to achieve 100ms P99 latency for critical user-facing operations.
- Experience with zero-trust network architectures and service-to-service authentication.
Posted in: Backend Development
Functional Area: Backend Development
Job Code: 1560223