Posted on: 04/02/2026
Job Title : Senior Site Reliability Engineer
Experience : 4-7 Years
Employment : Full-time
Location : Remote
Job Description :
We are looking for a Senior SRE to join our fully remote team to own the reliability, scalability, and performance of our global infrastructure. You won't just be "managing" clusters; you will be architecting multi-region resilience, building sophisticated CI/CD pipelines, and writing the Python automation that keeps our systems self-healing.
Technical Requirements :
- Kubernetes Mastery : Deep expertise in cluster architecture, RBAC, networking, and workload isolation. Experience with Helm, Operators, and scaling (HPA/Cluster Autoscaler) is essential.
- Programming & Scripting : Strong hands-on proficiency in Python. You should be comfortable writing scripts to interact with APIs and automate infrastructure.
- CI/CD & GitOps : Proven experience building and optimizing deployment pipelines using GitHub Actions.
- Observability Stack : Experience implementing monitoring and tracing for complex distributed systems (specifically K8s and Kafka).
Responsibilities :
- Architect Resilience : Design and operate multi-AZ and multi-region Kubernetes deployments, ensuring DR (Disaster Recovery) readiness and seamless cluster switchovers.
- Engineer for Reliability : Define SLIs, SLOs, and error budgets. You'll be the champion of "system health," building end-to-end observability (metrics, logs, traces) across K8s and Kafka.
- Automate Everything : This is not a "manual click" role. You will use Python and GitHub Actions to build robust CI/CD workflows and automate complex operational tasks.
- Lead through Incidents : Drive incident response and conduct blameless postmortems that result in long-term engineering fixes, not just temporary patches.
Why Join Us?
- 100% Remote : Work from wherever you are most productive.
- High Impact : You'll have a direct hand in defining the best practices and reliability standards for our entire engineering org.
- Complex Challenges : Solve high-scale problems involving Kafka, multi-region architecture, and high-availability systems.
Posted by
Mohammed Rawoof
Sr. Talent Analyst at StatusNeo Technology Consulting Pvt. Ltd
Last Active: 5 Feb 2026
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1609775