- Movius AI-powered solutions enable businesses to build strong and lasting relationships with their customers in a company-owned, controllable system.

- Welcome to Phone 3.0.

- Headquartered in Alpharetta, GA, with offices in Silicon Valley, New York, London, and Bangalore, India, Movius partners with leading global wireless carriers including T-Mobile, Vodafone, TELUS, BT, Singtel, and more.

Your Opportunity :

- We are looking for a Senior Staff Site Reliability Engineer (SRE) with strong technical expertise in distributed systems, cloud infrastructure, observability, and automation.

- In this role, you will be responsible for improving the reliability, scalability, and performance of our production and pre-production systems.

- You will work hands-on in designing and implementing SRE frameworks, automating key reliability workflows, and building a culture of operational excellence.

- You will also work closely with product engineering, QA, and DevOps teams to define SLOs/SLIs, enhance monitoring and alerting, and strengthen our overall reliability practices.

What You'll Do :

Reliability Engineering & Architecture :

- Design and maintain highly available, fault-tolerant systems on AWS.

- Implement service reliability models based on SLOs, SLIs, and error budgets.

- Continuously improve system performance, scalability, and resilience.

Automation & Infrastructure-as-Code (IaC) :

- Build and maintain automation pipelines using Terraform, Ansible, Bitbucket, and Jenkins.

- Develop reusable IaC modules for multi-account and multi-environment AWS setups.

- Automate operational processes for provisioning, scaling, monitoring, and recovery.

Observability & Monitoring :

- Define observability standards and create dashboards using Elastic Stack, Grafana, or Prometheus.

- Implement intelligent alerting using AIOps and anomaly detection tools.

- Work with development teams to ensure proper telemetry and trace coverage.

Incident Management & RCA :

- Lead major incident response and ensure quick service restoration.

- Conduct blameless post-incident reviews and implement preventive actions.

- Create and maintain runbooks, escalation matrices, and reliability playbooks.

Performance & Capacity Planning :

- Analyse performance bottlenecks and propose tuning or optimization strategies.

- Lead capacity forecasting and ensure the system can handle growth demands.

Collaboration & Mentorship :

- Partner with development, QA, and DevOps teams to embed SRE principles.

- Coach and mentor junior engineers on reliability engineering and automation.

Documentation & Knowledge Management :

- Maintain detailed architecture diagrams, design documents, and operational procedures.

- Document SLOs, automation workflows, and change management reports.

Technical Leadership :

- Lead technical discussions, reliability reviews, and performance retrospectives.

- Promote a code-driven, automation-first reliability culture across teams.

What You Bring :

Education :

- Bachelor's degree in Computer Science, Information Technology, or equivalent experience.

Experience :

- 8+ years in SRE or DevOps roles managing large-scale distributed systems.

- Proven hands-on experience in cloud operations (AWS preferred), automation, and CI/CD pipelines.

- Experience in the Telecom domain is an added advantage.

Technical Skills :

- Deep knowledge of AWS (EKS, EC2, RDS, IAM, VPC, MSK/Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).

- Strong Linux administration and networking fundamentals.

- Skilled in Terraform, Jenkins, Git, and scripting (Python, Bash).

- Solid understanding of observability tools (Grafana, Elastic Stack, Prometheus).

- Experience with container orchestration (Kubernetes) and microservices-based systems.

Certifications (Preferred) :

- AWS Certified DevOps Engineer / Solutions Architect Associate.

- Terraform Associate or Certified Kubernetes Administrator (CKA).

- SRE Foundation or Google SRE certification is desirable.

Why Join Movius?

- Work on a global-scale platform serving enterprise customers.

- Be part of a high-performing, innovation-driven engineering team.

- Competitive pay, benefits, and opportunities for professional growth.

Posted by: Shruti, HR Generalist at Movius

Posted in: DevOps / SRE

Functional Area: Site Reliability Engineering

Job Code: 1593980
