Description :

Role : Site Reliability Engineer (SRE)

Location : Bengaluru

Company : Spectro Cloud

About the Role :

At Spectro Cloud, we are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing SRE team and play a critical role in shaping the future of our industry-leading Palette platform.

As an SRE at Spectro Cloud, you will operate at the intersection of software engineering and operations, owning the reliability, scalability, and operational excellence of our runtime platforms. You will design and build automation-driven solutions that enable self-service infrastructure, improve system resilience, and ensure a world-class customer experience.

This role is ideal for engineers who thrive in ambiguity, enjoy solving complex problems incrementally, and are energized by building scalable systems in a fast-moving, collaborative environment.

Key Responsibilities :

Technical Leadership & Culture :

- Foster a culture that values technical excellence, accountability, collaboration, and empathy.

- Lead by example in operational rigor, engineering best practices, and continuous improvement.

Automation & Self-Service Enablement :

- Design and implement automation, tools, and workflows that enable self-service provisioning and accelerate engineering velocity.

- Build scalable systems for environment configuration, container orchestration, and infrastructure lifecycle management.

Platform & Infrastructure Engineering :

- Contribute to the design, configuration, and deployment of platform, networking, and cloud infrastructure.

- Actively manage and prioritize critical Kubernetes infrastructure initiatives.

Reliability, Resilience & Observability :

- Enhance and automate failover capabilities and resiliency against fault conditions.

- Implement comprehensive testing, logging, monitoring, and alerting using ELK and related observability tooling.

- Ensure all systems are auditable, with automated mechanisms to supply compliance evidence.

Service Reliability Management :

- Define, implement, and continuously refine Service Level Indicators (SLIs) and Service Level Objectives (SLOs) aligned with customer impact and business priorities.

- Act as Incident Commander for high-severity incidents, driving clear decision-making and effective cross-functional coordination.

Risk & Operations Management :

- Proactively identify, assess, and mitigate operational risks.

- Balance reliability improvements with delivery speed and business objectives.

What Success Looks Like :

You will excel in this role if you :

- Thrive in environments with evolving requirements.

- Break down complex challenges into iterative, measurable improvements.

- Embrace a test-and-learn mindset and continuously refine solutions based on outcomes.

- Demonstrate strong ownership, independence, and a bias toward action.

- Collaborate effectively across distributed teams and time zones.

Qualifications :

We recognize that no candidate meets every requirement. The following qualifications help guide our assessment:

- 5+ years of experience delivering SRE-focused projects involving automation, systems administration, and operational excellence.

- Bachelor's degree in Computer Science or a related field (or equivalent practical experience).

- Strong understanding of cloud security best practices and compliance standards.

- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform.

- Advanced experience with AWS, EKS, Helm, and Git (GitHub).

- Proven ability to define and implement AWS architectural best practices, including security, performance, reliability, and cost optimization.

- Familiarity with SRE principles; relevant certifications (e.g., SRE Foundation) are a plus.

- Excellent written and verbal communication skills.

- Ability to manage multiple initiatives and respond effectively to escalations.

- Experience working with stakeholders across all levels, including executive leadership.

- Comfort collaborating with remote teams across the U.S. and internationally.

The Hiring Process :

At Spectro Cloud, we value your time and aim to keep our hiring process focused and meaningful.

Our engineering interview process typically includes three to four stages :

- Initial screening interview

- Two to three technical interviews, including hands-on assessments

- Final round focused on team fit and deeper discussions

Most interviews are conducted via Google Meet. We recommend joining from a laptop with a stable internet connection and a working camera for the best experience.

Why Spectro Cloud :

Join a team of passionate engineers, innovators, and problem-solvers (Spectronauts) who are redefining how enterprises manage Kubernetes at scale.

Become a Spectronaut and help shape the future of cloud-native infrastructure.