Ideal Candidate :

- Mandatory (Experience 1) : Must have 5+ years of hands-on experience in DevOps / SRE / Infrastructure roles (the profile should mention clear scale signals such as traffic, uptime, latency, or infra size)

- Mandatory (Experience 2) : Must have B2B SaaS company experience with multi-tenant architecture OR multiple production stacks (multi-env / multi-client systems)

- Mandatory (Tech Skills 1 - Cloud & Infra) : AWS (VPC, EKS, EC2, RDS, networking), Kubernetes (EKS) at scale, designing highly available, multi-region systems

- Mandatory (Tech Skills 2 - Automation & IaC) : Terraform (must-have), Helm / GitOps, Strong scripting (Python / Go / Bash)

- Mandatory (Tech Skills 3 - CI/CD & Release) : Scalable CI/CD pipelines (GitHub Actions / Jenkins), Zero/low downtime deployments

- Mandatory (Tech Skills 4 - Reliability & Observability) : SRE principles (SLOs, SLIs, error budgets), Monitoring tools (Prometheus, Grafana, Datadog), Alerting, on-call, incident management

- Mandatory (Education) : BTech in Computer Science or a related field

- Mandatory (Company) : Strong B2B SaaS product companies only (operating at good scale)

Role & Responsibilities :

- You take end-to-end ownership of infrastructure: you design, scale, and operate it. This goes beyond execution. Here's what that looks like day to day :

- Own the design, architecture, and reliability of Locus's cloud infrastructure across AWS, Azure, GCP, and Aliyun, supporting multi-region, global deployments.

- Lead the evolution of our CI/CD ecosystem: optimize and refactor our Jenkins-as-Code setup for scalability, performance, and developer efficiency.

- Drive the Infrastructure as Code (IaC) journey end-to-end: migrate existing cloud resources, alarms, and configurations fully into code with strong versioning, review, and rollback practices.

- Partner with engineering teams to identify and resolve performance, scalability, and reliability bottlenecks, including deep dives into memory, CPU, networking, and storage constraints.

- Define and implement monitoring, alerting, and incident response best practices to improve MTTR, system observability, and operational readiness.

- Lead initiatives around cost optimization, security hardening, and capacity planning to keep infrastructure efficient and compliant as the platform scales.

- Act as a technical mentor for junior DevOps engineers and raise the overall DevOps maturity across teams.