hirist

Job Description

Description :


We are looking for a Senior DevOps Engineer to build and operate the cloud-native foundation for a large-scale Agentic AI Platform on ACP (Airtel Sovereign Cloud Platform). This role is for engineers who have operated high-scale production platforms, understand automation-first DevOps, and want to work deep in the AI + data infrastructure layer.

You will enable reliable, secure, and scalable execution of LLMs, AI agents, and data platforms across a sovereign cloud environment.

Key Responsibilities :

- Design, build, and operate CI/CD pipelines for AI, data, and platform services.

- Build and manage Kubernetes-based platforms for scalable agent and model workloads.

- Automate infrastructure provisioning using IaC (Terraform, Helm, etc.).

- Implement observability (logging, metrics, tracing) for AI agents, data pipelines, and platform services.

- Ensure high availability, resilience, and performance of production systems.

- Drive security, isolation, and governance for multi-tenant AI workloads.

- Work closely with data, AI, and platform engineering teams to productionize systems.

- Support release management, incident response, and root cause analysis.

Required Skills & Experience :

- 5-8 years of experience in DevOps, SRE, or Platform Engineering roles.

- Strong hands-on experience with Kubernetes and container orchestration.

- Proven experience building and operating CI/CD systems at scale.

- Experience with Infrastructure as Code (Terraform, CloudFormation, Pulumi).

- Solid understanding of :

1. Linux systems and networking fundamentals

2. Distributed systems and cloud-native architectures

- Experience supporting high-scale, production-grade platforms.

- Exposure to end-to-end SDLC and production operations.

Good to Have :

- Experience operating AI/ML or data platforms in production.

- Exposure to LLM serving, GPU workloads, or AI runtimes.

- Experience with public cloud platforms (AWS, GCP, Azure).

- Knowledge of service meshes, ingress, and networking at scale.

- Familiarity with security, secrets management, and compliance.

- Open-source contributions or experience running open-source platforms.

What We Offer :

- Ownership of core platform reliability and automation for Agentic AI.

- Opportunity to operate a sovereign, hyperscale AI and data platform.

- Strong focus on automation, reliability, and engineering excellence.

- Work alongside deeply technical data, AI, and platform teams.

- Clear growth path into Staff / Principal Platform or SRE roles.

If you enjoy building platforms that power AI systems at scale and care deeply about reliability and automation, this role is for you.

