Posted on: 23/01/2026
Description :
We are looking for a Senior DevOps Engineer to build and operate the cloud-native foundation for a large-scale Agentic AI Platform on ACP (Airtel Sovereign Cloud Platform). This role is for engineers who have operated high-scale production platforms, understand automation-first DevOps, and want to work deep in the AI + data infrastructure layer.
You will enable reliable, secure, and scalable execution of LLMs, AI agents, and data platforms across a sovereign cloud environment.
Key Responsibilities :
- Design, build, and operate CI/CD pipelines for AI, data, and platform services.
- Build and manage Kubernetes-based platforms for scalable agent and model workloads.
- Automate infrastructure provisioning using IaC (Terraform, Helm, etc.).
- Implement observability (logging, metrics, tracing) for AI agents, data pipelines, and platform services.
- Ensure high availability, resilience, and performance of production systems.
- Drive security, isolation, and governance for multi-tenant AI workloads.
- Work closely with data, AI, and platform engineering teams to productionize systems.
- Support release management, incident response, and root cause analysis.
Required Skills & Experience :
- 5-8 years of experience in DevOps, SRE, or Platform Engineering roles.
- Strong hands-on experience with Kubernetes and container orchestration.
- Proven experience building and operating CI/CD systems at scale.
- Experience with Infrastructure as Code (Terraform, CloudFormation, Pulumi).
- Solid understanding of :
1. Linux systems and networking fundamentals
2. Distributed systems and cloud-native architectures
- Experience supporting high-scale, production-grade platforms.
- Exposure to end-to-end SDLC and production operations.
Good to Have :
- Experience operating AI/ML or data platforms in production.
- Exposure to LLM serving, GPU workloads, or AI runtimes.
- Experience with public cloud platforms (AWS, GCP, Azure).
- Knowledge of service meshes, ingress, and networking at scale.
- Familiarity with security, secrets management, and compliance.
- Open-source contributions or experience running open-source platforms.
What We Offer :
- Ownership of core platform reliability and automation for Agentic AI.
- Opportunity to operate a sovereign, hyperscale AI and data platform.
- Strong focus on automation, reliability, and engineering excellence.
- Work alongside deeply technical data, AI, and platform teams.
- Clear growth path into Staff / Principal Platform or SRE roles.
If you enjoy building platforms that power AI systems at scale and care deeply about reliability and automation, this role is for you.
Posted in: DevOps / SRE
Functional Area: ML / DL Engineering
Job Code: 1605594