Posted on: 16/11/2025
The Opportunity : Cloud Automation & Platform Reliability
Details & Specification
Experience : 2 - 7 Years
Location : Bangalore (Hybrid / Work from Office)
Mandatory Background : B.Tech/M.Tech from IIT, NIT, BITS Pilani, or IIIT
The Company : High-Growth B2B AI SaaS
About the Role
We are seeking an experienced DevOps / Site Reliability Engineer (SRE) to join our Platform team. In our B2B AI SaaS environment, reliability is paramount, and fast deployment is essential for innovation. You will be responsible for building, automating, and maintaining our cloud infrastructure, CI/CD pipelines, and monitoring systems. Your primary goal is to ensure the scalability, security, and exceptional uptime of our enterprise-grade AI platform.
Key Technical Responsibilities :
1. Infrastructure as Code (IaC) & Cloud Engineering :
- Automation : Design, implement, and manage infrastructure entirely through code (Terraform or CloudFormation) across AWS (or GCP/Azure).
- Orchestration : Maintain and scale our container orchestration platform using Kubernetes and Docker, ensuring cost-effective and resilient resource allocation.
- Networking : Manage and optimize cloud networking components, including security groups, load balancers, and VPC peering, to ensure secure and efficient communication between services.
2. CI/CD & Deployment :
- Pipeline Mastery : Own and evolve the continuous integration and continuous deployment (CI/CD) pipelines using tools like Jenkins, GitLab CI, or GitHub Actions, driving a culture of frequent, reliable, and automated releases.
- Observability : Implement and enhance comprehensive monitoring, logging, and alerting systems using the ELK stack, Prometheus, and Grafana to achieve deep visibility into system performance and detect issues proactively.
3. Site Reliability & Security :
- Automation : Reduce toil by automating repetitive operational tasks (e.g., patching, backups, scaling events) through scripting (Python or Bash).
- SLO/SLA : Define and uphold Service Level Objectives (SLOs) and Service Level Agreements (SLAs), and participate in the on-call rotation to resolve critical production incidents quickly.
- Security : Implement security best practices at the infrastructure level, such as identity and access management (IAM), secrets management with Vault, and network isolation.
What You'll Bring (Mandatory Skills & Experience) :
- Educational Excellence : B.Tech/M.Tech in Computer Science or a related discipline from an IIT, NIT, BITS Pilani, or IIIT is mandatory.
- Experience : 3-6 years of hands-on experience in DevOps, SRE, or Cloud Engineering roles, preferably in a SaaS environment.
- Core Tools : Expert proficiency with Kubernetes, Docker, and IaC (Terraform).
- Cloud : Deep practical experience with a major cloud provider (AWS strongly preferred).
- Scripting : Strong scripting skills in Python or Bash for automation.
- Monitoring : Hands-on experience setting up and maintaining observability tools (Prometheus, Grafana).
Posted in : DevOps / SRE
Functional Area : Site Reliability Engineering
Job Code : 1575524