Senior DevOps Engineer - IAC Terraform

The reliable jobs
Bangalore
5 - 10 Years

Posted on: 03/02/2026

Job Description

Role Overview :

Lead the DevOps and infrastructure team as both a technical leader and a hands-on individual contributor, managing the company's growing cloud and on-premises resources with exceptional reliability and performance. You'll be responsible for maintaining 99% uptime for our high-throughput AdTech platform while optimizing costs and building a world-class infrastructure team.

Key Responsibilities :

- Maintain 99% uptime and meet SLAs across all environments while reducing infrastructure costs by 20-30%

- Design and implement deployment architecture for high-throughput systems (25,000-30,000 QPS, sub-100ms latency)

- Manage multi-cloud infrastructure (AWS, DigitalOcean, GCP) using Infrastructure as Code

- Build CI/CD pipelines, monitoring systems, and automation for distributed microservices

- Troubleshoot production issues, including Kafka lag, RabbitMQ failures, and Node.js, Python, and Java application performance

- Lead incident response (on-call rotation) and post-mortems, and implement preventive measures

- Implement security best practices (OAuth, OIDC, SSO) and disaster recovery protocols

- Build and mentor a team of infrastructure engineers

Required Skills & Experience :

Experience : 5+ years in DevOps/Infrastructure roles, including 2+ years with high-throughput systems (10,000+ QPS)

Infrastructure & Cloud (MUST HAVE) :

- Strong production experience with Infrastructure as Code (Terraform, Terragrunt, Ansible)

- Production Kubernetes and Docker experience with complex microservices architectures

- Multi-cloud expertise : AWS (VPC, EC2, ECS, Fargate, S3, Glacier, RDS, Route 53, CloudFront, Lambda, API Gateway, CloudWatch), DigitalOcean, Azure, or GCP

- Advanced Linux system administration (RHEL, Ubuntu, Amazon Linux) and networking concepts

Data Systems (Added Advantage) :

- ClickHouse : Production operations, query optimization, data retention policies for billions of auction records

- Kafka : Consumer/producer optimization, lag management, performance tuning for high-volume message streams (millions of messages/day)

- RabbitMQ : Message routing, cluster management, troubleshooting connection failures in K8s environments

- MySQL : Database administration, replication, backup/recovery

- Elasticsearch : Bulk indexing optimization, cluster health management

Development & CI/CD :

- CI/CD tools : GitHub Actions, Jenkins, GitLab CI, or similar

- Programming : Python (required), Shell scripting (required); Rust or Go strongly preferred

- JVM troubleshooting : Profiling, GC tuning, memory leak detection, understanding of Java Spring Boot applications

- Microservices architectures and API design patterns

- Software development lifecycle and agile methodologies

Monitoring & Observability :

- Prometheus, Grafana, ELK stack (Elasticsearch, Logstash, Kibana, Filebeat)

- System performance troubleshooting under load (CPU bottlenecks, memory leaks, network latency)

- Incident response and production support with a systematic debugging approach

- Understanding of RED metrics (Rate, Errors, Duration) and USE metrics (Utilization, Saturation, Errors)

Nice to Have (Strong Bonus) :

AdTech & Domain Knowledge :

- Experience with programmatic advertising and Real-Time Bidding (RTB) systems

- Understanding of ad auction mechanics and sub-100ms latency requirements

- Familiarity with ad fraud prevention and transparency measures

- Knowledge of supply-side platforms (SSP) and demand-side platforms (DSP)

Blockchain & Distributed Systems :

- Blockchain infrastructure and node operations (Sui ecosystem experience is a major bonus)

- Experience with decentralized storage systems (Walrus, IPFS, Arweave)

- Data pipeline integration between blockchain and distributed storage

- Understanding of consensus mechanisms and distributed ledger technology

Advanced Technical Skills :

- Rust or Go programming experience

- MLOps practices and tooling

- Security systems implementation (OAuth 2.0, OIDC, SSO with Okta/Auth0)

- Data lifecycle management and GDPR/privacy compliance awareness

- Experience with high-frequency trading or financial systems

- Experience in start-up or R&D environments with rapid iteration

- Relevant cloud certifications (AWS Certified DevOps Engineer Professional, CKA, CKAD)

