Posted on: 16/12/2025
Key Responsibilities :
- Infrastructure Leadership : Lead the architecture, deployment, and operation of scalable, secure, and highly available cloud-native platforms.
- Kubernetes Expertise : Serve as the subject matter expert for Kubernetes, managing both control plane and data plane components across on-premises and public cloud environments.
- Automation & IaC : Drive Infrastructure as Code (IaC) initiatives using Terraform to manage infrastructure end-to-end, coupled with extensive automation scripting using Python.
- DevOps/SRE : Implement and champion CI/CD pipelines (GitOps methodologies preferred) and robust SRE practices for system reliability, performance, and monitoring.
- Monitoring & Observability : Configure and manage comprehensive monitoring and logging solutions using tools like Prometheus, Grafana, ELK Stack, and Fluent Bit.
- Networking & Security : Ensure robust networking, storage, and security configurations across both on-premises and cloud environments, focusing on resilience and compliance.
- Service Mesh & APIs : Deploy and manage service mesh solutions (Istio/Consul) and implement API Gateways (Kong preferred) to manage microservices traffic.
- Event-Driven Systems : Work with Kafka and other technologies to support highly available, event-driven microservices architectures.
- Mentorship : Provide technical guidance and mentorship to junior team members, fostering a culture of operational excellence and continuous improvement.
Required Skills and Experience (Must Have) :
- Total Experience : 10+ years of progressive experience in IT infrastructure, system administration, and cloud engineering.
- Foundational Skills : 10+ years of hands-on experience with Linux/Unix operating systems, major public clouds (AWS, GCP, or Azure), DevOps practices, and containers.
- Kubernetes Depth : 5+ years of strong, practical experience with Kubernetes, including deep understanding and troubleshooting of the control plane and data plane (on-premises and cloud deployments).
- IaC & Automation : Expert-level proficiency with Terraform for infrastructure provisioning and extensive automation using Python and Shell scripting.
- Networking & Systems : Strong understanding of core networking, storage, and security concepts in complex, distributed environments (on-prem and cloud).
- Cloud-Native Tools : Expertise in implementing and managing Istio / Consul service mesh and packaging applications using Helm charts.
- Observability Stack : Hands-on experience with Prometheus, Grafana, ELK Stack, and Fluent Bit for Kubernetes monitoring and logging.
- Microservices Backbone : Experience with API Gateways (Kong preferred), Kafka, and designing event-driven microservices.
- Modern Practices : Exposure to GitOps, CI/CD, and SRE methodologies.
- Communication : Understanding of REST and gRPC communication protocols.
Desirable Skills (Good to Have) :
- Experience managing and troubleshooting distributed / multi-region Kubernetes clusters.
- Familiarity with Tanzu and VMware virtualization technologies.
- Knowledge of container security best practices (image scanning, pod security policies, network policies).
- Knowledge of advanced Kubernetes networking, firewall, and storage concepts (e.g., CNI, CSI, PV/PVC).
- Proven experience in troubleshooting complex production infrastructure issues under pressure.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1590261