Posted on: 27/11/2025
Description :
- Architect, design, and operate large-scale Kubernetes clusters (EKS/GKE/AKS) for microservices and data workloads.
- Define cluster topology, autoscaling strategies (HPA/VPA/Karpenter), node groups, and networking models.
- Build and maintain service mesh frameworks (Istio/Linkerd), ingress controllers, and API gateways.
- Develop internal platform tooling for deployments, traffic management, rollback, and cluster governance.
- Own cluster security: pod security standards, network policies, admission controllers, RBAC.
Infrastructure Architecture & Automation :
- Design end-to-end cloud infrastructure architectures ensuring security, reliability, and cost optimization.
- Implement and enforce Infrastructure-as-Code using Terraform/Pulumi at org-wide scale.
- Define patterns for multi-account architecture, VPC design, load balancing, secret management, and zero-trust networking.
- Lead cloud cost optimization initiatives and implement FinOps practices.
CI/CD Platform & Deployment Engineering :
- Architect highly reliable CI/CD workflows with GitHub Actions / GitLab CI / Jenkins / ArgoCD.
- Build automated release pipelines for microservices, operators, and stateful workloads.
- Set standards for blue/green, canary, and shadow deployments and progressive rollouts.
- Build reusable deployment templates and developer self-service mechanisms.
Reliability, Observability & Operations :
- Own organization-wide monitoring, logging, and alerting platforms (Prometheus, Loki, Grafana, New Relic).
- Define SLOs/SLIs, reliability targets, and implement automated remediation workflows.
- Lead incident response for high-severity issues and drive long-term fixes through RCAs.
- Build platform-wide health dashboards, cluster insights, cost dashboards, and performance metrics.
Security & Compliance :
- Implement Kubernetes and cloud-native security best practices: IAM hardening, OPA/Gatekeeper policies, secret lifecycle, container scanning, and runtime security.
- Automate compliance validations and enforce organization-wide DevSecOps policies.
- Partner with security teams for penetration tests, threat modeling, and vulnerability management.
Technical Leadership & Collaboration :
- Mentor engineers and guide teams on DevOps, Kubernetes, and cloud architecture.
- Lead technical design reviews, platform roadmap discussions, and cross-team initiatives.
- Influence engineering decisions by providing architectural recommendations and operational insights.
What You'll Bring :
Core Technical Strengths :
- 6-12 years of DevOps/SRE/Platform experience with deep Kubernetes expertise.
- Hands-on experience designing, operating, and scaling production-grade Kubernetes clusters.
- Expert-level understanding of:
  - Load balancing, autoscaling, resilience patterns.
  - Monitoring stack setup and automation.
Automation & Tooling :
- Strong proficiency in IaC (Terraform, Pulumi) and GitOps (ArgoCD/FluxCD).
- Programming/scripting proficiency in Python, Go, or Bash.
- Experience building internal dev platforms, tools, and automation frameworks.
Soft Skills :
- Ability to drive architecture discussions and make high-impact technical decisions.
- Leadership mindset with strong ownership and ability to influence without authority.
- Strong communication and stakeholder management across engineering, security, and product teams.
Preferred Qualifications :
- Experience with multi-cluster, multi-region, or hybrid-cloud Kubernetes setups.
- Prior work with distributed systems, high-scale environments, or data platforms.
- Exposure to eBPF, Cilium, or advanced CNI plugins.
- Certifications such as CKA/CKAD/CKS or AWS Solutions Architect.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1581720