We are looking for an experienced Kubernetes engineer with strong expertise in Kubernetes clusters, cloud-native technologies, storage integration, and performance optimisation. The ideal candidate has hands-on experience designing, deploying, and managing large-scale Kubernetes environments across on-prem and cloud platforms, and troubleshooting complex containerised workloads.

Key Responsibilities :

Cluster Management & Deployment :

- Provision and manage Kubernetes clusters using kubeadm, RKE2, and Cluster API across cloud platforms (AWS, Azure, GCP, OpenStack).

- Deploy, scale, and upgrade applications using Kubernetes best practices (rolling updates, probes, HPA, VPA).

- Configure node scheduling strategies using taints, tolerations, and affinity rules.

Application Deployment & Troubleshooting :

- Debug CrashLoopBackOff and pod failures using kubectl logs, events, and resource monitoring.

- Troubleshoot networking, persistent volumes, and service exposure issues (ClusterIP, NodePort, LoadBalancer, Ingress).

- Debug application routing using APISIX, NGINX ingress, and multi-path routing.

- Handle application scaling and high-traffic scenarios using autoscalers.

Storage & Data Management :

- Integrate Ceph storage with Kubernetes via CSI drivers for block and filesystem provisioning.

- Troubleshoot PersistentVolume (PV) and PersistentVolumeClaim (PVC) issues.

Observability & Performance :

- Deploy and configure monitoring solutions such as Prometheus and Metrics Server.

- Benchmark cluster and workload performance (CPU, memory, networking).

- Enable log collection and analysis for multi-container pods.

Security & Networking :

- Manage authentication and RBAC policies within Kubernetes.

- Configure isolation for virtual Kubernetes clusters (vcluster).

- Handle registry authentication (AWS ECR, private registries) using image pull secrets.

Specialized Workloads :

- Deploy and manage GPU workloads using the NVIDIA GPU Operator.

- Enable GPU scheduling and resource allocation for AI/ML workloads.

Operations & Maintenance :

- Troubleshoot faulty nodes (on-prem / cloud) including CPU, memory, disk, and kubelet health.

- Work on service routing, ingress configurations, and debugging cloud load balancer/firewall issues.

- Perform rolling upgrades and ensure zero-downtime deployments.

Required Skills :

- Strong expertise in Kubernetes administration and cloud-native deployments.

- Hands-on experience with kubeadm, RKE2, Cluster API, and Terraform for cluster provisioning.

- Knowledge of storage integration with Ceph and CSI drivers.

- Experience with monitoring and observability tools (Prometheus, Grafana, Metrics Server).

- Strong debugging skills for pod crashes, networking issues, and persistent storage problems.

- Knowledge of NGINX ingress, APISIX, and traffic routing.

- Understanding of Kubernetes RBAC and of security groups and IAM policies on cloud platforms.

- Experience with GPU workloads in Kubernetes.

- Familiarity with CI/CD pipelines for Kubernetes deployments is a plus.

Preferred Qualifications :

- 4+ years of hands-on experience in Kubernetes roles.

- Experience in both managed (EKS, AKS, GKE) and on-prem Kubernetes clusters.

- Strong scripting skills (Bash, Python, Go preferred).

- Prior experience with infrastructure-as-code tools like Terraform, Helm, and Ansible.

- Exposure to multi-cluster and multi-tenant environments.

Posted by

Aparna Mohan

Senior Executive at MulticoreWare Inc

Last Active: 12 Dec 2025

Posted in

DevOps / SRE

Functional Area

DevOps / Cloud

Job Code

1584171
