Posted on: 23/01/2026
Description : Senior Platform / Kubernetes Engineer (GPU Workloads).
Location : Vizag, India.
Experience : 5+ Years.
Role Overview :
We are seeking a highly skilled Senior Platform Engineer with deep expertise in Kubernetes and GPU workload orchestration.
The ideal candidate will design, build, and optimize containerized platforms that support high-performance computing, AI/ML pipelines, and GPU-intensive workloads.
You will collaborate with cross-functional teams to ensure the scalability, reliability, and efficiency of our cloud-native infrastructure.
Key Responsibilities :
- Architect, deploy, and manage Kubernetes clusters optimized for GPU workloads.
- Design and implement scalable container orchestration solutions for AI/ML and data-intensive applications.
- Automate infrastructure provisioning, monitoring, and scaling using IaC and automation tools (Terraform, Helm, Ansible).
- Collaborate with data scientists and ML engineers to optimize GPU utilization and performance.
- Implement observability solutions (Prometheus, Grafana, ELK, OpenTelemetry) for proactive monitoring.
- Ensure security, compliance, and reliability across Kubernetes clusters and workloads.
- Drive CI/CD pipeline improvements for GPU-enabled applications.
- Troubleshoot complex platform issues and provide root cause analysis.
- Mentor junior engineers and contribute to best practices in cloud-native engineering.
Required Skills & Experience :
- 5+ years of experience in Platform Engineering / DevOps / SRE roles.
- Strong expertise in Kubernetes, including GPU scheduling, operators, and custom controllers.
- Hands-on experience with GPU workloads (NVIDIA CUDA, TensorRT, or similar).
- Proficiency in cloud platforms (AWS, Azure, GCP) and hybrid cloud setups.
- Solid knowledge of containerization (Docker) and orchestration.
- Experience with CI/CD tools (Jenkins, GitLab CI, Argo CD).
- Strong scripting/programming skills (Python, Go, Bash).
- Familiarity with networking, storage, and security in Kubernetes environments.
- Excellent problem-solving and communication skills.
Preferred Qualifications :
- Experience with Kubeflow, ML pipelines, or distributed training frameworks.
- Knowledge of service mesh technologies (Istio, Linkerd).
- Exposure to HPC environments and large-scale GPU clusters.
- Contributions to open-source Kubernetes/GPU projects.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1605369