hirist

Job Description

Position : Senior Platform / Kubernetes Engineer (GPU Workloads)

Location : Vizag, India.

Experience : 5+ Years.

Role Overview :


We are seeking a highly skilled Senior Platform Engineer with deep expertise in Kubernetes and GPU workload orchestration.

The ideal candidate will design, build, and optimize containerized platforms that support high-performance computing, AI/ML pipelines, and GPU-intensive workloads.

You will collaborate with cross-functional teams to ensure the scalability, reliability, and efficiency of our cloud-native infrastructure.

Key Responsibilities :


- Architect, deploy, and manage Kubernetes clusters optimized for GPU workloads.

- Design and implement scalable container orchestration solutions for AI/ML and data-intensive applications.

- Automate infrastructure provisioning, monitoring, and scaling using infrastructure-as-code and configuration tools (Terraform, Helm, Ansible).

- Collaborate with data scientists and ML engineers to optimize GPU utilization and performance.

- Implement observability solutions (Prometheus, Grafana, ELK, OpenTelemetry) for proactive monitoring.

- Ensure security, compliance, and reliability across Kubernetes clusters and workloads.

- Drive CI/CD pipeline improvements for GPU-enabled applications.

- Troubleshoot complex platform issues and perform root-cause analysis.

- Mentor junior engineers and contribute to best practices in cloud-native engineering.

Required Skills & Experience :

- 5+ years of experience in Platform Engineering / DevOps / SRE roles.

- Strong expertise in Kubernetes, including GPU scheduling, operators, and custom controllers.

- Hands-on experience with GPU workloads (NVIDIA CUDA, TensorRT, or similar).

- Proficiency in cloud platforms (AWS, Azure, GCP) and hybrid cloud setups.

- Solid knowledge of containerization (Docker) and orchestration.

- Experience with CI/CD tools (Jenkins, GitLab CI, Argo CD).

- Strong scripting/programming skills (Python, Go, Bash).

- Familiarity with networking, storage, and security in Kubernetes environments.

- Excellent problem-solving and communication skills.

Preferred Qualifications :

- Experience with Kubeflow, ML pipelines, or distributed training frameworks.

- Knowledge of service mesh technologies (Istio, Linkerd).

- Exposure to HPC environments and large-scale GPU clusters.

- Contributions to open-source Kubernetes/GPU projects.

