Posted on: 13/08/2025
Job Title : Senior Infrastructure Test & Validation Engineer.
Key Skills : pytest, Go, k6 scripting, automation frameworks integration.
Job Locations : Bangalore.
Experience : 8-15 years.
Education Qualification : Any Degree Graduation.
Work Mode : Hybrid.
Employment Type : Contract.
Notice Period : Immediate to 10 Days.
Job Description :
Senior Infrastructure Test & Validation Engineer (Zero-Touch GPU Cloud GitOps Validation & Certification).
We are seeking a Senior Infrastructure Test & Validation Engineer with 10+ years of experience to lead the Zero-Touch Validation, Upgrade, and Certification automation of our on-prem GPU cloud platform.
This role focuses on ensuring the stability, performance, and conformance of the entire stack, from hardware to Kubernetes, using automated, GitOps-based validation pipelines.
The ideal candidate has a strong infrastructure background with deep hands-on skills in Sonobuoy, LitmusChaos, k6, and pytest, and is passionate about automated test orchestration, platform resilience, and continuous conformance.
Key Responsibilities.
- Design and implement automated, GitOps-compliant pipelines for validation and certification of the GPU cloud stack across hardware, OS, Kubernetes, and platform layers.
- Integrate Sonobuoy for Kubernetes conformance and certification testing.
- Design and orchestrate chaos engineering workflows using LitmusChaos to validate system resilience across failure scenarios.
- Implement performance testing suites using k6 and system-level benchmarks, integrated into CI/CD pipelines.
- Develop and maintain end-to-end test frameworks using pytest and/or Go, focusing on cluster lifecycle events, upgrade paths, and GPU workloads.
- Ensure test coverage across multiple validation dimensions: conformance, performance, fault injection, and post-upgrade checks.
- Build and maintain dashboards and reporting for automated test results, including traceability, drift detection, and compliance tracking.
- Collaborate with infrastructure, SRE, and platform teams to embed testing and validation early in the deployment lifecycle.
- Own quality assurance gates for all automation-driven deployments.
Required Skills & Experience.
- 10+ years of hands-on experience in infrastructure engineering, systems validation, or SRE roles.
- Primary skills required: pytest, Go, k6 scripting, automation framework integration (Sonobuoy, LitmusChaos), and CI integration.
- Strong experience with:
  - Sonobuoy for Kubernetes conformance and diagnostics.
  - LitmusChaos for fault injection and resilience validation.
  - k6 for performance/load testing in distributed environments.
  - pytest or Go-based test frameworks for automation and validation scripting.
- Deep understanding of Kubernetes architecture, upgrade patterns, and operational risks.
- Experience validating infrastructure components (GPU drivers, kernel modules, CNI, CRI, etc.) across lifecycle events.
- Proficient in GitOps workflows and integrating tests into declarative, Git-backed pipelines (e.g., with Argo CD or Flux).
- Hands-on experience with CI/CD systems (e.g., GitHub Actions, GitLab CI, Jenkins) to automate test orchestration.
- Solid scripting and automation experience (Python, Bash, or Go).
- Familiarity with GPU-based infrastructure and its performance characteristics is a strong plus.
- Strong debugging, root cause analysis, and incident investigation skills.
Posted By
Samuel prabu
Talent Acquisition Recruiter at People Prime Worldwide Pvt. Ltd.
Last Active: 25 Sep 2025
Posted in
DevOps / SRE
Functional Area
DevOps / Cloud
Job Code
1529718