hirist

Job Description

Job Title : Senior Infrastructure Test & Validation Engineer.

Key Skills : pytest, Go, k6 scripting, automation framework integration.

Job Location : Bangalore.

Experience : 8-15 years.

Education Qualification : Any graduate degree.

Work Mode : Hybrid.

Employment Type : Contract.

Notice Period : Immediate to 10 days.

Job Description :

Senior Infrastructure Test & Validation Engineer (Zero-Touch GPU Cloud GitOps Validation & Certification).

We are seeking a Senior Infrastructure Test & Validation Engineer with 10+ years of experience to lead zero-touch validation, upgrade, and certification automation for our on-prem GPU cloud platform.

This role focuses on ensuring the stability, performance, and conformance of the entire stack, from hardware to Kubernetes, using automated, GitOps-based validation pipelines.

The ideal candidate has a strong infrastructure background with deep hands-on skills in Sonobuoy, LitmusChaos, k6, and pytest, and is passionate about automated test orchestration, platform resilience, and continuous conformance.

Key Responsibilities.

- Design and implement automated, GitOps-compliant pipelines for validation and certification of the GPU cloud stack across hardware, OS, Kubernetes, and platform layers.

- Integrate Sonobuoy for Kubernetes conformance and certification testing.

- Design and orchestrate chaos engineering workflows using LitmusChaos to validate system resilience across failure scenarios.

- Implement performance testing suites using k6 and system-level benchmarks, integrated into CI/CD pipelines.

- Develop and maintain end-to-end test frameworks using pytest and/or Go, focusing on cluster lifecycle events, upgrade paths, and GPU workloads.

- Ensure test coverage and validation across multiple dimensions : conformance, performance, fault injection, and post-upgrade validation.

- Build and maintain dashboards and reporting for automated test results, including traceability, drift detection, and compliance tracking.

- Collaborate with infrastructure, SRE, and platform teams to embed testing and validation early in the deployment lifecycle.

- Own quality assurance gates for all automation-driven deployments.

Required Skills & Experience.

- 10+ years of hands-on experience in infrastructure engineering, systems validation, or SRE roles.

- The primary skills required are pytest, Go, k6 scripting, integration of automation frameworks (Sonobuoy, LitmusChaos), and CI integration.

- Strong experience with :

o Sonobuoy for Kubernetes conformance and diagnostics.

o LitmusChaos for fault injection and resilience validation.

o k6 for performance/load testing in distributed environments.

o pytest or Go-based test frameworks for automation and validation scripting.

- Deep understanding of Kubernetes architecture, upgrade patterns, and operational risks.

- Experience validating infrastructure components (GPU drivers, kernel modules, CNI, CRI, etc.) across lifecycle events.

- Proficient in GitOps workflows and in integrating tests into declarative, Git-backed pipelines (e.g., with Argo CD or Flux).

- Hands-on experience with CI/CD systems (e.g., GitHub Actions, GitLab CI, Jenkins) to automate test orchestration.

- Solid scripting and automation experience (Python, Bash, or Go).

- Familiarity with GPU-based infrastructure and its performance characteristics is a strong plus.

- Strong debugging, root cause analysis, and incident investigation skills.
