Hector and Streak Consulting Pvt Ltd

Bangalore

7 - 8 Years

Rating: 4.8 (9+ reviews)

Skills: HPC Enterprise Architect, IT Infrastructure, Performance Tuning, Cluster Management, RDMA, Cloud, AWS, Docker, Kubernetes

Posted on: 10/07/2025

Job Description :

Key Responsibilities :

- Design, architect, and implement HPC infrastructures to support compute-intensive applications and workloads.

- Lead the deployment and integration of HPC systems across on-premises, hybrid, and multi-cloud environments.

- Work with cross-functional teams to define and implement HPC strategies, performance tuning, and capacity planning.

- Deploy and manage parallel file systems such as GPFS and Lustre, and optimize storage solutions for large-scale workloads.

- Implement and maintain cluster management and job scheduling tools such as Slurm, Torque, LSF, or PBS Pro.

- Evaluate and integrate networking technologies such as InfiniBand, RDMA, and other high-throughput interconnects to maximize performance.

- Enable and support containerized HPC environments using Docker, Kubernetes, and Singularity.

- Collaborate with stakeholders to understand workloads and recommend architecture changes or new technologies.

- Develop documentation, including architecture designs, operational guides, and system configurations.

- Stay current with emerging HPC technologies, trends, and best practices.

Required Qualifications & Skills :

- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.

- 7+ years of experience designing and managing HPC systems and environments.

- Proven experience in :

1. HPC architecture and cluster design

2. Cloud-based HPC solutions (AWS, Azure, GCP)

3. On-premises and private cloud HPC implementations

- Expertise in parallel computing technologies such as MPI and OpenMP.

- Hands-on experience with high-performance file systems such as GPFS and Lustre.

- Familiarity with job schedulers such as Slurm, Torque, PBS Pro, or LSF.

- Proficiency in containerization and orchestration tools such as Docker, Kubernetes, and Singularity.

- Strong knowledge of networking protocols and performance tuning across TCP/IP, InfiniBand, and RDMA.

- Experience in at least one programming language: C, C++, Fortran, Python, or Java.

- Exposure to multi-vendor hardware environments and experience working in multi-cloud settings.

Soft Skills :

- Excellent verbal and written communication skills.

- Ability to explain technical concepts to both technical and non-technical stakeholders.

- Strong problem-solving and analytical thinking capabilities.

- Ability to thrive under pressure in a fast-paced and complex environment.

- Team-oriented mindset with leadership potential.

Preferred (Nice to Have) :

- HPC certifications or training in cluster management or performance tuning.

- Experience supporting research, simulation, or AI/ML workloads on HPC clusters.

Posted By

sankar

at Hector and Streak Consulting Pvt Ltd

Last Active: 29 Nov 2025

Job Views: 36

Applications: 3

Recruiter Actions: 0

Posted in

DevOps / SRE

Functional Area

IT Infrastructure Services

Job Code

1509960