
Job Description



Key Responsibilities :


- Design, architect, and implement HPC infrastructures to support compute-intensive applications and workloads.

- Lead the deployment and integration of HPC systems across on-premise, hybrid, and multi-cloud environments.

- Work with cross-functional teams to define and implement HPC strategies, performance tuning, and capacity planning.

- Deploy and manage parallel file systems such as GPFS and Lustre, and optimize storage solutions for large-scale workloads.

- Implement and maintain cluster management and job scheduling tools like Slurm, Torque, LSF, or PBSPro.

- Evaluate and integrate networking technologies such as InfiniBand, RDMA, and other high-throughput interconnects for maximum performance.


- Enable and support containerized HPC environments using Docker, Kubernetes, and Singularity.

- Collaborate with stakeholders to understand workloads and recommend architecture changes or new technologies.

- Develop documentation, including architecture designs, operational guides, and system configurations.

- Stay current with emerging HPC technologies, trends, and best practices.


Required Qualifications & Skills :


- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.

- 7+ years of experience designing and managing HPC systems and environments.

- Proven experience in :


1. HPC architecture and cluster design

2. Cloud-based HPC solutions (AWS, Azure, GCP)

3. On-premises and private cloud HPC implementations

- Expertise in parallel computing technologies such as MPI and OpenMP.

- Hands-on experience with high-performance file systems such as GPFS and Lustre.

- Familiarity with job schedulers such as Slurm, Torque, PBSPro, or LSF.

- Proficiency in containerization and orchestration tools such as Docker, Kubernetes, and Singularity.

- Strong knowledge of networking protocols and performance tuning, including TCP/IP, InfiniBand, and RDMA.

- Experience in at least one programming language: C, C++, Fortran, Python, or Java.

- Exposure to multi-vendor hardware environments and experience working in multi-cloud settings.


Soft Skills :


- Excellent verbal and written communication skills.

- Ability to explain technical concepts to both technical and non-technical stakeholders.

- Strong problem-solving and analytical thinking capabilities.

- Ability to thrive under pressure in a fast-paced and complex environment.

- Team-oriented mindset with leadership potential.


Preferred (Nice to Have) :


- HPC certifications or training in cluster management or performance tuning.

- Experience supporting research, simulation, or AI/ML workloads on HPC clusters.

