Posted on: 17/09/2025
Role : HPC System Engineer
Full Time
Location : Hyderabad (REMOTE)
Notice Period : 30 Days
Job Description :
Responsibilities :
- User account management for HPC onboarding and offboarding
- Creation and maintenance of AMIs in AMI accounts
- Install, configure, and maintain Linux operating systems on HPC clusters
- Support the necessary HPC components and native platform services by coordinating with the respective providers (e.g., EFPortal, AWS RES, CycleCloud, AWS ParallelCluster)
- AWS Managed Active Directory support and management
- Continuously upgrade the HPC platform and related components (OS, Java, Python, EFPortal, etc.)
- Implement and maintain necessary compliance controls, i.e., US Export Control and confidentiality
- Conduct regular audits, share the findings and implement corrective actions as required
- Coordinate with other teams, such as the v-drive team, on testing and migrating/installing engineering applications on the platform
- Manage job schedulers such as Slurm or LSF
- Use node provisioning tools like Warewulf
- Troubleshoot system issues and provide technical support to users
- Monitor system performance and ensure optimal operation of the HPC environment
- Collaborate with other IT professionals to integrate new technologies into the existing infrastructure
- Progressive experience in HPC system administration, preferably in a Red Hat/CentOS Linux environment
- Experience with AWS CloudFormation templates to build infrastructure for HPC and storage (Amazon FSx for NetApp ONTAP and FSx for Lustre)
- Experience with parallel file systems and storage solutions
- Strong knowledge of job schedulers such as Slurm or LSF
- Familiarity with node provisioning tools like Warewulf
- Proficiency in Linux OS administration
- Knowledge of job scheduling tools (e.g., Slurm)
- Understanding of node provisioning tools (e.g., Warewulf)
- Excellent problem-solving abilities
- Linux+ certification preferred
- Top Secret Clearance : TS/SCI preferred
- On-site presence at customer location in Stennis, MS
- Availability for some on-call/weekend work
- Hands-on experience setting up HPC compute clusters
- Experience setting up the PBS job scheduler and supporting PBS servers
- Experience with Red Hat and Rocky Linux; bash scripting
- Nice to have: Docker and Kubernetes experience
- Nice to have: storage knowledge
- Nice to have: networking and DevOps knowledge
Qualifications :
Minimum Qualifications / Skills :
- Degree, preferably in Computer Science, Information Systems, or a related field
Preferred qualifications / Skills :
- Ability to understand requirements in depth, with good analytical and problem-solving skills, interpersonal effectiveness, and a positive attitude
Must-Have Experience :
- Deployed and configured AWS ParallelCluster for HPC workload orchestration with CloudFormation templates (CFT)
- Deployed and configured AWS Managed Active Directory
- Provisioned Amazon FSx for NetApp ONTAP and FSx for Lustre storage for HPC