Posted on: 16/07/2025
Position Title : Technical Lead - HPC
Location : Chennai
Experience Required : 7 to 14 Years
Notice Period : Immediate to 60 days
Role Overview :
We are seeking a highly experienced High-Performance Computing (HPC) administrator / cloud engineer to lead the design, implementation, and support of HPC clusters. This role is ideal for candidates with deep expertise in Linux systems, server architecture, storage infrastructure, cloud-based HPC environments, and DevOps practices.
You will be responsible for deploying robust, scalable HPC solutions, managing end-to-end system configurations, and ensuring optimal performance for enterprise-level computing environments. A strong background in system architecture, automation, and operational excellence is essential.
Key Responsibilities :
- Design, build, and support high-performance computing (HPC) clusters, including CPU- and GPU-based architectures.
- Architect robust and scalable storage systems for HPC environments.
- Generate detailed hardware BOMs, manage vendors, and oversee hardware release processes.
- Install and configure Linux-based operating systems on HPC infrastructure.
- Translate system and subsystem-level performance requirements into technical specifications.
- Ensure timely execution and delivery of HPC projects in line with defined milestones.
- Deliver production-ready solutions including golden images, procedures, scripts, and documentation.
- Support manufacturing and operations teams with system deployment and troubleshooting.
Must-Have Skills & Experience :
- 7+ years of hands-on experience with HPC environments and Linux system administration.
- In-depth knowledge of Linux distributions (SUSE, Red Hat, Rocky, Ubuntu).
- Strong understanding of server hardware, GPUs, BIOS, BMC, and high-bandwidth interconnects.
- Expertise in system tools such as systemd, Netboot/PXE, and Linux High Availability (HA).
- Solid understanding of TCP/IP networking fundamentals and common protocols (DNS, DHCP, HTTP, LDAP, SMTP).
- Proficient in Shell scripting and Python for system automation.
- Experience with configuration management tools such as SaltStack, Puppet, or Chef.
- Bachelor's or Master's degree (BE/BTech/MCA/MSc) in Computer Engineering, Electrical Engineering, or a related technical field.
- No gaps in employment history and stable job tenure (minimum 2 years per role).
Preferred Skills (Nice to Have) :
- Experience in DevOps pipelines using Jenkins, Git-based repositories, and containerization tools like Docker and Singularity.
- Familiarity with Kubernetes for orchestration, and with Prometheus and Grafana for monitoring.
- Knowledge of Apache/Nginx, reverse proxy configurations, and load balancing (e.g., HAProxy).
- Experience working with infrastructure-as-code tools like Terraform.
Soft Skills :
- Strong collaboration and interpersonal skills with the ability to work across various levels of an organization.
- Effective time management and organizational abilities to meet deadlines in a high-performance environment.
- Excellent written and verbal communication skills.
- High adaptability and problem-solving mindset in rapidly evolving technical environments.
Education Requirements :
- Mandatory : BE/BTech, MCA, MSc in Computer Science, Electrical Engineering, or a related field.
- Note : Candidates with Diplomas or only 3-year degrees (e.g., BCA, BSc) are not eligible.
Additional Guidelines :
- Candidates with only DevOps experience (without HPC experience) may not be a fit.
- Candidates must not currently be employed by certain restricted companies (e.g., those covered by a no-poach policy).
- Looking for long-term team players, not frequent job switchers.
Interview Process :
- 3 Rounds of Technical Interviews
- 1 HR Discussion
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1513725