Posted on: 18/02/2026
Description :
- The solution is deployed in a secure on-premise high-performance computing (HPC) environment and is accessed by multiple internal teams over LAN and intranet. The role also offers exposure to future cloud and hybrid deployments.
Key Responsibilities :
- Deploy, configure, and manage applications on on-premise HPC infrastructure
- Build and maintain CI/CD pipelines for on-prem environments
- Containerize applications using Docker and support orchestration using Kubernetes
- Manage LAN/intranet-based networking, internal routing, and service access
- Support ML/DL and RAG pipelines from an infrastructure and deployment perspective
- Implement and maintain RBAC, access controls, and security policies
- Monitor system health, performance, and resource utilization
- Automate infrastructure and operational workflows using scripts and configuration tools
- Troubleshoot deployment, networking, and performance issues
- Maintain clear documentation for deployment processes and operational SOPs
- Collaborate with engineering teams to plan future cloud and hybrid deployments
Required Skills & Qualifications :
- 1 to 4 years of hands-on experience in DevOps, Platform Engineering, or Infrastructure roles
- Strong experience with Linux system administration
- Experience with on-premise deployments (bare metal or virtualized environments)
- Working knowledge of Docker and containerized workloads
- Familiarity with Kubernetes fundamentals
- Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, etc.)
- Good understanding of networking fundamentals (LAN, intranet, DNS, ports, firewalls)
- Scripting skills in Bash/Shell and/or Python
- Knowledge of RBAC, secrets management, and security best practices
Good to Have :
- Exposure to HPC environments (GPU clusters, schedulers like Slurm/PBS)
- Experience supporting AI/ML workloads (training, inference, pipelines)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK stack)
- Knowledge of Infrastructure as Code tools (Ansible, Terraform)
- Basic exposure to cloud platforms (AWS, Azure)
- Experience working in restricted or enterprise-grade environments
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1613864