Job Description

About the Role :



You will lead the design and implementation of scalable, secure, and highly available infrastructure across both cloud and on-premise environments.

This role demands a deep understanding of Linux systems, infrastructure automation, and performance tuning, especially in high-performance computing (HPC) setups.

As a technical leader, you'll collaborate closely with development, QA, and operations teams to drive DevOps best practices, tool adoption, and overall infrastructure reliability.

Key Responsibilities :


- Design, build, and maintain Linux-based infrastructure across cloud (primarily AWS) and physical data centers.


- Implement and manage Infrastructure as Code (IaC) using tools such as CloudFormation, Terraform, Ansible, and Chef.


- Develop and manage CI/CD pipelines using Jenkins, Git, and Gerrit to support continuous delivery.


- Automate provisioning, configuration, and software deployments with Bash, Python, Ansible, etc.


- Set up and manage monitoring and logging systems such as Prometheus, Grafana, and the ELK stack.


- Optimize system performance and troubleshoot critical infrastructure issues related to networking, filesystems, and services.


- Configure and maintain storage and filesystems including ext4, xfs, LVM, NFS, iSCSI, and potentially Lustre.


- Manage PXE boot infrastructure using Cobbler/Kickstart, and create/maintain custom ISO images.


- Implement infrastructure security best practices, including IAM, encryption, and firewall policies.


- Act as a DevOps thought leader, mentor junior engineers, and recommend tooling and process improvements.


- Maintain clear and concise documentation of systems, processes, and best practices.


- Collaborate with cross-functional teams to ensure reliable and scalable application delivery.

Required Skills & Experience :


- 7+ years of experience in DevOps, SRE, or Infrastructure Engineering.


- Deep expertise in Linux system administration, especially around storage, networking, and process control.


- Strong proficiency in scripting (e.g., Bash, Python) and configuration management tools (Chef, Ansible).


- Proven experience in managing on-premise data center infrastructure, including provisioning and PXE boot tools.

- Familiarity with CI/CD systems, Agile workflows, and Git-based source control (Gerrit/GitHub).



- Experience with cloud services, preferably AWS, and hybrid cloud models.


- Knowledge of virtualization (e.g., KVM, Vagrant) and containerization (Docker, Podman, Kubernetes).


- Excellent communication, collaboration, and documentation skills.

Nice to Have :


- Hands-on experience with Lustre or other distributed/parallel filesystems.


- Experience in HPC (High-Performance Computing) environments.


- Familiarity with Kubernetes deployments in hybrid clusters.

