Posted on: 13/01/2026
Description :
Position Overview :
We are seeking a highly skilled and motivated Site Reliability Engineer to join our team. The ideal candidate will have at least 3 years of DevOps experience and a strong technical background in Linux, Kubernetes, monitoring tools, and automation.
The role involves deploying, monitoring, and managing infrastructure and applications to ensure optimal performance and reliability.
Key Responsibilities :
Linux Administration :
- Work with Linux-based systems to deploy and manage applications.
- Troubleshoot Linux-related issues to ensure high availability and performance.
- Maintain system stability and security, and carry out performance tuning.
Kubernetes (K8s) Deployment & Debugging (Must Have) :
- Debug issues related to Kubernetes environments, including container orchestration and service failures.
- Ensure seamless containerized application deployments and scaling.
Monitoring & Observability (Must Have) :
- Implement, configure, and maintain Prometheus and Grafana for system and application monitoring.
- Troubleshoot system performance and application issues using monitoring data.
Cloud Knowledge :
- Understand cloud-based environments and basic cloud computing principles.
- Work with cloud services for infrastructure management and monitoring.
- Assist in troubleshooting cloud-related issues when required.
Horizon Portal (Good to Have) :
- Monitor and track cloud-based infrastructure using Horizon.
- Utilize the portal for operational insights and incident management.
CronJobs & Automation :
- Ensure automated tasks execute as planned and investigate failures.
- Enhance automation processes to optimize system operations.
Key Requirements :
- 3-5 years of experience in a DevOps Support Engineer role.
- Strong expertise in Linux system administration.
- Hands-on experience with Kubernetes deployment, debugging, and troubleshooting.
- Proficiency in Prometheus and Grafana for monitoring and dashboard management.
- Basic knowledge of cloud computing environments.
- Experience with Horizon portal (preferred but not mandatory).
- Strong scripting and automation skills (experience with Shell, Python, or Ansible is a plus).
- Ability to work independently and handle production incidents with minimal supervision.
- Excellent troubleshooting and analytical skills.
- Exposure to Argo and Argo Workflows is mandatory.
Preferred Qualifications :
- Kubernetes certification (CKA or CKAD) is a plus.
- Exposure to cloud platforms such as AWS, Azure, or OpenStack.
- Strong understanding of networking fundamentals in a cloud-native environment.
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1600777