
unifyCX - Site Reliability Engineer - Network Operations Center

Posted on: 24/07/2025

Job Description

Job Summary:

We have an opening for a Site Reliability Engineer to join our client's team. You will be responsible for maintaining the reliability and uptime of critical services, with a focus on Kubernetes administration, CentOS servers, Java application support, incident management, and change management.

The ideal candidate will possess strong ArgoCD experience for Kubernetes management, Linux skills, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for someone who is self-motivated, possesses excellent communication skills (both oral and written), and can work both independently and collaboratively.


Responsibilities:


- Monitor, maintain, and manage applications on CentOS servers, ensuring high availability and performance

- Conduct routine system and application maintenance tasks, following SOPs to correct or prevent issues

- Respond to and manage active incidents, including running post-mortem meetings, performing root cause analysis, and ensuring timely resolution

- Monitor production systems, applications, and overall performance

- Use tools to detect abnormal behavior in the software and, more importantly, collect information that helps developers understand the cause of the problem

- Conduct security checks

- Run meetings with our business partners, following established processes and procedures

- Write, update, and maintain policy and procedure documents

- Write scripts or code as necessary to develop tools and/or services in order to support the product

- Learn from post-mortems and prevent new incidents from occurring

- Perform admin work on various tools and applications such as JIRA and New Relic

- Maintain Service-level objectives, specific and quantifiable goals related to maintaining the parameters set for our Golden Metrics


Technical Skills:


- 5+ years of experience working in a SaaS and Cloud environment

- Administration of Kubernetes clusters, including management of applications using ArgoCD

- Linux scripting to automate routine tasks and improve operational efficiency is required

- Experience with database systems like MySQL and DB2 is required to be successful in this role

- Experience as a Linux (CentOS / RHEL) administrator is a must

- Understanding of change management procedures, including experience running change management meetings and enforcing safe and compliant changes to production environments

- Deep knowledge of on-call responsibilities and awareness of time management, including maintaining on-call management tools such as xMatters

- Experience with managing deployments using Jenkins

- Prior experience with monitoring tools including New Relic, Splunk and Nagios

- Experience with log aggregation tools like Splunk, Loki or Grafana

- Strong scripting knowledge in any one of the following languages: Python, Ruby, Bash, Java, or Go

- Experience with API programming and integrating tools such as Jira, Slack, xMatters/PagerDuty

