Coredge - Site Reliability Engineer II

COREDGE.IO INDIA PRIVATE LIMITED

Noida

3 - 5 Years

Skills: Site Reliability, Linux, Kubernetes, Monitoring Tools, IT Automation, IT Infrastructure Monitoring, Cluster Management, Linux OS, System Administration, Prometheus, Grafana

Posted on: 09/12/2025

Job Description

We are seeking a highly skilled and motivated Site Reliability Engineer to join our team.

The ideal candidate will have at least 3 years of DevOps experience and a strong technical background in Linux, Kubernetes, monitoring tools, and automation.

The role involves deploying, monitoring, and managing infrastructure and applications to ensure optimal performance and reliability.

Responsibilities:

- Work with Linux-based systems to deploy and manage applications.

- Troubleshoot Linux-related issues to ensure high availability and performance.

- Maintain system stability and security, and carry out performance tuning.

- Deploy, configure, and maintain Kubernetes clusters.

- Debug issues related to Kubernetes environments, including container orchestration and service failures.

- Ensure seamless containerised application deployments and scaling.

- Implement, configure, and maintain Prometheus and Grafana for system and application monitoring.

- Develop and maintain real-time Grafana dashboards for critical insights.

- Troubleshoot system performance and application issues using monitoring data.

- Understand cloud-based environments and basic cloud computing principles.

- Work with cloud services for infrastructure management and monitoring.

- Assist in troubleshooting cloud-related issues when required.

- Gain an understanding of the Cloud/Horizon portal for managing project-related tasks.

- Monitor and track cloud-based infrastructure using Horizon.

- Utilise the portal for operational insights and incident management.

- Set up, manage, and troubleshoot CronJobs for automating scheduled tasks.

- Ensure automated tasks execute as planned and investigate failures.

- Enhance automation processes to optimise system operations.

Requirements:

- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).

- 3-5 years of experience in a DevOps Support Engineer role.

- Strong expertise in Linux system administration.

- Hands-on experience with Kubernetes deployment, debugging, and troubleshooting.

- Proficiency in Prometheus and Grafana for monitoring and dashboard management.

- Basic knowledge of cloud computing environments.

- Experience with the Horizon portal (preferred but not mandatory).

- Strong scripting and automation skills; experience with Shell, Python, or Ansible is a plus.

- Ability to work independently and handle production incidents with minimal supervision.

- Excellent troubleshooting and analytical skills.

- Certification in Kubernetes (CKA, CKAD) is a plus.

- Experience with CI/CD pipelines and DevOps automation.

- Exposure to cloud providers such as AWS, Azure, and OpenStack.

- Strong understanding of networking fundamentals in a cloud-native environment.

Posted by

Gunjan

Talent Acquisition at COREDGE.IO INDIA PRIVATE LIMITED

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1587599