Celsior Technologies - Site Reliability Engineer - Cloud Native Apps

Posted on: 14/07/2025

Job Description

Site Reliability Engineer (SRE)

We are seeking a skilled and proactive Site Reliability Engineer (SRE) to join our team, focused on supporting and scaling AI/ML platforms and cloud-native applications.

This role is essential for ensuring the stability, availability, and performance of critical systems and services, while driving automation, observability, and operational excellence across our cloud infrastructure.

Technical Skills :

- Programming : Proficiency in languages like Python, Bash, or Java is essential.

- Operating Systems : Deep understanding of Linux/Windows operating systems and networking concepts.

- Cloud Technologies : Experience with AWS and Azure, including their services, architecture, and best practices.

- Containerization and Orchestration : Hands-on experience with Docker, Kubernetes, and related tools.

- Infrastructure as Code (IaC) : Familiarity with tools like Terraform, CloudFormation, or Azure CLI.

- Monitoring and Observability : Experience with tools like Splunk, New Relic, or Azure Monitor.

- CI/CD : Experience with continuous integration and continuous delivery pipelines using GitHub and GitHub Actions.

- Knowledge of supporting Azure ML, Databricks, and other related SaaS tools.

Soft Skills :

- Problem-Solving : Ability to troubleshoot and debug complex distributed systems independently.

- Communication : Strong written and verbal communication skills to collaborate with development and operations teams, and the ability to write documentation such as runbooks.

Specific Experience :

- Incident Management : Experience with incident response, root cause analysis, and post-incident reviews.

- Scalability and Performance : Understanding of scalability, availability, and performance monitoring for large-scale systems.

- Automation : Experience in automating repetitive tasks and workflows.

Preferred Qualifications :

- Experience with specific cloud platforms (AWS, Azure).

- Certifications related to cloud engineering or DevOps.

- Experience with microservices architecture, including supporting AI/ML solutions.

- Experience with large-scale system management and configuration.

