Site Reliability Engineer - Docker/Kubernetes

SproutsAI

Bangalore

4 - 8 Years

Site Reliability, CI/CD Pipeline, Linux System Administration, Docker, Kubernetes, Monitoring Tools, CI/CD Tools, Ansible, Configuration Management Tools, Prometheus

Posted on: 30/07/2025

Job Description

Responsibilities:

- Ensure the reliability, performance, and availability of Integral's applications through proactive monitoring and automation.

- Develop and maintain real-time monitoring, alerting, and logging systems to detect and resolve issues before they impact customers.

- Automate manual operations, including application deployment, configuration, scaling, and recovery.

- Collaborate with software engineering teams to integrate reliability best practices into the development lifecycle.

- Conduct root cause analysis (RCA) and implement preventive measures to mitigate recurring issues.

- Support a 24/7 distributed enterprise environment across multiple global data centers.

- Work closely with the Support team to enhance incident response processes, ensuring fast and effective resolution of technical escalations.

- Participate in on-call rotations to support critical application issues and outages.

- Maintain and optimize CI/CD pipelines to ensure fast and reliable application releases.

- Enhance system security by managing SSL certificates, encryption, and authentication mechanisms.

- Foster a culture of continuous improvement by evaluating new tools, frameworks, and methodologies to enhance system reliability.

Requirements:

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.

- 4+ years of experience in a similar role focusing on application reliability, automation, and performance optimization.

- Strong expertise in Linux and Windows system administration.

- Proficiency in at least one scripting language (e.g., Python, Shell, Perl, JavaScript).

- Experience with Docker, Kubernetes, or other containerization and orchestration technologies.

- Familiarity with CI/CD tools like Jenkins and deployment automation frameworks.

- Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack, New Relic, Datadog).

- Understanding of networking concepts (TCP/IP, DNS, load balancing, firewalls).

- Experience with configuration management tools like Ansible, Salt, or Puppet.

- Strong debugging and troubleshooting skills across application, database, and infrastructure layers.

- Ability to work in a fast-paced, high-pressure environment with multiple priorities.

- Excellent communication and collaboration skills to work effectively with engineering and support teams.

Posted By

Venkatesh

Talent Acquisition Specialist at SproutsAI

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1521999
