Equisoft - Senior Site Reliability Support Engineer

Company: Equisoft

Location: Hyderabad

Experience: 6-9 years

Skills: DevOps, Site Reliability, Incident Management, Production Support, CI/CD Pipeline, IT Automation, PowerShell

Posted on: 12/01/2026

Job Description

Key Responsibilities:

Site Reliability Engineering (SRE):

- Design, implement, and maintain highly available, scalable, and fault-tolerant systems.

- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).

- Develop and maintain monitoring, alerting, logging, and observability solutions.

- Automate operational tasks to reduce toil using scripts, tools, and CI/CD pipelines.

- Participate in capacity planning, performance tuning, and system optimization.

- Lead post-incident reviews and root cause analyses (RCAs), and drive corrective and preventive actions.

- Champion reliability-first design and continuous improvement initiatives.

Production Support & Incident Management:

- Provide L2/L3 production support for business-critical applications and platforms.

- Act as an escalation point during major incidents, outages, and performance degradations.

- Lead incident response, troubleshooting, and recovery in a 24x7 production environment.

- Coordinate with cross-functional teams during incidents to ensure rapid resolution.

- Maintain runbooks, SOPs, and knowledge base documentation.

- Analyze recurring issues and implement long-term fixes to prevent recurrence.

DevOps & Cloud Operations:

- Manage and support cloud infrastructure (AWS / Azure / GCP).

- Work with containerization and orchestration platforms such as Docker and Kubernetes.

- Support and enhance CI/CD pipelines for reliable and repeatable deployments.

- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or ARM templates.

- Ensure backup, disaster recovery, and business continuity strategies are in place.

Security, Compliance & Governance:

- Implement and enforce security best practices across infrastructure and applications.

- Support vulnerability management, patching, and access control.

- Ensure systems comply with organizational and regulatory standards.

- Participate in audits and compliance-related activities as required.

Required Skills & Qualifications:

Technical Skills:

- Strong experience in Site Reliability Engineering, Production Support, or DevOps roles.

- Proficiency in Linux/Unix system administration.

- Strong scripting skills in Python, Bash/shell, or PowerShell.

- Hands-on experience with monitoring tools (Prometheus, Grafana, ELK, Splunk, Datadog, New Relic).

- Experience with incident management tools (PagerDuty, Opsgenie, ServiceNow).

- Solid understanding of networking concepts (TCP/IP, DNS, load balancers).

- Experience with cloud platforms (AWS / Azure / GCP).

- Familiarity with databases (SQL/NoSQL) and caching systems.

Soft Skills:

- Strong problem-solving and analytical skills.

- Excellent communication and stakeholder management abilities.

- Ability to perform under pressure during high-severity production incidents.

- Mentorship mindset and ability to guide junior engineers.

- Strong ownership and accountability for system reliability.

Posted by: Asha Gowda, Talent Acquisition at Equisoft

Posted in: DevOps / SRE

Functional Area: Site Reliability Engineering

Job Code: 1600227