We are seeking a skilled and proactive Site Reliability Engineer (SRE) with a strong DevOps mindset and hands-on experience in application troubleshooting. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our applications and infrastructure. This role requires a blend of software engineering, system administration, and operational expertise, with a focus on automating processes and proactively resolving issues.

Key Responsibilities:

Site Reliability & Automation:

- Design and implement tools to automate infrastructure provisioning, application deployment, and operational tasks.

- Build and manage CI/CD pipelines using Jenkins to ensure seamless and efficient software delivery.

- Maintain and troubleshoot Linux server environments, including certificate renewals.

Monitoring & Troubleshooting:

- Implement and manage monitoring solutions using tools like Splunk or Dynatrace to create dashboards, set up alerting, and execute log queries for proactive issue detection.

- Perform application troubleshooting, debugging, and root cause analysis to resolve complex incidents promptly.

- Use SQL (DML and SELECT queries) to analyze application data for performance and troubleshooting insights.

Process & Collaboration:

- Apply ITIL/ITSM principles for effective incident, problem, and change management.

- Collaborate closely with development, quality assurance, and product teams to improve system reliability.

- Manage and track code changes using Git, with repositories hosted on platforms such as Bitbucket.

Required Skills:

Core Technical Skills:

- 5-8 years of experience in an SRE, DevOps, or similar role.

- Strong proficiency in at least one scripting or configuration language: Shell, Groovy, or YAML.

- Expertise in monitoring tools like Splunk or Dynatrace for alerting, dashboarding, and log analysis.

- Hands-on experience with CI/CD tools, specifically Jenkins.

System & Infrastructure:

- Strong understanding of Linux system administration.

- Basic exposure to cloud environments, with AWS being preferred.

Process & Data:

- Basic knowledge of ITIL/ITSM concepts (Incident, Problem, Change Management).

- Proficiency in SQL (DML and SELECT queries).

Preferred Skills:

- Experience with configuration management tools like Ansible or Chef.

- Hands-on experience with Docker for containerization and Kubernetes for container orchestration.

- Knowledge of other monitoring tools such as Prometheus or Grafana.

- Relevant certifications in Linux or cloud platforms.

- Strong problem-solving and analytical skills, with a proactive attitude toward identifying and resolving issues.

Posted By

Kishore Kumar

Technical Recruiter at REVEILLE TECHNOLOGIES PRIVATE LIMITED

Last Active: 2 Dec 2025

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1537162
