Job Title : Site Reliability Engineer (SRE)

Experience : 5+ Years (Mandatory experience in SRE)

Location : Bangalore, Hyderabad, Pune, Bhubaneswar, Noida, Gurgaon (India)

Work Mode : Work From Office (WFO)

Job Overview :

We are looking for a highly skilled Site Reliability Engineer (SRE) with strong experience in observability, reliability engineering, and cloud-native systems. The ideal candidate will have hands-on expertise with Splunk, Grafana, and Java/Kotlin, and will be responsible for ensuring high availability, performance, and scalability of distributed systems.

This role combines software engineering and operations, focusing on monitoring, automation, incident response, and continuous improvement of system reliability.

Key Responsibilities :

- Design, build, and maintain monitoring, alerting, and observability solutions using Splunk and Grafana.

- Create and manage dashboards, alerts, and visualizations to monitor system health, performance, and reliability.

- Analyze system metrics, logs, and traces to proactively identify performance bottlenecks and reliability issues.

- Perform incident management and root cause analysis (RCA), and implement measures to prevent recurrence.

- Work closely with development teams to support bug fixing and minor feature development using Java or Kotlin.

- Support and optimize microservices-based architectures running on cloud platforms (AWS, GCP, or Azure).

- Automate operational tasks to improve efficiency, reduce toil, and enhance system reliability.

- Ensure system stability through capacity planning, performance tuning, and scalability improvements.

- Collaborate with DevOps teams to enhance CI/CD pipelines using tools like Jenkins or GitLab CI.

- Maintain and improve infrastructure reliability across development, staging, and production environments.

- Participate in on-call rotations and provide production support when required.

Required Skills & Qualifications :

- 5+ years of experience in Site Reliability Engineering or related roles (mandatory).

- Strong hands-on experience with Splunk and Grafana, including dashboard creation and alerting.

- Proficiency in Java or Kotlin for troubleshooting, bug fixes, and minor feature enhancements.

- Good understanding of microservices architecture and distributed systems.

- Experience working with cloud platforms such as AWS, GCP, or Azure.

- Strong knowledge of Linux systems, networking fundamentals, and system performance tuning.

- Hands-on experience with CI/CD tools like Jenkins or GitLab CI.

- Proficiency in Git and version control best practices.

- Strong analytical, problem-solving, and troubleshooting skills.