Bridgenext - Lead NOC/Site Reliability Engineer

BRIDGENEXT INDIA PRIVATE LIMITED

8 - 10 Years

Pune

Network Operations Center, Site Reliability, DevOps, Cloud, ELK Stack, Network Infrastructure, Prometheus, Terraform, Python, Cloud Infrastructure

Posted on: 13/11/2025

Job Description

Responsibilities:

- 7+ years of experience in SRE, DevOps, or infrastructure management.

- Lead the NOC/SRE team from the front, ensuring a culture of proactive monitoring, rapid response, and continuous improvement.

- Act as the primary escalation point for major incidents, providing technical guidance and decision-making.

- Collaborate with DevOps, Engineering, and Product teams to enhance system reliability.

- Define best practices, incident response protocols, and runbooks for the team.

- Lead log tracing and deep troubleshooting for infrastructure, network, and application issues.

- Reduce MTTR (Mean Time to Resolution) and improve incident management processes.

- Good knowledge of Terraform, Kubernetes, Docker, and cloud architectures.

- Proficiency in monitoring and observability tools (New Relic, Prometheus, Datadog, etc.).

- Understanding of CI/CD pipelines, automation, and infrastructure as code (IaC).

- Basic scripting skills in Python, Go, Shell, or similar.

- Strong troubleshooting skills for complex distributed systems.

- Ability to mentor junior engineers and drive SRE best practices.

- Willingness to work primarily between 3:30 PM and 3:30 AM IST, with flexibility to adjust shifts based on operational requirements.

- Strong problem-solving skills and ability to work in a fast-paced environment.

- Strong incident management, troubleshooting, and RCA skills.

Qualifications:

- 6+ years of experience in Site Reliability Engineering (SRE) / NOC / DevOps roles.

- Proven leadership experience, managing or mentoring a team.

- Hands-on experience with Terraform for Infrastructure as Code (IaC).

- Experience in Python for automation and scripting.

- Expertise in troubleshooting complex infrastructure and application issues.

- Strong knowledge of log tracing, distributed tracing, and observability tools (e.g., ELK, Splunk, Grafana, Prometheus, OpenTelemetry).

- Deep understanding of SLAs, SLOs, and error budgets.

- Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, Docker).

- Familiarity with CI/CD pipelines and GitOps practices.

- Strong problem-solving skills and the ability to make quick, data-driven decisions under pressure.

Posted by

Ketki Mannur

Lead recruiter at BRIDGENEXT INDIA PRIVATE LIMITED

Last Active: 13 Nov 2025

Job Views: 209

Applications: 38

Recruiter Actions: 0

Posted in

DevOps / SRE

Functional Area

Site Reliability Engineering

Job Code

1573579
