Posted on: 04/08/2025
Site Reliability Engineer Team Lead | Chandigarh (Onsite) | Permanent
POSITION :
We are looking for an experienced "Site Reliability Engineer Team Lead" to lead an SRE team.
The ideal candidate will have a strong background in enhancing the reliability and scalability of services, leading technical teams, and driving strategic initiatives to improve a Lodging-as-a-Service platform.
RESPONSIBILITIES :
- Leadership & Mentorship : Lead, mentor, and develop a team of SREs, fostering a culture of reliability, collaboration, and continuous improvement.
- Strategic Planning : Drive the design and implementation of scalable, sustainable solutions, and lead the transition towards a cloud-native, serverless, and NoOps environment.
- Service Excellence : Oversee service availability, system performance, and capacity planning for critical.
- Cross-Functional Collaboration : Work closely with stakeholders across the organization to solve complex technical challenges and enhance user experiences.
- Incident Management : Lead incident response efforts, perform root cause analysis, and implement preventative measures.
- Process Optimization : Champion the adoption of best practices in monitoring, automation, and observability.
- SLO Management : Define and manage Service Level Objectives (SLOs) to guide prioritization and ensure reliability.
REQUIRED EXPERIENCE :
- Experience : 7+ years in site reliability engineering or related fields, with at least 2 years in a leadership role.
- Education : Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Technical Expertise :
- Extensive experience with AWS cloud services and cloud engineering best practices.
- Proficiency in programming languages such as Java and Python, and familiarity with React.
- Deep understanding of software engineering methodologies and development cycles.
- Expertise in monitoring and observability tools (New Relic, Kibana, Prometheus, Grafana, ElasticSearch).
Leadership Skills : Proven ability to lead technical teams, manage projects, and communicate effectively with stakeholders.
Problem-Solving Skills : Exceptional analytical abilities to perform root cause analysis and develop effective solutions.
Automation & Efficiency : Strong background in automating processes and driving operational efficiency.
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1523915