Job Description

Job Title : Senior Site Reliability Engineer (SRE), Datadog Observability

Experience Required : 8+ years overall in SRE and Infrastructure Operations, including at least 3 years of hands-on experience with Datadog.

Location : Hyderabad preferred; Pune and remote are also options.

Job Summary :


We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE implementation initiatives with a strong focus on Datadog Observability.


The ideal candidate will bring deep technical expertise in building reliable, scalable, and observable systems, with hands-on experience in integrating enterprise applications and middleware.

Key Responsibilities :


- Drive end-to-end SRE implementation, ensuring system reliability, scalability, and performance.


- Design, configure, and manage Datadog dashboards, monitors, alerts, and APM for proactive issue detection and resolution.

- Utilize the Datadog Roles API to create and manage user roles, global permissions, and access controls for various teams.


- Collaborate with product managers, engineering teams, and business stakeholders to identify observability gaps and design solutions using Datadog.

- Implement automation for alerting, incident response, and ticket creation to improve operational efficiency.

- Work closely with business and IT teams to support critical Financial Month-End, Quarter-End, and Year-End closures.

- Leverage Datadog AI.

- Provide technical leadership in observability, reliability, and performance engineering practices.

Required Skills And Experience :


- 8+ years of experience in Site Reliability Engineering and Observability.

- At least 3 years of hands-on experience with Datadog (dashboards, APM, alerting, log management, Roles API, and monitoring setup).

- Proven experience implementing SRE best practices: incident management, postmortems, automation, and reliability metrics.

- Excellent stakeholder management and communication skills; experience collaborating with business and IT teams.

- Strong problem-solving mindset and ability to work in high-pressure production support environments.

Preferred Qualifications :


- Certification in Datadog or related observability platforms.


- Knowledge of CI/CD tools and automation frameworks.

- Experience with cloud platforms (AWS, Azure, or OCI).

- Exposure to ITIL-based production support processes.

