
Job Description

Description :



We are seeking a seasoned Observability Architect to define and lead our end-to-end observability strategy across highly distributed, cloud-native, and hybrid environments.


This role requires a visionary leader with deep hands-on experience in New Relic and a strong working knowledge of other modern observability platforms like Datadog, Prometheus/Grafana, Splunk, OpenTelemetry, and more. You will design scalable, resilient, and intelligent observability solutions that empower engineering, SRE, and DevOps teams to proactively detect issues, optimize performance, and ensure system reliability. This is a senior leadership role with significant influence over platform architecture, monitoring practices, and cultural transformation across global teams.



Key Responsibilities :


- Architect and implement full-stack observability platforms, covering metrics, logs, traces, synthetics, real user monitoring (RUM), and business-level telemetry using New Relic and other tools like Datadog, Prometheus, ELK, or AppDynamics.



- Design and enforce observability standards and instrumentation guidelines for microservices, APIs, front-end applications, and legacy systems across hybrid cloud environments.



- Drive OpenTelemetry adoption, ensuring vendor-neutral, portable observability implementations where appropriate.



- Build multi-tool dashboards, health scorecards, SLOs/SLIs, and integrated alerting systems tailored for engineering, operations, and executive consumption.



- Collaborate with engineering and DevOps teams to integrate observability into CI/CD pipelines, GitOps, and progressive delivery workflows.



- Partner with platform, cloud, and security teams to provide end-to-end visibility across AWS, Azure, GCP, and on-prem systems.



- Lead root cause analysis, system-wide incident reviews, and reliability engineering initiatives to reduce MTTR and improve MTBF.



- Evaluate, pilot, and implement new observability tools/technologies aligned with enterprise architecture and scalability requirements.



- Deliver technical mentorship and enablement, evangelizing observability best practices and nurturing a culture of ownership and data-driven decision-making.



- Drive observability governance and maturity models, ensuring compliance, consistency, and alignment with business SLAs and customer experience goals.



Required Qualifications :



- 15+ years of overall IT experience, including hands-on application development, system architecture, and operations in complex distributed environments, as well as troubleshooting and integrating applications and cloud technologies with observability tools.



- 5+ years of hands-on experience with observability tools such as New Relic, Datadog, and Prometheus, including APM, infrastructure monitoring, logs, synthetics, alerting, and dashboard creation.



- Proven experience and willingness to work with multiple observability stacks, such as :

  • Datadog, Dynatrace, AppDynamics
  • Prometheus, Grafana, etc.
  • Elasticsearch, Fluentd, Kibana (EFK/ELK)
  • Splunk, OpenTelemetry


- Solid knowledge of Kubernetes, service mesh (e.g., Istio), containerization (Docker) and orchestration strategies.



- Strong experience with DevOps and SRE disciplines, including CI/CD, IaC (Terraform, Ansible), and incident response workflows.



- Fluency in one or more programming/scripting languages: Java, Python, Go, Node.js, Bash.



- Hands-on expertise in cloud-native observability services (e.g., CloudWatch, Azure Monitor, GCP Operations Suite).



- Excellent communication and stakeholder management skills, with the ability to align technical strategies with business goals.



Preferred Qualifications :



- Architect-level certifications in New Relic, Datadog, Kubernetes, AWS/Azure/GCP, or SRE/DevOps practices.



- Experience with enterprise observability rollouts, including organizational change management.



- Understanding of ITIL, TOGAF, or COBIT frameworks as they relate to monitoring and service management.



- Familiarity with AI/ML-driven observability, anomaly detection, and predictive alerting.



Why Join Us :



- Lead enterprise-scale observability transformations impacting customer experience, reliability, and operational excellence.



- Work in a tool-diverse environment, solving complex monitoring challenges across multiple platforms.



- Collaborate with high-performing teams across development, SRE, platform engineering, and security.



- Influence strategy, tooling, and architecture decisions at the intersection of engineering, operations, and business.

