Job Description

Description:



- Design and implement scalable telemetry pipelines for metrics, logs, traces, and events across distributed systems.


- Develop and maintain observability standards, NMS tooling, dashboards, alerting frameworks, and SLOs in collaboration with product and platform teams.


- Champion best practices in instrumentation, monitoring, and incident response across engineering teams.


- Integrate and optimise observability tools (e.g., OpenTelemetry, Prometheus, Grafana, Splunk, Elastic) within the NPS ecosystem.


- Collaborate cross-functionally to ensure observability is embedded into the SDLC and CI/CD pipelines.


- Drive adoption of observability platforms through enablement, documentation, and training.


- Continuously evaluate emerging technologies and practices to evolve our observability capabilities.

Required Skills and Experience:


- Proven experience in observability, SRE, or platform engineering roles within complex, distributed environments.


- Strong hands-on expertise with telemetry tools such as OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, Loki, Jaeger, or similar.


- Proficiency in at least one programming language (e.g., Python, Go, Java) and infrastructure-as-code tools (e.g., Terraform, Helm).


- Deep understanding of cloud-native architectures (Kubernetes, microservices, service meshes).


- Experience defining and managing SLOs, SLIs, and alerting strategies.


- Strong problem-solving skills and a passion for improving system reliability and developer experience.

