hirist

Observability Engineer

INTRAEDGE TECHNOLOGIES PRIVATE LIMITED
5 - 8 Years
Others

Posted on: 07/03/2026

Job Description

Role Overview:

We are looking for an experienced Observability/Platform Engineer with a strong software engineering background to design, build, and operate scalable observability platforms. The ideal candidate will have hands-on experience working with distributed systems and modern observability tools to monitor, troubleshoot, and optimize production systems.

This role requires deep expertise in metrics, logs, traces, and telemetry-based debugging, along with experience working in cloud-native and Kubernetes environments.

Key Responsibilities:

- Design, build, and maintain observability services and platforms that provide visibility into distributed systems.

- Develop and maintain software components using Golang, Java, Python, or C#.

- Implement monitoring solutions for metrics, logs, traces, and events across large-scale systems.

- Build and operate observability pipelines using tools such as Prometheus, Grafana, OpenSearch/Elasticsearch, Jaeger, Tempo, and Datadog.

- Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure system reliability and performance.

- Troubleshoot production issues using telemetry data and observability tools.

- Optimize system performance by analyzing latency, throughput, concurrency, and memory usage.

- Collaborate with platform, infrastructure, and application teams to ensure observability is embedded across systems.

- Operate and manage services in Kubernetes and cloud environments.

- Ensure systems are scalable, resilient, and highly available in production.

Required Skills & Expertise:

Programming:

- Production experience in Golang, Java, Python, or C#

- Strong software engineering fundamentals

Observability & Monitoring:

- Understanding of metrics, logs, traces, and events

- Experience defining and managing SLIs and SLOs

- Experience with observability tools such as Prometheus, Grafana, OpenSearch/Elasticsearch, Jaeger, Tempo, and Datadog

Systems & Infrastructure:

- Experience building or operating distributed systems

- Hands-on experience working with Kubernetes

- Experience with cloud environments

Performance & Reliability:

- Strong understanding of performance optimization

- Knowledge of concurrency and memory management

- Ability to debug and resolve production issues using telemetry

Experience & Qualifications:

- Minimum 5 years of hands-on experience building or maintaining observability services.

- Strong understanding of distributed system architecture and reliability engineering.

- Experience working in production environments with large-scale systems.

- Strong problem-solving, debugging, and troubleshooting capabilities.
