We are seeking an experienced Observability Engineer to design, build, and operate the observability foundation for complex, distributed systems. This role focuses on enabling engineering teams to understand, troubleshoot, and optimize systems using high-quality metrics, logs, traces, and insights.
As an Observability Engineer, you will build the nervous system of the platform: developing scalable telemetry pipelines, defining standards, and giving teams actionable visibility. You will work across application, platform, SRE, and infrastructure teams to ensure systems are reliable, performant, cost-efficient, and debuggable at scale.
Key Roles & Responsibilities :
Observability Strategy & Architecture :
- Define and drive the organization's observability strategy, standards, and roadmap.
- Design comprehensive telemetry architectures for distributed and microservices-based systems.
- Establish best practices and guidelines for metrics, logging, and tracing.
- Evaluate, select, and standardize observability tools and platforms.
- Create reference architectures for instrumentation across multiple technology stacks.
- Partner with engineering teams to define SLIs, SLOs, and error budgets.
Instrumentation & Telemetry Engineering :
- Instrument applications and services with metrics, logs, and distributed traces.
- Implement end-to-end distributed tracing across microservices architectures.
- Deploy and configure telemetry agents, sidecars, and collectors.
- Implement OpenTelemetry standards, SDKs, and Collector pipelines.
- Build custom instrumentation libraries and SDKs in multiple languages.
- Create auto-instrumentation frameworks to reduce developer effort.
- Ensure semantic consistency and data quality across all telemetry signals.
Observability Platforms & Tooling :
- Deploy, manage, and optimize metrics and monitoring platforms such as Prometheus, Grafana, Datadog, New Relic, Dynatrace, and AppDynamics.