Posted on: 29/01/2026
Description :
The customer currently uses the ELK stack. The goal is to standardize and modernize logs, metrics, and traces using OpenTelemetry while improving visibility, reliability, and operational intelligence.
Observability Architecture & Modernization :
- Assess the existing ELK-based observability setup and define a modern observability architecture
- Design and implement standardized logging, metrics, and distributed tracing using OpenTelemetry
- Define observability best practices for cloud-native and Azure-based applications
- Ensure consistent telemetry collection across microservices, APIs, and infrastructure
Logging, Metrics & Tracing :
- Instrument applications using OpenTelemetry SDKs (Spring Boot, .NET, Python, JavaScript, as applicable)
- Support Kubernetes and container-based workloads (if applicable)
- Configure and optimize log pipelines, trace exporters, and metric collectors
- Integrate OpenTelemetry with ELK / OpenSearch / Azure Monitor / other backends
- Define SLIs, SLOs, and alerting strategies
- Integrate GitHub and Jira data into the observability platform as DORA metrics
Operational Excellence :
- Improve observability performance, cost efficiency, and data retention strategies
- Create dashboards, runbooks, and documentation
AI-based Anomaly Detection & Triage (Good to Have) :
- Design or integrate AI/ML-based anomaly detection for logs, metrics, and traces
- Work on AIOps capabilities for automated incident triage and insights
Required Technical Skills :
Core Observability :
- Strong hands-on experience with ELK Stack (Elasticsearch, Logstash, Kibana)
- Deep understanding of logs, metrics, traces, and distributed systems
- Practical experience with OpenTelemetry (Collectors, SDKs, exporters, receivers)
Cloud & Platforms :
- Strong experience integrating Microsoft Azure with the observability platform
- Knowledge of Azure monitoring tools (Azure Monitor, Log Analytics, Application Insights)
- Experience integrating Kubernetes / AKS with the observability platform is a strong plus
Soft Skills :
- Strong architecture and problem-solving skills
- Clear communication and documentation skills
- Hands-on mindset with an architect-level view
Good to Have / Preferred Skills :
- Experience with AIOps / anomaly detection platforms
- Exposure to any of the following tools: Prometheus, Grafana, Jaeger, OpenSearch, Datadog, Dynatrace, New Relic
- Experience with incident management, SRE practices, and reliability engineering
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1607030