hirist

Observability Engineer

CROSSDEV TECHNOLOGIES PRIVATE LIMITED
4 - 7 Years
Multiple Locations

Posted on: 28/03/2026

Job Description


Job Summary:

We are seeking a skilled Observability Engineer to improve system visibility, monitoring, and reliability across distributed and cloud-native environments. The ideal candidate will be responsible for building and maintaining observability frameworks that provide deep insights into application performance, system health, and user experience. You will work closely with DevOps, SRE, and engineering teams to proactively identify issues, optimize system performance, and ensure high availability of services.


Key Responsibilities:


- Design, implement, and maintain end-to-end observability solutions across applications and infrastructure

- Build and manage monitoring systems for metrics, logs, and distributed tracing

- Develop dashboards and visualizations to provide actionable insights into system performance

- Define and configure alerting strategies to proactively detect anomalies and incidents

- Troubleshoot production issues and perform root cause analysis (RCA)

- Collaborate with DevOps, SRE, and development teams to improve system reliability and performance

- Establish best practices for logging, monitoring, and observability across teams

- Continuously evaluate and implement new tools and technologies to enhance observability capabilities

- Support incident management processes, including on-call rotations and post-incident reviews

- Optimize system performance, scalability, and reliability through data-driven insights


Required Skills:


- Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, or similar platforms

- Strong understanding of distributed systems, microservices architecture, and cloud-native applications

- Experience working with cloud platforms (AWS, Azure, or GCP)

- Solid knowledge of logging, metrics, and distributed tracing concepts and tooling (e.g., OpenTelemetry, Jaeger, Zipkin)

- Experience in setting up alerting systems and incident management workflows

- Proficiency in scripting languages such as Python or Bash for automation and troubleshooting

- Familiarity with CI/CD pipelines and DevOps practices

- Experience with containerization and orchestration tools (Docker, Kubernetes)

