hirist

Lead Platform Engineer - Observability Services

Posted on: 18/08/2025

Job Description

Roles & Responsibilities :

- Solution Packaging: Lead the end-to-end development of observability packages for 100+ standard technologies across infrastructure, databases, middleware, and application platforms.

- Data Collection Strategy: Define and implement data collection strategies including agent instrumentation, API integrations, log and metrics collection pipelines, and auto-discovery mechanisms.

- Golden Signals & Data Modeling: Define golden signals, KPIs, SLIs/SLOs, and data schemas for different component types to support health monitoring, performance optimization, and anomaly detection.

- Dashboards, Alerts, Reports: Design and standardize visualizations, alerting rules, reporting templates, and RCA workflows for fast detection and resolution of issues.

- Platform Enablement: Guide enhancements to agents, collectors, and platform components to support new integrations and data formats.

- Team Leadership: Lead a team of engineers and specialists focused on developing observability solutions. Establish best practices, design standards, and agile delivery pipelines.

- Collaboration & Stakeholder Management: Work closely with product management, DevOps, SRE, and customer success teams to align on priorities, gather requirements, and validate delivered packages.

- Quality, Scale & Reusability: Ensure all developed solutions are scalable, reusable, and version-controlled, with automated testing and documentation.


What You Bring :


Mandatory Skills :


- 6+ years of experience in observability, monitoring, SRE, or platform engineering roles.

- Strong hands-on experience with observability tools such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, Splunk, or similar.

- In-depth understanding of logs, metrics, traces, profiling, events, and the corresponding instrumentation/collection mechanisms.

- Proven experience in developing observability solutions for platforms like Kubernetes, databases (Oracle, PostgreSQL), middleware (Tomcat, WebLogic), and distributed systems.

- Experience with scripting, APIs, and automation frameworks (Python, Shell, Terraform, etc.).

- Familiarity with RCA techniques, anomaly detection, and alert fatigue reduction strategies.

- Ability to define and enforce design patterns, standards, and governance models.

- Strong leadership, project management, and cross-functional collaboration skills.

- Excellent verbal and written communication skills.


Good to Have Skills :

- Experience building or managing a packaged observability marketplace or platform.

- Contributions to open-source observability projects.

- Certifications in Kubernetes, observability tools, or cloud platforms (AWS, Azure, GCP).

- Background in ITSM, CMDBs, or workflow automation.

