Posted on: 07/08/2025
Responsibilities :
- Solution Packaging : Lead the end-to-end development of observability packages for 100+ standard technologies across infrastructure, databases, middleware, and application platforms.
- Data Collection Strategy : Define and implement data collection strategies including agent instrumentation, API integrations, log and metrics collection pipelines, and auto-discovery mechanisms.
- Golden Signals and Data Modeling : Define golden signals, KPIs, SLIs/SLOs, and data schemas for different component types to support health monitoring, performance optimization, and anomaly detection.
- Dashboards, Alerts, Reports : Design and standardize visualizations, alerting rules, reporting templates, and RCA workflows for fast detection and resolution of issues.
- Platform Enablement : Guide enhancements to agents, collectors, and platform components to support new integrations and data formats.
- Team Leadership : Lead a team of engineers and specialists focused on observability solutions development. Establish best practices, design standards, and agile delivery pipelines.
- Collaboration and Stakeholder Management : Work closely with product management, DevOps, SRE, and customer success teams to align on priorities, gather requirements, and validate delivered packages.
- Quality, Scale, and Reusability : Ensure all developed solutions are scalable, reusable, and version-controlled, with automated testing and documentation.
Requirements :
- At least 6 years of experience in observability, monitoring, SRE, or platform engineering roles.
- Strong hands-on experience with observability tools such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, Splunk, or similar.
- In-depth understanding of logs, metrics, traces, profiling, events, and the corresponding instrumentation/collection mechanisms.
- Proven experience in developing observability solutions for platforms like Kubernetes, databases (Oracle, PostgreSQL), middleware (Tomcat, WebLogic), and distributed systems.
- Experience with scripting, APIs, and automation frameworks (Python, Shell, Terraform, etc.).
- Familiarity with RCA techniques, anomaly detection, and alert fatigue reduction strategies.
- Ability to define and enforce design patterns, standards, and governance models.
- Strong leadership, project management, and cross-functional collaboration skills.
- Excellent verbal and written communication skills.
Good to Have Skills :
- Experience building or managing a packaged observability marketplace or platform.
- Contributions to open-source observability projects.
- Certifications in Kubernetes, observability tools, or cloud platforms (AWS, Azure, GCP).
- Background in ITSM, CMDBs, or workflow automation.
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1526569