Job Description

Job Title : Observability & SRE Engineer (Azure & Splunk)

Location : Delhi NCR, Pune, Mumbai, and Bangalore (Hybrid)

Experience : 6 to 10 Years

Employment Type : Contract (Fixed Term: 3 months)

Notice Period : Immediate / Up to 30 Days

Role Overview :


We are looking for a highly skilled Observability and Site Reliability Engineer (SRE) with strong experience in Splunk integration with Azure, cloud-native monitoring, and chaos engineering practices. The ideal candidate will play a key role in improving system reliability, monitoring capabilities, and resilience across our Azure cloud infrastructure.

Key Responsibilities :


- Design, implement, and manage observability solutions using Splunk integrated with Azure Monitor, Log Analytics, and Application Insights.


- Develop and maintain monitoring, alerting, and dashboarding solutions to ensure system health and performance.

- Implement chaos engineering tools and scenarios on Azure to proactively test the resilience of cloud applications.

- Collaborate with application and infrastructure teams to identify SLOs/SLIs and define reliability objectives.


- Automate incident detection and response processes using Splunk alerts, Azure Automation, and scripting.

- Conduct root cause analysis (RCA) and post-incident reviews to drive continuous improvement.

- Drive the adoption of SRE principles and practices across engineering teams.

Must-Have Skills :


- Strong hands-on experience with Splunk including log ingestion, parsing, alerting, and dashboard creation.

- Proficiency with Azure services, especially Azure Monitor, Log Analytics, and Application Insights.

- Practical experience with Azure Chaos Studio or equivalent chaos engineering tools.

- Deep understanding of SRE practices, including SLIs/SLOs, error budgets, incident management, and reliability metrics.

- Experience with scripting languages (PowerShell, Python, Bash) for automation and tooling.

- Strong troubleshooting and analytical skills in complex distributed systems.

Good to Have :


- Experience with additional observability tools like Grafana, Prometheus, Datadog, or New Relic.

- Knowledge of Azure DevOps, CI/CD pipelines, and infrastructure-as-code (Terraform, Bicep).

- Experience in Kubernetes observability (AKS).

- Familiarity with ITIL or incident/problem/change management workflows.

Education & Certifications :


- Bachelor's degree in Computer Science, Engineering, or a related field.

- Azure certifications (e.g., AZ-400, AZ-305) preferred.

- Splunk certifications (e.g., Splunk Core Certified Power User or Admin) are a plus.

Work Conditions :


- Hybrid/Remote work flexibility based on project needs.

- May include on-call responsibilities for critical systems support.

