Posted on: 23/07/2025
Job Title : Observability & SRE Engineer (Azure & Splunk)
Location : Delhi NCR, Pune, Mumbai, and Bangalore (Hybrid)
Experience : 6 to 10 Years
Employment Type : Contract (Fixed Term: 3 months)
Notice Period : Immediate / Up to 30 Days
Role Overview :
- Develop and maintain monitoring, alerting, and dashboarding solutions to ensure system health and performance.
- Design and run chaos engineering experiments using Azure tooling (e.g., Azure Chaos Studio) to proactively test the resilience of cloud applications.
- Collaborate with application and infrastructure teams to identify SLOs/SLIs and define reliability objectives.
- Automate incident detection and response processes using Splunk alerts, Azure Automation, and scripting.
- Conduct root cause analysis (RCA) and post-incident reviews to drive continuous improvement.
- Drive the adoption of SRE principles and practices across engineering teams.
Must-Have Skills :
- Proficiency with Azure services, especially Azure Monitor, Log Analytics, and Application Insights.
- Practical experience with Azure Chaos Studio or equivalent chaos engineering tools.
- Deep understanding of SRE practices, including SLIs/SLOs, error budgets, incident management, and reliability metrics.
- Experience with scripting languages (PowerShell, Python, Bash) for automation and tooling.
- Strong troubleshooting and analytical skills in complex distributed systems.
Good to Have :
- Knowledge of Azure DevOps, CI/CD pipelines, and infrastructure-as-code (Terraform, Bicep).
- Experience in Kubernetes observability (AKS).
- Familiarity with ITIL or incident/problem/change management workflows.
Education & Certifications :
- Azure certifications (e.g., AZ-400, AZ-305) preferred.
- Splunk certifications (e.g., Splunk Core Certified Power User or Admin) are a plus.
Work Conditions :
- May include on-call responsibilities for critical systems support.
Posted By
Sangeetha Suresh Kumar
Project Coordinator at EILGLOBAL IT SOLUTIONS AND SERVICES PVT LTD
Last Active: 21 Nov 2025
Posted in
DevOps / SRE
Functional Area
Site Reliability Engineering
Job Code
1518039