Posted on: 15/04/2026
Description :
Role Overview :
We are looking for an experienced Application Support Lead (L3) to drive the reliability, resilience, and performance of critical applications. The role serves as the highest technical escalation point, leads major incidents, performs deep diagnostics (primarily on Java-based systems), and ensures long-term stability through automation and engineering best practices.
Key Responsibilities :
Incident & Problem Management :
- Lead Major Incident (MI) bridges and ensure minimal business impact
- Handle L3 escalations with deep diagnostics across Java, JVM, middleware, OS, and infrastructure
- Own Root Cause Analysis (RCA) and drive permanent fixes
- Identify recurring issues and implement preventive measures
Reliability Engineering :
- Apply SRE principles (SLIs, SLOs, error budgets)
- Perform JVM tuning, thread/heap dump analysis, and performance optimization
- Drive improvements in application architecture for scalability and fault tolerance
- Validate Disaster Recovery (DR), failover readiness, and resilience testing
Change, Release & Risk Management :
- Review and approve high-risk production changes
- Ensure operational readiness for new applications and releases
- Maintain compliance with audit and regulatory standards
Automation & Observability :
- Develop automation using Shell/Python/PowerShell
- Build frameworks for health checks and auto-remediation
- Enhance monitoring, alerting, and observability to reduce MTTR
Leadership & Stakeholder Management :
- Mentor L1/L2 teams; review SOPs, runbooks, and knowledge base (KB) articles
- Act as a technical advisor to engineering and business stakeholders
Required Skills :
Technical Expertise :
- Strong understanding of distributed systems and application architecture
- Advanced Java expertise (JVM internals, GC tuning, memory management)
- Strong Unix/Linux and networking fundamentals
- Scripting : Shell / Python / PowerShell
- Database knowledge with strong SQL skills
- Experience with schedulers like Autosys (or equivalent)
- Observability tools : Splunk, AppDynamics/Dynatrace, ELK, Grafana, Prometheus
Core Competencies :
- Major incident management & deep RCA expertise
- Experience in high-availability / regulated environments
- Strong leadership, decision-making, and communication skills