Posted on: 08/12/2025
Description :
Key Responsibilities :
- Perform advanced diagnostics using system logs, configurations, monitoring tools, and scripts.
- Apply permanent fixes or temporary workarounds to restore services within defined SLAs.
- Escalate critical, high-impact, or widespread issues to the Incident Manager.
- Collaborate with application, network, cloud, and infrastructure teams during major incidents.
- Document incident timelines, troubleshooting procedures, and final resolutions.
- Enhance internal knowledge base, SOPs, and playbooks to improve operational efficiency.
- Ensure continuous monitoring and proactive identification of service degradation.
Skills & Qualifications :
Technical Skills :
- Strong understanding of Windows/Linux operating systems and core networking fundamentals.
- Working knowledge of AWS, Azure, and cloud computing concepts.
- Hands-on experience with monitoring tools (e.g., Nagios, SolarWinds, Datadog, Zabbix) and ITSM platforms (e.g., ServiceNow, Jira).
- Proficiency in analyzing system/application logs and troubleshooting service-level incidents.
- Basic to intermediate scripting skills in PowerShell, Bash, or Python (preferred).
Professional Skills :
- Excellent analytical and problem-solving skills.
- Ability to work under pressure in a fast-paced 24/7 operational environment.
Education & Experience :
- Bachelor's degree in IT, Computer Science, or a related field.
- 4 to 6 years of experience in NOC, IT Operations, or Infrastructure Monitoring roles.