Posted on: 13/10/2025
Description :
The core responsibilities for the job include the following :
Platform Monitoring and Incident Handling :
- Monitor platform alerts, logs, and dashboards to proactively detect issues.
- Perform initial triage and root-cause analysis, and escalate incidents when necessary.
Operations and Maintenance :
- Execute standard operating procedures (SOPs), perform health checks, and complete routine maintenance tasks.
- Coordinate with engineering and SRE teams to resolve critical issues and maintain SLAs.
Documentation and Reporting :
- Maintain accurate documentation of issues, actions taken, and resolutions.
- Contribute to the internal knowledge base to improve future response times.
Communication and Collaboration :
- Provide timely updates to stakeholders on incident status.
- Work closely with engineering, product, and operations teams for continuous improvement.
Requirements :
Experience :
- 2+ years in technical support, IT operations, or application monitoring roles.
Technical Knowledge :
- Familiarity with cloud platforms (AWS, GCP, or Azure).
- Understanding of Kubernetes basics and containerized environments is a plus.
- Good grasp of logs, monitoring tools (e.g., Grafana, Prometheus, Datadog, Splunk), and incident management workflows.
Soft Skills :
- Strong analytical, troubleshooting, and problem-solving skills.
- Excellent communication skills to collaborate across distributed teams.
Work Flexibility :
- Comfortable working on night shifts (India time) and handling on-call duties as needed.