Posted on: 27/01/2026
Description :
We are looking for a proactive and detail-oriented Production Support Engineer to ensure the stability, availability, and performance of our critical applications. In this role, you will be the first line of defense against production issues, bridging the gap between our users and the engineering team. You will not just resolve tickets; you will analyze root causes, automate manual tasks, and ensure our services run smoothly 24/7.
Key Responsibilities :
- Incident Management : Monitor alerts, and acknowledge and resolve production incidents within defined SLAs. Act as the primary point of contact for L2/L3 support.
- Troubleshooting & RCA : Perform deep-dive Root Cause Analysis (RCA) of application logs, database queries, and system metrics to identify the root cause of recurring issues.
- System Monitoring : Proactively monitor application health using tools such as Datadog and Grafana.
- Database Support : Write complex SQL queries and navigate MongoDB data to extract data, fix data inconsistencies, and generate ad-hoc reports.
- Deployment Support : Assist the DevOps and Engineering teams during release cycles and deployment activities.
- Automation : Create scripts (Python/Bash/Shell) to automate repetitive support tasks and reduce toil.
- Documentation : Maintain an up-to-date Knowledge Base (KB) and Runbooks to ensure faster resolution for future incidents.
- Stakeholder Communication : Clearly communicate outage status and updates to internal stakeholders (CSM, Product Managers, Engineering Managers) and external clients.
Technical Requirements :
- Databases : Solid experience with NoSQL and SQL databases (MongoDB, PostgreSQL, and MySQL). Ability to query and analyze data.
- Operating Systems : Proficiency in Linux/Unix environments (command line, file systems, permissions, grep/sed/awk).
- Scripting : Basic to intermediate scripting skills in Bash, Python, or PowerShell.
- Ticketing Tools : Experience with ITSM tools like Jira.
- Monitoring Tools : Familiarity with logging and monitoring platforms (ELK Stack, Grafana, Prometheus, Datadog).
- Web Services : Understanding of REST APIs and how to test them (using Postman or curl).
Nice-to-Have (Bonus Skills) :
- Experience with cloud platforms (preferably AWS).
- Knowledge of containerization (Docker, Kubernetes).
- Exposure to job scheduling tools (e.g., AutoSys, Control-M, Airflow).
- Understanding of ITIL processes (Incident, Problem, Change Management).
Soft Skills :
- High Pressure Handling : Ability to stay calm and focused during critical P1/P0 outages.
- Curiosity : A natural desire to dig deeper to understand why something broke, not just fix it.