hirist

Job Description


Responsibilities :

- Ensure the smooth operation of production systems through proactive monitoring and maintenance.

- Respond promptly to system alerts and incidents, minimizing downtime and impact.

- Implement and manage observability tools such as Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana).

- Develop and maintain dashboards and alerts to track key performance indicators (KPIs).

- Analyze monitoring data to identify trends, potential issues, and areas for optimization.

- Manage and analyze system logs to troubleshoot errors and identify root causes.

- Implement log aggregation and analysis solutions to improve error detection and resolution.

- Develop and maintain error handling procedures and documentation.

- Develop and maintain automation scripts using Python, Bash, or other scripting languages to streamline operations.

- Automate routine tasks, deployments, and infrastructure management to improve efficiency and reduce manual effort.

- Implement Infrastructure as Code (IaC) principles to manage infrastructure configurations.

- Investigate and resolve production incidents, performing root cause analysis and implementing effective solutions.

- Collaborate with development and operations teams to resolve complex issues promptly.

- Enhance and maintain UI automation and testing frameworks to improve testing efficiency and coverage.

- Participate in deployment processes, ensuring smooth and reliable releases.

- Implement and maintain CI/CD pipelines to automate software delivery.

- Identify and address performance bottlenecks in production systems.

- Implement performance tuning and optimization strategies to improve system efficiency and responsiveness.

- Monitor and analyze system performance metrics to identify areas for improvement.

- Create and maintain comprehensive documentation for system configurations, procedures, and troubleshooting guides.

- Share knowledge and best practices with team members through training sessions and knowledge base articles.

- Contribute to the development and improvement of internal tools and processes.

Requirements :

Essential :

- 2+ years of experience in Production Engineering, DevOps, or Testing Automation.

- Strong scripting skills in Python, Bash, or similar languages.

- Hands-on experience with observability tools like Prometheus, Grafana, ELK, or Datadog.

- Experience with log management and error handling.

- Familiarity with UI automation and testing frameworks.

- Strong problem-solving and analytical skills.

- Ability to work independently and as part of a team.

- Good communication and interpersonal skills.
