Job Description


Responsibilities :


- Ensure the reliability, performance, and availability of Integral's applications through proactive monitoring and automation.

- Develop and maintain real-time monitoring, alerting, and logging systems to detect and resolve issues before they impact customers.

- Automate manual operations, including application deployment, configuration, scaling, and recovery.

- Collaborate with software engineering teams to integrate reliability best practices into the development lifecycle.

- Conduct root cause analysis (RCA) and implement preventive measures to mitigate recurring issues.

- Support a 24/7 distributed enterprise environment across multiple global data centers.

- Work closely with the Support team to enhance incident response processes, ensuring fast and effective resolution of technical escalations.

- Participate in on-call rotations to support critical application issues and outages.

- Maintain and optimize CI/CD pipelines to ensure fast and reliable application releases.

- Enhance system security by managing SSL certificates, encryption, and authentication mechanisms.

- Foster a culture of continuous improvement by evaluating new tools, frameworks, and methodologies to enhance system reliability.


Requirements :


- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.

- 4+ years of experience in a similar role focusing on application reliability, automation, and performance optimization.

- Strong expertise in Linux and Windows system administration.

- Proficiency in at least one scripting language (e.g., Python, Shell, Perl, JavaScript).

- Experience with Docker, Kubernetes, or other containerization technologies.

- Familiarity with CI/CD tools like Jenkins and deployment automation frameworks.

- Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack, New Relic, Datadog).

- Understanding of networking concepts (TCP/IP, DNS, load balancing, firewalls).

- Experience with configuration management tools like Ansible, Salt, or Puppet.

- Strong debugging and troubleshooting skills across application, database, and infrastructure layers.

- Ability to work in a fast-paced, high-pressure environment with multiple priorities.

- Excellent communication and collaboration skills to work effectively with engineering and support teams.

