Site Reliability Engineer - Cloud Infrastructure

Posted on: 24/08/2025

Job Description

Roles & Responsibilities :


- Maintain and monitor the availability of cloud infrastructure; troubleshoot, identify, and resolve production-level infrastructure issues.


- Using Infrastructure as Code (IaC) tools, develop and maintain automation for provisioning, configuration management, and deployment.


- Establish and maintain monitoring and alerting systems for the detection and response to incidents.


- Demonstrate strong customer focus.


- Collaborate with internal teams and customers during incidents: explain the issue, recommend immediate mitigations, and provide long-term solutions.


- Investigate customer escalations and work closely with the engineering, support, and sales teams to implement a solution.


- Perform a postmortem analysis of system failures and implement corrective measures as necessary.


- Participate in the rotational on-call schedule and be available as needed during emergencies.


- A demonstrated track record of optimising cloud infrastructure costs. Monitor and control the use of cloud resources, implement cost-saving measures, and provide recommendations for optimising cloud costs.


- Experience implementing security best practices and compliance measures in production environments.


- Experience with security audits, vulnerability assessments, and the implementation of security controls to protect sensitive data and ensure regulatory compliance.



Candidate Profile :



- 3+ years of experience with a focus on cloud infrastructure automation, configuration management, and deployment automation, a significant portion of it on AWS for mid- to large-size deployments.


- Experience designing, architecting, and running large-scale cloud infrastructure.


- Experience working with reverse proxy, web server, load balancing, and CDN services.


- Familiarity with security best practices and compliance frameworks such as PCI DSS


- Strong interpersonal and communication skills (including oral, written, and listening skills)


- Experience with stress testing and tuning production systems using tools such as k6 and Locust


- Experience in using AWS Cost Explorer, AWS Budgets, and AWS Cost and Usage Reports and optimising costs to ensure efficient resource use.



Technical skills :



- Experience with AWS in designing, deploying, and managing cloud infrastructure.


- Experience with scripting languages such as Python and Bash


- Experience managing reverse proxies and web servers in large-scale production environments.


- Experience with Infrastructure as Code tools such as Terraform or CloudFormation


- Experience working with Kafka, Elasticsearch, and RabbitMQ


- Experience with observability tools such as Prometheus, Grafana, and Loki

