Posted on: 13/02/2026
Description:
- Design, deploy, and manage highly available and scalable infrastructure on AWS Cloud.
- Implement and maintain CI/CD pipelines using GitHub Actions.
- Manage and optimize Kubernetes clusters (EKS) for containerized workloads.
- Implement monitoring, logging, and observability solutions using Prometheus, Grafana, Loki, Promtail, Coralogix
- Ensure high availability, reliability, and performance of production systems.
- Plan, implement, and execute Disaster Recovery (DR) strategies, including DR drills and failover testing.
- Automate infrastructure provisioning, deployment, and configuration management.
- Troubleshoot production issues, perform root cause analysis, and provide permanent fixes.
- Collaborate with development, QA, and security teams to streamline DevOps workflows.
- Maintain documentation for infrastructure, deployment, and DR processes.
- Ensure best practices in security, compliance, and cost optimization.
Required Skills & Qualifications:
Core Technical Skills:
- AWS Cloud (Expert level): EC2, S3, IAM, VPC, RDS, ELB, Auto Scaling, CloudWatch, Route 53, Lambda.
- Kubernetes (Expert level): Cluster setup, management, scaling, upgrades, and troubleshooting.
- CI/CD: GitHub Actions.
- Monitoring & Logging: Prometheus, Grafana, Loki, Promtail, Coralogix.
- Disaster Recovery (DR): DR strategy, backup, failover, testing, and documentation.
Additional Skills (Good to Have):
- Infrastructure as Code (Terraform / CloudFormation)
- Docker & containerization
- Linux system administration & scripting (Bash / Python)
- Security best practices, IAM policies, secrets management
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1612624