Posted on: 04/02/2026
Job Description : Cloud & DevOps Engineer
Location : Madhapur, Hyderabad
Employment Type : Contract (1 year)
Department : Engineering - Infrastructure
Role Overview :
We are seeking a Cloud & DevOps Engineer to design, deploy, and maintain the infrastructure that powers HoloMe's next-generation AI avatar and kiosk systems. You will ensure our AWS- and Kubernetes-based environment is secure, scalable, and highly available, while working cross-functionally with backend engineers, AI/ML specialists, QA, and product teams to deliver seamless, real-time interactive experiences.
Key Responsibilities :
Infrastructure & Deployment :
- Manage and optimize AWS infrastructure (EC2, RDS PostgreSQL + pgvector, DynamoDB, Redis, RabbitMQ).
- Deploy and maintain Kubernetes clusters, including autoscaling, health checks, and rolling updates.
- Build and maintain Docker images for Django, Daphne, Redis, RabbitMQ, Celery, and supporting services.
- Oversee ingress routing from Cloudflare through NGINX to the application EC2 instances.
Automation & CI/CD :
- Implement and optimize CI/CD pipelines for rapid and reliable deployments.
- Automate environment provisioning using Terraform or equivalent IaC tools.
- Maintain consistent build and deployment practices across staging, testing, and production.
Monitoring & Observability :
- Configure and maintain Grafana dashboards (grafana.holome.ai) with Prometheus and Redis/Postgres exporters.
- Define and refine alerts for CPU/memory, Redis usage, RabbitMQ queue health, Celery task backlog, and 5xx error rates.
- Provide observability tools and insights to engineering and QA teams.
Security & Compliance :
- Maintain Cloudflare WAF, TLS/SSL termination, and DDoS protections.
- Manage EC2 Security Groups, IAM roles, and least-privilege policies.
- Support middleware security (SQL injection detection, IP blocking, request throttling).
- Ensure compliance readiness for GDPR, CCPA, and PDPL, including logging, encryption, and data isolation.
Collaboration & Support :
- Work with AI/ML engineers to optimize GPU clusters or cloud inference nodes.
- Partner with QA engineers to integrate automated load/resilience testing.
- Support operations teams with documentation, playbooks, and incident response.
Qualifications :
Must-Have Skills :
- Proven experience with AWS (EC2, RDS, DynamoDB, IAM, CloudWatch, KMS).
- Strong background with Docker & Kubernetes in production.
- CI/CD pipeline design (GitHub Actions, GitLab CI, or Jenkins).
- Infrastructure-as-Code (Terraform, CloudFormation, or Ansible).
- Monitoring and observability (Grafana, Prometheus, ELK stack).
- Linux system administration and networking expertise.
Nice-to-Have Skills :
- Experience with Redis, RabbitMQ, and Celery in production.
- Familiarity with Django/ASGI applications served by Daphne behind NGINX.
- Understanding of compliance frameworks (GDPR, PDPL, CCPA).
- Exposure to security testing, WAF tuning, and penetration testing.
- Experience with AI/ML infrastructure (GPU clusters, inference optimization).
Soft Skills :
- Strong problem-solving and debugging mindset.
- Clear communicator, able to document and share knowledge effectively.
- Comfortable in cross-functional teams spanning product, AI, backend, and QA.
- Ownership mentality with a focus on uptime, reliability, and resilience.
What We Offer :
- Opportunity to work on cutting-edge holographic avatar technology.
- Collaborative and innovative environment with cross-functional exposure.
- Flexible work arrangements at the Kuala Lumpur HQ, with hybrid options.
- Growth opportunities in AI, cloud, and real-time rendering domains.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1609754