
Job Description

Job Description : Cloud & DevOps Engineer

Location : Madhapur, Hyderabad

Employment Type : Contract role for 1 year

Department : Engineering - Infrastructure

Role Overview :

We are seeking a Cloud & DevOps Engineer to design, deploy, and maintain the infrastructure that powers HoloMe's next-generation AI avatar and kiosk systems. You will ensure our AWS- and Kubernetes-based environment is secure, scalable, and highly available, while working cross-functionally with backend engineers, AI/ML specialists, QA, and product teams to deliver seamless, real-time interactive experiences.

Key Responsibilities :

Infrastructure & Deployment :

- Manage and optimize AWS infrastructure (EC2, RDS PostgreSQL + pgvector, DynamoDB, Redis, RabbitMQ).

- Deploy and maintain Kubernetes clusters for scaling, health checks, and rolling updates.

- Build and maintain Docker images for Django, Daphne, Redis, RabbitMQ, Celery, and supporting services.

- Oversee ingress routing from Cloudflare through NGINX to the application EC2 instances.

Automation & CI/CD :

- Implement and optimize CI/CD pipelines for rapid and reliable deployments.

- Automate environment provisioning using Terraform or equivalent IaC tools.

- Maintain consistent build and deployment practices across staging, testing, and production.

Monitoring & Observability :

- Configure and maintain Grafana dashboards (grafana.holome.ai) with Prometheus and Redis/Postgres exporters.

- Define and refine alerts for CPU/memory, Redis usage, RabbitMQ queue health, Celery task backlog, and 5xx error rates.

- Provide observability tools and insights to engineering and QA teams.

Security & Compliance :

- Maintain Cloudflare WAF, TLS/SSL termination, and DDoS protections.

- Manage EC2 Security Groups, IAM roles, and least-privilege policies.

- Support middleware security (SQL injection detection, IP blocking, request throttling).

- Ensure compliance readiness for GDPR/CCPA/PDPL, including logging, encryption, and data isolation.

Collaboration & Support :

- Work with AI/ML engineers to optimize GPU clusters or cloud inference nodes.

- Partner with QA engineers to integrate automated load/resilience testing.

- Support operations teams with documentation, playbooks, and incident response.

Qualifications :

Must-Have Skills :

- Proven experience with AWS (EC2, RDS, DynamoDB, IAM, CloudWatch, KMS).

- Strong background with Docker & Kubernetes in production.

- CI/CD pipeline design (GitHub Actions, GitLab CI, or Jenkins).

- Infrastructure-as-Code (Terraform, CloudFormation, or Ansible).

- Monitoring and observability (Grafana, Prometheus, ELK stack).

- Linux system administration and networking expertise.

Nice-to-Have Skills :

- Experience with Redis, RabbitMQ, and Celery in production.

- Familiarity with Django/ASGI-based apps (Daphne, NGINX).

- Understanding of compliance frameworks (GDPR, PDPL, CCPA).

- Security testing, WAF tuning, and penetration testing exposure.

- Experience with AI/ML infrastructure (GPU clusters, inference optimization).

Soft Skills :

- Strong problem-solving and debugging mindset.

- Clear communicator, able to document and share knowledge effectively.

- Comfortable in cross-functional teams spanning product, AI, backend, and QA.

- Ownership mentality with a focus on uptime, reliability, and resilience.

What We Offer :

- Opportunity to work on cutting-edge holographic avatar technology.

- Collaborative and innovative environment with cross-functional exposure.

- Flexible work arrangements at the Kuala Lumpur HQ, with hybrid options.

- Growth opportunities in AI, cloud, and real-time rendering domains.
