Posted on: 10/12/2025
Job Description:
What you'll do:
- Own the cloud, today and tomorrow: Design, build, and operate production infrastructure on AWS (EC2-first) while planning for multi-cloud portability (networking models, IaC abstractions, artifact strategy, secrets, identity).
- GPU platform operations: Stand up and operate GPU fleets for STT/TTS models (AMIs/containers, drivers/CUDA, capacity planning, autoscaling, utilization dashboards, cost controls such as on-demand vs. spot where safe).
- Deploy at speed: Evolve CI/CD (currently Jenkins) for safe, fast, and repeatable releases; enable blue/green and canary patterns; enforce environment parity and automated rollbacks.
- Design for scale & cost: Architect capacity for millions of daily conversations; implement autoscaling, caching, and cost controls (Savings Plans/RIs, Graviton, storage lifecycle). Track unit economics (e.g., $/successful conversation, $/GPU-hour).
- Observability & operations: Standardize Prometheus + Grafana and central logging; define SLIs/SLOs; run incident response/on-call with blameless postmortems and runbooks. Extend metrics to GPU health (thermals, ECC, driver) and queue back-pressure.
- Security & compliance (cloud + AI): Embed security (least-privilege IAM, KMS, VPC segmentation, secret management, image hardening), drive patching and vulnerability management, and own the SOC 2 / ISO 27001 cadence.
- Establish AI compliance practices (model/data governance, retention, dataset access controls, inference isolation) aligned to customer/regulatory needs.
- Single-tenant excellence: Productize per-tenant stacks (templated IaC, parameterized configs, release rings) for repeatability and isolation; ensure data residency and customer-specific controls.
- Manage customer-cloud deployments with secure access patterns and clear SLOs.
- Team leadership & growth: Coach and unblock three junior DevOps engineers; set standards, review designs, and hire for scale.
- Help shape the function into pods over time: Security, R&D/Tooling, FinOps, and Core Ops.
- Sales partner: Support RFPs/presales with cloud architecture, BoQs, security responses, and workshops; communicate trade-offs clearly to enterprise architects and CISOs.
Posted in
DevOps / SRE
Functional Area
DevOps / Cloud
Job Code
1588120