DevOps/MLOps Engineer - Cloud Infrastructure

SutraHR
Anywhere in India/Multiple Locations
3 - 10 Years

Posted on: 29/01/2026

Job Description

Role Overview:

The DevOps/MLOps Engineer is responsible for building and maintaining secure, scalable cloud infrastructure and continuous deployment pipelines for the entire Hierarchical Task Planning (HTP) agent framework. This role is critical for managing GPU resources, automating the deployment of LLMs/SLMs, and ensuring the auditability and security of the financial data handled by the Strategic Agent.

Key Responsibilities:

1. Cloud Infrastructure and Security


- Infrastructure as Code (IaC): Design and deploy robust, multi-region cloud infrastructure (AWS, GCP, or Azure) using tools like Terraform or CloudFormation.

- Security Posture: Implement and manage security protocols, including VPC configuration, firewall rules, access control (IAM/RBAC), and data encryption at rest and in transit (critical for financial data).

- Compute Management: Manage and optimize GPU clusters and compute resources for high-performance model training (SLM fine-tuning) and low-latency inference (agent decision-making).

2. MLOps and Deployment Automation

- CI/CD for Agents: Establish a standardized CI/CD pipeline for the HTP framework, covering the Strategic (PAEP), Tactical (ReAct), and Worker Agents.


- Model Lifecycle Management: Implement MLOps tools (e.g., MLflow, SageMaker, or Vertex AI) for experiment tracking, model versioning, and automated rollout/rollback capabilities for the LLMs and SLMs.


- Containerization: Manage deployment via container technologies (Docker and Kubernetes/ECS), ensuring efficient resource allocation and environment consistency across development and production.

3. Monitoring, Auditing, and Data Flow


- Monitoring: Implement comprehensive logging and monitoring systems (e.g., Prometheus, Grafana) to track agent performance, model drift, and infrastructure health.

- Data Pipeline Integration: Collaborate with the Data Engineer to integrate the MLOps pipeline with the ETL/ELT pipelines, ensuring secure and automated data transfer to and from the Causal Graph.

- Auditability: Ensure all agent actions and decisions are logged and auditable, fulfilling the fiduciary requirements of the private equity firm.

Required Skills and Qualifications:

Essential Technical Expertise:

- Cloud Expertise: 3+ years of experience managing production infrastructure on a major cloud provider (AWS, GCP, or Azure).

- MLOps Tooling: Hands-on experience with MLOps platforms (MLflow, Kubeflow) and version control (Git).

- Containerization & Orchestration: Strong command of Docker and Kubernetes or similar container orchestration systems.

- IaC and CI/CD: Proficiency with Terraform and experience building automated CI/CD pipelines (e.g., Jenkins, GitHub Actions).


- Linux/Scripting: Strong scripting skills (Bash, Python) for automation.

Desirable Domain Experience:

- High-Performance Compute: Experience provisioning and managing high-demand compute resources (GPU instances).

- Security Focus: Specific experience hardening cloud environments for financial or sensitive data applications.

- Financial Systems: Experience working with financial systems is a strong plus.
