hirist

Cloud Infrastructure Product Architect

White Code Labs
5 - 8 Years
Multiple Locations

Posted on: 27/04/2026

Job Description

Description :



Title : Product Architect (Cloud Infrastructure Intelligence)



About the Product :



We're building a distributed engine designed to remediate cloud waste at scale. This isn't a monitoring tool; it's an intelligence layer that executes risk-aware decisions across complex enterprise environments.



The Work :



This is a high-leverage, part-time role focused on Product Strategy rather than daily implementation. You will define the rules of engagement for our AI.



Key Focus Areas :



- Detection Models : Define how we score confidence and reduce false positives across CPU, network, and disk metrics.



- Risk Modeling : Categorize environment tiers (Dev vs. Prod) and set the automation boundaries for each.



- System Behavioral Logic : Set the evaluation windows and telemetry requirements needed for the system to act accurately.



- Edge Case Engineering : Architect for "noise" such as batch processing, shared storage (EFS), and scaling groups.



Required Background :



- AWS Mastery : Deep knowledge of EC2, IAM, CloudWatch, and Organizations.



- Systems Design : Experience building automation workflows or complex cloud architectures.



- The "FinOps" Mindset : A background in cost optimization or observability is highly preferred.



Responsibilities & Duties



1. Architecting the Core Intelligence Logic :



- Design Multi-Signal Detection Frameworks : Develop the mathematical and logical models that determine "idleness" by synthesizing data from CPU, network throughput, disk I/O, and memory.



- Establish Confidence Scoring : Create the scoring engine that weighs different signals to produce a "remediation confidence" percentage, ensuring the system only acts when data is conclusive.



2. Risk Modeling & Operational Guardrails :



- Define Environment Risk Tiers : Establish distinct "Rules of Engagement" for different environment types (Sandbox, Dev, Staging, and Production) to ensure safety.



- Set Automation Boundaries : Design the "Hard Stop" logic, identifying specific conditions where the system should pause and escalate for human approval rather than proceeding with automated remediation.



3. Strategic System Design & Telemetry :



- Specify Telemetry Requirements : Define exactly which metrics (from CloudWatch, Flow Logs, etc.) the engineering team needs to ingest to enable high-fidelity decision-making.



- Bridge Product & Engineering : Translate high-level business goals (e.g., "reduce spend by 20%") into technical logic requirements that the engineering team can build and scale.



- Advise on Multi-Account Strategy : Architect how the intelligence layer should behave across complex AWS Organizations and cross-account IAM roles.



4. Explainability & Decision Transparency :



- Design the "Evidence Chain" : Define the specific data points required to justify every decision. Ensure that when a resource is flagged, the system provides a human-readable "Why."



5. Edge Case Engineering & Validation :



- Scenario Modeling : Develop logic to handle "noisy" infrastructure, including Auto Scaling Groups (ASGs), ephemeral workloads, and EFS-backed instances that appear idle but are mission-critical.



- False Positive Reduction : Continuously analyze system outputs to identify patterns that lead to false positives (e.g., monthly cron jobs or quarterly backups) and refine the detection models accordingly.


The job is for:

May work from home