
Job Description

The Opportunity :

We are seeking a Platform Specialist (Director level) to serve as the organization's top technical authority on Kubernetes and the most senior hands-on engineer for CK-Kube, our Kubernetes Cost Intelligence platform. This is a deep individual contributor role (60%+ hands-on engineering) in which you will architect, implement, and technically lead CK-Kube as the principal engineer. You will set the technical direction, write production code, and drive architectural decisions. We are not looking for a people manager; we are looking for the strongest Kubernetes systems engineer we can find.

What You'll Own :

CK-Tuner-Kubernetes (Kubernetes Cost Intelligence Platform) :

- Architect and implement the cost allocation engine at cluster, namespace, deployment, pod, and container granularity across EKS, AKS, and GKE

- Design and build the real-time data collection pipeline : agent architecture, ClickHouse time-series storage, gRPC streaming between agent and datastore

- Implement Karpenter integration for node lifecycle management and bin-packing optimization

- Build custom Kubernetes controllers and operators for cost policy enforcement and automated remediation

- Design shared cost distribution algorithms : system namespaces, control plane costs, networking overhead, idle capacity attribution

- Integrate CK-Tuner-Kubernetes with CK-Lens for a unified cloud + container cost view

Container Optimization Engine :

- Design and implement container right-sizing algorithms for CPU and memory requests/limits based on real usage patterns

- Build node pool optimization logic : instance type selection, scaling policies, bin-packing efficiency scoring

- Implement Karpenter-based spot and preemptible node policies for fault-tolerant workloads

- Build the automated right-sizing execution pipeline via CK-Tuner integration

GPU Container Cost Intelligence :

- GPU utilization tracking and idle GPU detection for AI/ML workloads running on Kubernetes

- Multi-cluster GPU cost comparison across EKS, AKS, and GKE

- Integration with the FinOps for AI initiative for GPU pod-level cost attribution

Responsibilities :

Technical Leadership :

- Serve as CK-Tuner-Kubernetes's principal architect and most senior hands-on engineer

- Set architectural standards and code quality bars; mentor engineers through technical pairing and design reviews

- Drive technical roadmap and architecture decisions in partnership with Product Management

Hands-On Engineering :

- Write production Go code for CK-Tuner-Kubernetes's core systems : agent data collection, metrics processing, cost allocation engine

- Design and implement custom Kubernetes controllers and operators

- Build and optimize the ClickHouse time-series data model for cost metrics at scale

- Implement gRPC streaming with backpressure, circuit breakers, and mTLS between agent and datastore

- Develop Karpenter-based node optimization policies and consolidation algorithms

- Performance-tune the metrics pipeline : 10-second scrape intervals, 1-minute rollups, multi-cluster aggregation

Technical Strategy :

- Design the agent data collection layer : hybrid metrics collection via Metrics API, Kubelet Summary, Kubelet Proxy, and optional Prometheus endpoints

- Architect the ClickHouse time-series schema with materialized views for multi-resolution aggregation (5m, 1h, 1d)

- Build the delta processing pipeline : in-memory state comparison with ring buffers (discovery 10K, metrics 50K, events 100K)

- Design cost allocation algorithms for shared resources : control plane, networking, system namespaces, idle capacity

- Architect multi-cloud Kubernetes support (EKS primary, AKS/GKE Phase 4) with provider-specific pricing API integrations

- Build integration points with CK-Lens, CK-Tuner, and CK-Intelligence

Technical Landscape You'll Navigate :

Kubernetes & Container Orchestration :

- Platforms : EKS (Fargate, managed node groups), AKS, GKE (Autopilot, standard), on-prem Kubernetes

- Ecosystem : OpenCost, Karpenter, Helm, Kubernetes Operators, K8s API Server

- Resource Management : Requests/limits, node autoscaling, pod scheduling, bin-packing, spot/preemptible nodes

- Kubernetes Internals : Custom controllers, operators, CRDs, admission webhooks, scheduler plugins, informers, leader election, reconciliation loops

Data Engineering :

- ClickHouse (time-series analytics), Apache Pulsar/NATS JetStream (message broker), gRPC bidirectional streaming with backpressure

Cloud Providers :

- AWS : EKS, Fargate, EC2 (GPU instances), S3, CloudWatch, Cost & Usage Reports

- Azure : AKS, Azure Monitor, Azure Billing APIs

- GCP : GKE, GKE Autopilot, BigQuery Billing Export

Role Requirements

Experience :

- 10+ years in systems/platform/infrastructure engineering with deep hands-on Kubernetes production experience (EKS, AKS, or GKE)

- Track record of personally designing and implementing complex distributed systems, not just overseeing teams that build them

- Experience building Kubernetes tooling : operators, controllers, CLI tools, or platform products

- Prior work on cost/resource optimization, observability, or infrastructure intelligence platforms preferred

- Experience with container orchestration at scale (multi-cluster, multi-cloud preferred)

Technical Depth :

- Expert-level : Kubernetes internals (scheduler, controller-manager, kubelet, API server), resource management, pod lifecycle

- Hands-on : Custom controller/operator development using controller-runtime or client-go

- Production experience with Karpenter, OpenCost, or equivalent node/cost optimization tools

- Strong Go proficiency (CK-Kube is 100% Go); experience with gRPC, Protocol Buffers

- ClickHouse or similar OLAP/time-series database experience for high-throughput metrics

- eBPF, CNI, or CSI plugin development experience is a strong plus

Leadership :

- Ability to operate in a "founding engineer" mode : small team, high ownership, rapid shipping

- Track record of setting technical direction and architectural standards that scale beyond your own code

- Comfortable wearing multiple hats : architecture, implementation, code review, technical documentation, product input

- Influence through technical excellence, design documents, and working code, not through organizational authority

- Strong communicator who can influence across functions and levels

