Posted on: 17/01/2026
A Platform Engineer designs, builds, and operates the Internal Developer Platform (IDP), the integrated product that gives developers self-service workflows and "golden paths" for the full software lifecycle. The goal is to reduce cognitive load, standardize operations behind clear abstractions, and enable fast, secure, reliable delivery across teams and environments.
Core Responsibilities (What They Do):
- Productize the platform: Treat internal teams as customers; define platform value propositions, roadmaps, SLAs, and feedback loops ("platform as a product").
- Build self-service golden paths: Provide templates, orchestration, and paved roads for common tasks (deploy, provision, observe, roll back) that hide complexity without removing flexibility.
- Integrate the toolchain into an IDP: Compose CI/CD, runtime, IaC, secrets, policy, observability, and environment automation into a cohesive developer experience.
- Engineer secure-by-default guardrails: Bake identity, policy, and runtime controls into the platform so "the safe path is the easy path." (Often in partnership with security platform engineers.)
- Operate for reliability & cost: Own platform SLOs, performance, and FinOps hygiene; continuously optimize speed, stability, and spend.
- Accelerate adoption & education: Drive onboarding, docs, workshops, and change management, since IDP failures are more often education and adoption problems than technical ones.
Must-have Experience:
- 7 to 10+ years in Platform Engineering / DevOps / SRE roles, with at least 3-4 years building and operating an Internal Developer Platform (IDP) or equivalent paved-road/golden-path capabilities at scale.
- Proven track record designing abstractions that reduce cognitive load for developers (templates, CLIs, portals, APIs) and measurably improving lead time, deploy frequency, and change failure rate.
- Experience leading complex, multi-team initiatives end-to-end (architecture → implementation → adoption) and mentoring engineers.
Core Technical Depth:
IDP & Runtime Foundations:
- Expert with Kubernetes (multi-tenant clusters, namespaces, quotas, autoscaling, admission controllers), containerization (Docker/OCI), and artifact registries.
- Comfortable with GPU-aware environments (NVIDIA GPU Operator, device plugins, CUDA drivers, scheduling/quotas).
Infrastructure as Code & Environment Automation:
- Strong hands-on experience with Terraform (modules, workspaces), plus Helm/Kustomize and at least one config management tool (e.g., Ansible).
- Familiar with patterns for multi-environment promotion (dev → test → prod), ephemeral environments, and drift detection.
CI/CD & Software Delivery:
- Designed golden pipelines using GitHub Actions / GitLab CI / Jenkins / Argo CD, including policy checks, test gates, SBOMs, and progressive delivery (blue/green, canary, rollbacks).
Observability & Reliability:
- Production experience with Prometheus/Grafana, OpenTelemetry, and centralized logging (ELK/EFK/Cloud Logging).
- SLO/SLA design, error budgets, incident response/runbooks, capacity & performance engineering.
Workflow Orchestration & Data/ML Enablement:
- Built or operated workflows with Airflow/Argo/Dagster; understands ML lifecycle glue (model registry, experiment tracking, feature store) and reproducible training/eval.
- Familiar with tools like MLflow, Vertex AI (or equivalents), and data versioning patterns (DVC/lakeFS).
Storage, Networking, and Hybrid Topologies:
- Practical knowledge of object storage (S3-compatible/GCS), POSIX/NFS for heavy data, caching layers, and egress/ingress patterns in hybrid/on-prem + cloud.
- Solid grasp of service networking (ingress controllers, service mesh fundamentals, DNS, TLS, L4/L7).
Security Integration (Consume, Don't Own):
- Integrates with enterprise Identity & Access (OIDC/OAuth2, groups/ABAC), secrets services, and policy stacks without directly owning them.
- Builds secure-by-default golden paths (least privilege, isolation, audit hooks) aligned to central governance.
FinOps & Cost Engineering:
- Experience with cost visibility/attribution (labels/tags), GPU/compute optimization, storage tiering, and scale-testing to balance performance vs. spend.
Platform-as-Product & Developer Experience:
- Treats internal teams as customers: discovery, journey mapping, backlog/roadmap, and release notes.
- Strong documentation habits (design docs, quickstarts, runbooks) and enablement (workshops, office hours, migration playbooks).
- Comfortable making opinionated defaults while preserving escape hatches.
Nice-to-Have:
- Computer vision / image & video workflows; dataset curation tools (e.g., Voxel51/FiftyOne), large-scale image storage and metadata strategies.
- GCP (GKE, GCS, Cloud Build/Deploy, Pub/Sub), Anthos/hybrid fleet management; familiarity with AWS is helpful.
- Data platform exposure (e.g., Databricks, Delta/Lakehouse concepts) and event streaming (Kafka/Pub/Sub).
Posted in: DevOps / SRE
Functional Area: ML / DL / AI Research
Job Code: 1602638