Posted on: 25/11/2025
Description :
We are seeking a seasoned Engineering Lead with deep expertise in workflow orchestration systems, stateful execution engines, and distributed task runtimes. You will architect the next generation of our DAG-based runtime, develop the infrastructure for agentic workflow composition, and drive execution excellence across engineering. This role combines hands-on systems design with technical leadership and team mentorship.
Key Responsibilities :
Architect & Build the Swarm Runtime :
- Design and implement a DAG-based orchestration engine using Temporal, Argo Workflows, or equivalent event-driven runtimes.
- Build a scalable primitive registry for tasks, operators, guards, and computational nodes.
- Architect a robust scheduler capable of handling event triggers, retries, backoffs, and distributed coordination.
Develop the Workflow Composition Layer :
- Define and build a YAML/JSON-based DSL for describing agentic workflows, dependencies, and execution semantics.
- Create a schema-driven rules engine that integrates validations, model calls, parallelism, conditional branching, and approval gates.
Orchestration Logic & Runtime Intelligence :
- Implement orchestration logic that coordinates :
1. Validation layers
2. Model calls (LLMs, embedding engines, external APIs)
3. Human-in-the-loop approval gates
4. Stateful transitions and checkpointing
- Ensure deterministic execution, traceability, and safe rollback mechanisms.
Workflow Certification & Automated Testing :
- Define certification standards for every workflow type, including :
1. Functional correctness
2. Latency and concurrency thresholds
3. Error-handling expectations
4. Observability and trace coverage
- Build automated regression test suites validating workflow integrity before deployment.
Engineering Leadership & Delivery :
- Lead, mentor, and grow a team of backend and systems engineers.
- Drive sprint planning, reviews, engineering discipline, and roadmap execution.
- Own runtime delivery deadlines, cross-team coordination, and release quality.
Observability, Reliability & Production Excellence :
- Instrument the runtime with observability hooks : metrics, tracing, structured logs, and execution heatmaps.
- Build robust retry logic, distributed locks, idempotency guards, and failover strategies.
- Improve runtime stability, throughput, and scale characteristics.
Requirements :
Must-Have :
- 8+ years of backend engineering experience building high-scale systems.
- 3+ years leading teams focused on workflows, automation, orchestration, or distributed runtimes.
- Deep understanding of :
1. Stateful orchestration engines (Temporal, Step Functions, Argo, Airflow)
2. Message queues, pub/sub systems, and event-driven patterns
3. Retry logic, compensating transactions, and idempotent operations
4. Distributed tracing, observability pipelines, and health checks
- Strong background in concurrent programming, async task management, and execution models.
- Hands-on experience with at least one systems language or backend stack (Python, Go, Rust, Node.js).
Nice-to-Have :
- Experience building workflow DSLs or schema-driven interpreters.
- Familiarity with LLM pipelines, agentic runtimes, or AI-driven workflow automation.
- Knowledge of Kubernetes-based runtime environments and workflow controllers.
- Experience with pluggable architecture design, sandboxing, or execution policies.
What This Role Offers :
- Ownership of the core execution engine powering Perceive Now's intelligent automation platform.
- A high-impact leadership position shaping architectural strategy and engineering culture.
- The opportunity to solve cutting-edge problems at the intersection of orchestration, distributed systems, and AI.
Posted in: Backend Development
Functional Area: Engineering Management
Job Code: 1579758