Posted on: 24/11/2025
Description :
We are seeking a seasoned Engineering Lead with deep expertise in workflow orchestration systems, stateful execution engines, and distributed task runtimes. You will architect the next generation of our DAG-based runtime, develop the infrastructure for agentic workflow composition, and drive execution excellence across engineering. This role combines hands-on systems design with technical leadership and team mentorship.
Key Responsibilities :
Architect & Build the Swarm Runtime :
- Design and implement a DAG-based orchestration engine using Temporal, Argo Workflows, or equivalent event-driven runtimes.
- Build a scalable primitive registry for tasks, operators, guards, and computational nodes.
- Architect a robust scheduler capable of handling event triggers, retries, backoffs, and distributed coordination.
Develop the Workflow Composition Layer :
- Define and build a YAML/JSON-based DSL for describing agentic workflows, dependencies, and execution semantics.
- Create a schema-driven rules engine that integrates validations, model calls, parallelism, conditional branching, and approval gates.
Orchestration Logic & Runtime Intelligence :
- Implement orchestration logic that coordinates :
1. Validation layers
2. Model calls (LLMs, embedding engines, external APIs)
3. Human-in-the-loop approval gates
4. Stateful transitions and checkpointing
- Ensure deterministic execution, traceability, and safe rollback mechanisms.
Workflow Certification & Automated Testing :
- Define certification standards for every workflow type, including :
1. Functional correctness
2. Latency and concurrency thresholds
3. Error-handling expectations
4. Observability and trace coverage
- Build automated regression test suites validating workflow integrity before deployment.
Engineering Leadership & Delivery :
- Lead, mentor, and grow a team of backend and systems engineers.
- Drive sprint planning, reviews, engineering discipline, and roadmap execution.
- Own runtime delivery deadlines, cross-team coordination, and release quality.
Observability, Reliability & Production Excellence :
- Instrument the runtime with observability hooks : metrics, tracing, structured logs, and execution heatmaps.
- Build robust retry logic, distributed locks, idempotency guards, and failover strategies.
- Improve runtime stability, throughput, and scale characteristics.
Requirements :
Must-Have :
- 8+ years of backend engineering experience building high-scale systems.
- 3+ years leading teams focused on workflows, automation, orchestration, or distributed runtimes.
- Deep understanding of :
1. Stateful orchestration engines (Temporal, Step Functions, Argo, Airflow)
2. Message queues, pub/sub systems, and event-driven patterns
3. Retry logic, compensating transactions, and idempotent operations
4. Distributed tracing, observability pipelines, and health checks
- Strong background in concurrent programming, async task management, and execution models.
- Hands-on experience with at least one systems language or backend stack (Python, Go, Rust, Node.js).
Nice-to-Have :
- Experience building workflow DSLs or schema-driven interpreters.
- Familiarity with LLM pipelines, agentic runtimes, or AI-driven workflow automation.
- Knowledge of Kubernetes-based runtime environments and workflow controllers.
- Experience with pluggable architecture design, sandboxing, or execution policies.
What This Role Offers :
- Ownership of the core execution engine powering Perceive Now's intelligent automation platform.
- A high-impact leadership position shaping architectural strategy and engineering culture.
- The opportunity to solve cutting-edge problems at the intersection of orchestration, distributed systems, and AI.
Posted in: Backend Development
Functional Area: Engineering Management
Job Code: 1579758