Posted on: 18/11/2025
Responsibilities:
- Design and implement stateful, multi-agent pipelines capable of performing complex credit analysis.
- Advanced inference & prompt optimization: develop and optimize prompt chains and pipelines using frameworks such as DSPy and GEPA to programmatically manage and tune reasoning steps, moving beyond brittle, hand-crafted prompts.
- Implement and experiment with techniques such as Chain-of-Thought, Tree-of-Thoughts, and Graph-of-Thoughts to enhance reasoning capabilities.
- Create a rigorous evaluation system using LLM-as-a-judge and scenario-based testing to measure accuracy, robustness, and reasoning quality.
- Architect memory management, retrieval, and orchestration strategies to support multi-agent workflows with human-in-the-loop review.
- Instrument our AI systems for complete traceability and observability, logging all agent actions, tool calls, and intermediate reasoning steps for debugging, audit, and compliance.
- Develop ETL pipelines and data engineering workflows to handle structured, unstructured, vector, and graph data.
- Build dashboards to track key metrics: cost, latency, correctness, and concept drift.
- Maintain AI services in cloud environments (AWS, Azure) and integrate them into broader DevOps pipelines.
Qualifications:
- 5+ years of commercial development experience in Python or JavaScript.
- Demonstrated experience building and deploying production-level Agentic AI or complex reasoning systems.
- Deep expertise in the modern LLMOps stack: hands-on experience with frameworks such as LangChain and with evaluation and observability tools (Langfuse, W&B, Helicone).
- Strong background in data engineering: ETL processes, SQL/NoSQL databases, vector databases, and graph data models.
- Deep understanding of AI agent architectures: prompt engineering, RAG, memory, human-in-the-loop (HITL) review, tool integration via the Model Context Protocol (MCP), and multi-agent orchestration.
- Proficiency with cloud platforms (AWS, Azure) and modern DevOps practices (CI/CD, containerization, infrastructure as code).
Nice-to-have:
- Direct experience with RLHF/RLAIF pipelines or model fine-tuning (LoRA, QLoRA).
- Experience with graph data models.