Posted on: 05/02/2026
Description :
- Lead validation, quality engineering, and production reliability for agentic AI systems.
- Ensure LLM-powered features, tool-using agents, and automated workflows are reliable, safe, observable, and scalable in production.
- Partner closely with ML engineering and platform teams to operationalize agentic AI systems with strong quality and safety standards.
Key Result Areas (KRAs) :
- Own end-to-end validation strategy for agentic AI and LLM-powered systems.
- Design and execute test strategies for multi-agent workflows and long-running AI processes.
- Ensure reliability and correctness of tool execution, retries, and failure recovery.
- Establish quality benchmarks for AI behavior consistency and regression prevention.
- Define and track AI quality, safety, and reliability metrics in production.
- Integrate AI validation frameworks into CI/CD pipelines with strong quality gates.
- Monitor live AI systems, investigate incidents, and drive root-cause remediation.
- Standardize agentic AI testing practices and mentor engineering teams.
- Validate safety guardrails and responsible AI practices across agentic systems.
Roles & Responsibilities :
- Design automated test harnesses for agentic workflows and LLM-driven features.
- Build golden datasets and regression suites for AI behavior validation.
- Handle LLM non-determinism using mocking, stubbing, replay, and controlled inference techniques.
- Test agent memory, context management, grounding, and state transitions.
- Validate tool and API orchestration across agent-based workflows.
- Collaborate with ML, platform, and product teams to improve AI reliability.
- Support production readiness, observability, and incident response for AI systems.
Required Skillsets :
- Strong experience in applied ML validation, SDET, QA automation, or ML engineering roles.
- Hands-on experience testing LLM-powered or agentic AI systems in production.
- Strong understanding of AI testing challenges including non-determinism, hallucinations, and model drift.
- Proficiency in Python, Java, or TypeScript/JavaScript.
- Experience with API testing, integration testing, and automation frameworks.
- Hands-on experience integrating validation into CI/CD pipelines.
- Familiarity with observability, logging, and debugging distributed systems.
- Strong analytical, problem-solving, and communication skills.
Preferred Skillsets :
- Experience with agent frameworks and orchestration systems.
- Exposure to RAG pipelines and vector databases.
- Experience in AI security testing including prompt injection and jailbreak prevention.
- Experience building AI evaluation, monitoring, or quality dashboards.
Did you find something suspicious?