- Lead validation, quality engineering, and production reliability for agentic AI systems.

- Ensure LLM-powered features, tool-using agents, and automated workflows are reliable, safe, observable, and scalable in production.

- Partner closely with ML engineering and platform teams to operationalize agentic AI systems with strong quality and safety standards.

Key Result Areas (KRAs) :

- Own end-to-end validation strategy for agentic AI and LLM-powered systems.

- Design and execute test strategies for multi-agent workflows and long-running AI processes.

- Ensure reliability and correctness of tool execution, retries, and failure recovery.

- Establish quality benchmarks for AI behavior consistency and regression prevention.

- Define and track AI quality, safety, and reliability metrics in production.

- Integrate AI validation frameworks into CI/CD pipelines with strong quality gates.

- Monitor live AI systems, investigate incidents, and drive root-cause remediation.

- Standardize agentic AI testing practices and mentor engineering teams.

- Validate safety guardrails and responsible AI practices across agentic systems.

Roles & Responsibilities :

- Design automated test harnesses for agentic workflows and LLM-driven features.

- Build golden datasets and regression suites for AI behavior validation.

- Handle LLM non-determinism using mocking, stubbing, replay, and controlled inference techniques.

- Test agent memory, context management, grounding, and state transitions.

- Validate tool and API orchestration across agent-based workflows.

- Collaborate with ML, platform, and product teams to improve AI reliability.

- Support production readiness, observability, and incident response for AI systems.

Required Skillsets :

- Strong experience in applied ML validation, SDET, QA automation, or ML engineering roles.

- Hands-on experience testing LLM-powered or agentic AI systems in production.

- Strong understanding of AI testing challenges including non-determinism, hallucinations, and model drift.

- Proficiency in Python, Java, or TypeScript/JavaScript.

- Experience with API testing, integration testing, and automation frameworks.

- Hands-on experience integrating validation into CI/CD pipelines.

- Familiarity with observability, logging, and debugging distributed systems.

- Strong analytical, problem-solving, and communication skills.

Preferred Skillsets :

- Experience with agent frameworks and orchestration systems.

- Exposure to RAG pipelines and vector databases.

- Experience in AI security testing including prompt injection and jailbreak prevention.

- Experience building AI evaluation, monitoring, or quality dashboards.