- Design and implement LLM and conversational AI validation frameworks end-to-end

- Perform hallucination detection and response accuracy scoring for generative AI outputs

- Validate RAG (Retrieval-Augmented Generation) pipelines and knowledge grounding

- Conduct synthetic call testing and AI load/stress benchmarking

- Execute prompt robustness testing, edge case identification, and fallback validation

- Perform sensitive data leakage audits and AI compliance testing, with awareness of GDPR and HIPAA requirements

- Define and track AI evaluation metrics (BLEU, ROUGE, semantic similarity, latency SLAs)

- Collaborate with ML engineers and product teams to integrate QA gates into CI/CD pipelines

- Drive responsible AI practices including bias detection, fairness testing, and safety red-teaming

- Document test strategies, findings, and quality reports for stakeholders

Required Skills & Experience:

- 5+ years of QA experience, including at least 1-2 years in LLM or conversational AI testing

- Hands-on experience with AI evaluation metrics and frameworks (e.g., LangSmith, Ragas, or DeepEval)

- Strong understanding of AI safety principles, bias validation, and responsible AI practices

- Experience designing and executing AI/ML testing frameworks from scratch

- Proficiency in Python for test automation and scripting

- Familiarity with tools such as Postman, pytest, Selenium, or equivalent

- Understanding of NLP concepts, vector databases, and RAG architecture

- Experience testing voice bots or chatbots (Dialogflow, Amazon Lex, Rasa, or similar platforms)

- Knowledge of REST APIs and integration testing