
Job Description

Description:

Who We Are

Mindtickle is the market-leading revenue productivity platform that combines on-the-job learning and deal execution to get more revenue per rep.

Mindtickle is recognized as a market leader by top industry analysts and is ranked by G2 as the #1 sales onboarding and training product.

We're honoured to be recognized as a Leader in the first-ever Forrester Wave: Revenue Enablement Platforms, Q3 2024!

What's in it for you?

- Own the end-to-end qualification lifecycle for AI/LLM systems from ideation and implementation to CI/CD integration.

- Design and implement scalable automated test suites across unit, integration, regression, and system levels.

- Build and enhance frameworks to test, evaluate, and continuously improve complex AI and LLM workflows.

- Lead the design and automation of LLM-powered features, including prompt pipelines, RAG workflows, and AI-assisted developer tools.

- Develop evaluation pipelines to measure factual accuracy, hallucination rates, bias, robustness, and overall model reliability.

- Define and enforce metrics-driven quality gates and experiment tracking workflows to ensure consistent, data-informed releases.

- Collaborate with agile engineering teams, participating in design discussions, code reviews, and architecture decisions to drive testability and prevent defects early (shift left).

- Develop monitoring and alerting systems to track LLM production quality, safety, and performance in real time.

- Conduct robustness, safety, and adversarial testing to validate AI behavior under edge cases and stress scenarios.

- Continuously improve frameworks, tools, and processes for LLM reliability, safety, and reproducibility.

- Mentor junior engineers in AI testing, automation, and quality best practices.

- Measure and improve Developer Experience (DevEx) through tools, feedback loops, and automation.

- Champion quality engineering practices across the organization, ensuring delivery meets business goals, user-experience standards, and cost-of-operations targets.

We'd love to hear from you if you have experience with:

- LLM testing & evaluation tools: MaximAI, OpenAI Evals, TruLens, Promptfoo, LangSmith.

- Building LLM-powered apps: prompt pipelines, embeddings, RAG, AI workflows.

- CI/CD design for application + LLM testing.

- API, performance, and system testing.

- Git, Docker, and cloud platforms (AWS / GCP / Azure).

- Bias, fairness, hallucination detection & AI safety testing.

- Mentorship and cross-functional leadership.

Preferred Qualifications :

- Bachelor's or Master's in Computer Science, Engineering, or equivalent.

- 4+ years in software development, SDET, or QA automation.

- Proficiency in GoLang, Java, or Python.

- Proven experience building test automation frameworks.

- Proven ability to design CI/CD pipelines with automated regression and evaluation testing.

- Hands-on exposure to LLMs and GenAI applications.

- 2+ years of hands-on experience with LLM APIs and frameworks (OpenAI, Anthropic, Hugging Face).

- Proficient in prompt engineering, embeddings, RAG, and LLM evaluation metrics.

- Strong analytical, leadership, and teamwork skills.

- Excellent communication and collaboration across teams.
