Posted on: 12/11/2025
Position Overview :
We are seeking experienced Data Scientists and Agentic AI Developers to design, develop, and evaluate intelligent AI agents that deliver high-quality, reliable, and compliant solutions.
This role requires a blend of data science expertise, AI/ML engineering capabilities, and hands-on experience building agentic AI systems.
Location and Type :
Bangalore (Whitefield); hybrid (3 days/week in office)
Timings : 2-11 pm
Experience Required : 2-10+ years of professional experience in data science, machine learning, and AI development
Key Responsibilities :
AI Agent Development & Deployment :
- Design and develop agentic AI systems powered by Large Language Models (LLMs) with tool-calling capabilities
- Build and optimize multi-step reasoning workflows and agent orchestration frameworks
- Implement retrieval-augmented generation (RAG) pipelines for knowledge-intensive applications
- Integrate external tools, APIs, and databases into agent workflows
- Deploy and monitor production-grade AI agents at scale
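The tool-calling loop behind the responsibilities above can be sketched as follows. This is a minimal illustration, not any specific framework's API: `stub_llm` and the `TOOLS` registry are hypothetical stand-ins for a real model endpoint and real integrations.

```python
import json

# Hypothetical tool registry: each tool is a plain function the agent may call.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def stub_llm(messages):
    """Stand-in for a frontier-model API call. This stub always requests one
    tool call on the first turn, then answers from the tool result."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"answer": f"Order status: {last['content']['status']}"}

def run_agent(user_query, max_steps=3):
    """Minimal multi-step reasoning loop: ask the model, execute any requested
    tool, feed the result back, stop when the model returns a final answer."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        reply = stub_llm(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Step budget exhausted"

print(run_agent("Where is order A123?"))  # Order status: shipped
```

In production, the loop body would also handle malformed tool arguments, retries, and trace logging for observability.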
Evaluation & Quality Assurance :
- Develop comprehensive evaluation frameworks using LLM-as-a-judge methodologies
- Implement automated scoring systems for output quality metrics (correctness, helpfulness, coherence, relevance)
- Design and execute robustness testing including adversarial attack scenarios
- Monitor and reduce hallucination rates and ensure factual accuracy
- Track performance metrics including latency, throughput, and cost-per-interaction
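An LLM-as-a-judge evaluator like the one described above might look like this in outline. The `RUBRIC` criteria echo the quality metrics named in this posting; `stub_judge` is a hypothetical, deliberately crude scorer standing in for a real judge-model call that would prompt a model with each criterion and parse a numeric score from its reply.

```python
# Hypothetical rubric: each criterion maps to a question the judge scores 1-5.
RUBRIC = {
    "correctness": "Does the answer contain factual errors?",
    "helpfulness": "Does the answer address the user's question?",
    "coherence":   "Is the answer logically structured?",
}

def stub_judge(criterion_prompt, question, answer):
    """Stand-in for a judge-LLM call. Scores crudely: answers that echo a
    word from the question and run at least five words score higher."""
    score = 3
    if any(word in answer.lower() for word in question.lower().split()):
        score += 1
    if len(answer.split()) >= 5:
        score += 1
    return score

def evaluate(question, answer):
    """Score one (question, answer) pair on every rubric criterion and
    attach the mean as an aggregate quality score."""
    scores = {c: stub_judge(p, question, answer) for c, p in RUBRIC.items()}
    scores["mean"] = sum(scores.values()) / len(RUBRIC)
    return scores

result = evaluate(
    "What is RAG?",
    "RAG is retrieval-augmented generation, which grounds answers in retrieved documents.",
)
```

A production evaluator would run per-criterion judge prompts in batch, calibrate against human-labeled examples, and log scores for regression tracking.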
Data Science & Analytics :
- Analyze agent performance data to identify improvement opportunities
- Build custom evaluation pipelines and scoring rubrics
- Conduct A/B testing and statistical analysis for model optimization
- Create dashboards and visualization tools for stakeholder reporting
- Implement RAGAs (Retrieval Augmented Generation Assessment) frameworks
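The A/B testing work above typically reduces to comparing success rates between two agent variants. A standard tool for that is the two-sided two-proportion z-test, sketched here with stdlib math only; the counts are illustrative.

```python
from math import sqrt, erf

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test, e.g. comparing task-success rates
    of two agent variants. Returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative counts: variant A succeeds 460/1000, variant B 520/1000.
z, p = two_proportion_ztest(460, 1000, 520, 1000)
```

With these counts, p falls well below 0.05, so the difference between variants would be judged statistically significant at the 5% level.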
Safety, Ethics & Compliance :
- Ensure AI systems meet ethical standards including bias detection and fairness
- Implement safety guardrails to prevent harmful content generation
- Develop compliance monitoring systems for regulatory frameworks (EU AI Act, GDPR, HIPAA, DPDP)
- Document transparency and explainability measures
- Establish human oversight protocols
Required Skills & Qualifications :
Technical Expertise :
- Programming : Strong proficiency in Python; experience with AI/ML frameworks (LangChain, LangSmith, Phoenix, or similar)
- LLM Expertise : Hands-on experience with GPT, Claude, or other frontier models; prompt engineering and fine-tuning
- Machine Learning : Deep understanding of NLP, deep learning architectures, and model evaluation
- Tools & Platforms : Experience with MLOps tools, vector databases, and observability platforms
- Data Engineering : Proficiency in SQL, data pipelines, and ETL processes
Domain Knowledge :
- Understanding of agentic AI architectures and autonomous systems
- Knowledge of RAG systems and information retrieval techniques
- Familiarity with LLM evaluation methodologies and benchmarks
- Experience with conversational AI and dialogue systems
- Understanding of AI safety, alignment, and interpretability
Evaluation & Metrics :
- Experience designing evaluation rubrics and scoring systems
- Proficiency with automated evaluation frameworks (RAGAs, custom evaluators)
- Understanding of quality metrics : coherence, fluency, factual accuracy, hallucination detection
- Knowledge of performance metrics : latency optimization, token usage, throughput analysis
- Experience with user experience metrics (CSAT, NPS, turn count analysis)
Soft Skills :
- Strong analytical and problem-solving abilities
- Excellent communication skills for cross-functional collaboration
- Ability to balance innovation with practical constraints
- Detail-oriented with a focus on quality and reliability
- Self-driven with ability to work in fast-paced environments
Preferred Qualifications :
- Experience with compliance frameworks and regulatory requirements
- Background in conversational AI or chatbot development
- Knowledge of reinforcement learning from human feedback (RLHF)
- Experience with multi-modal AI systems
Tools & Technologies :
- Frameworks : LangChain, LangSmith, Phoenix, TensorFlow, PyTorch
- LLM Platforms : OpenAI API, Anthropic Claude, Azure OpenAI
- Databases : Vector databases (Pinecone, Weaviate, Chroma), PostgreSQL, MongoDB
- Monitoring : LangSmith, Phoenix, custom observability tools
- Cloud : AWS/Azure/GCP experience preferred
- Version Control : Git, CI/CD pipelines
What You'll Deliver :
- Production-ready agentic AI systems with measurable quality improvements
- Comprehensive evaluation frameworks with automated scoring
- Performance dashboards and reporting systems
- Documentation for technical specifications and compliance standards
- Continuous improvement strategies based on data-driven insights