Posted on: 18/12/2025
Job Description:
- You will be responsible for designing, fine-tuning, and deploying LLM-based systems for real-time use cases such as QA automation, summarization, agent assistance, and conversational insights.
- The role involves building AI agents and intelligent workflows that integrate seamlessly with CRMs, telephony systems, and ticketing platforms.
- You will work closely with product and data teams to collect, clean, and structure large volumes of chat and call data for model training. You will also experiment continuously with prompt engineering, retrieval-augmented generation (RAG), and fine-tuning techniques to improve response quality and reliability.
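As a rough illustration of the RAG workflow this bullet describes, the sketch below scores documents by keyword overlap with a query and assembles a grounded prompt. All names and the toy retriever are illustrative; a production system would use embedding-based retrieval and an actual LLM call.

```python
# Toy retrieval-augmented generation (RAG) step: pick the most relevant
# documents for a query, then build a prompt grounded in that context.
# Keyword overlap stands in for real embedding similarity here.

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt asking the model to answer from context only."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = ["Refunds are processed within 5 days.", "Support hours are 9am-6pm."]
print(build_prompt("When are refunds processed?", retrieve("When are refunds processed?", docs)))
```

In practice the retriever would query a vector store and the prompt would be sent to a hosted or fine-tuned model; the structure of the step stays the same.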
- You will also develop evaluation pipelines and performance metrics to monitor accuracy, latency, and cost in production, and collaborate with backend engineers to deploy models at scale through optimized APIs, caching strategies, and token cost controls.
- Staying current with the evolving LLM and open-source ecosystem (including tools like LangChain, OpenAI, Anthropic, Ollama, and Hugging Face) and bringing relevant innovations into production will be a key part of the role.
- The ideal candidate brings hands-on experience with Python, PyTorch or TensorFlow, and vector databases such as Pinecone, Weaviate, or FAISS, along with a practical understanding of the full LLM lifecycle from prompt design to fine-tuning and evaluation.
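For context on the vector databases named above, the core operation they provide is nearest-neighbor search over embeddings. The pure-Python sketch below shows that operation with cosine similarity and toy two-dimensional vectors; real systems such as FAISS, Pinecone, or Weaviate add approximate indexing and scale.

```python
import math

# What a vector index does at its core: store (id, embedding) pairs and
# return the id whose embedding is closest to the query by cosine
# similarity. Vectors here are toy 2-D examples, not real embeddings.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query, index):
    """index: list of (doc_id, vector); returns the closest doc_id."""
    return max(index, key=lambda item: cosine(query, item[1]))[0]

index = [("doc_refund", [0.9, 0.1]), ("doc_hours", [0.1, 0.9])]
print(nearest([0.8, 0.2], index))  # closest by cosine similarity
```

Exact search like this is O(n) per query; the ANN structures in FAISS and friends exist to make the same lookup fast over millions of vectors.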
- Experience building AI-driven products or internal tools (particularly in areas like chatbots, call analytics, or NLP) is highly valued.
- You should be comfortable working with APIs, microservices, and integrations using frameworks like FastAPI or Flask, and have a solid grasp of data pipelines and MLOps tooling such as MLflow, Weights & Biases, and Docker.
- Bonus points for experience with agent frameworks like LangGraph, CrewAI, or AutoGen, or with speech-related technologies such as ASR, TTS, and voice-based QA systems.