The company is a fast-scaling organization in the Enterprise AI & Machine Learning sector. It builds production-grade Large Language Model (LLM) solutions, Retrieval-Augmented Generation (RAG) systems, and real-time intelligent search for B2B customers.

The team delivers low-latency inference, scalable vector search, and robust MLOps for mission-critical applications.

Primary role title : Senior Machine Learning Engineer (LLM & RAG).

Location : Pune, India (On-site).

Role & Responsibilities :

- Design, build, and productionize end-to-end LLM & RAG pipelines : data ingestion, embedding generation, vector indexing, retrieval, and inference integration.

- Implement and optimize vector search solutions using FAISS/Pinecone and integrate with prompt orchestration frameworks (e.g., LangChain).

- Optimize model serving for latency and cost : batching, quantization, ONNX/Triton deployment, and autoscaling on Kubernetes.

- Develop robust microservices and REST/gRPC APIs to expose inference and retrieval capabilities to product teams.

- Establish CI/CD, monitoring, and observability for ML models and pipelines (model validation, drift detection, alerting).

- Collaborate with data scientists and platform engineers to iterate on model architectures, embeddings, and prompt strategies; mentor junior engineers.

Skills & Qualifications :

Must-Have :

- PyTorch.

- Hugging Face Transformers.

- LangChain.

- Retrieval-Augmented Generation.

- FAISS.

- Pinecone.

- Docker.

- Kubernetes.

Preferred :

- Triton Inference Server.

- Apache Kafka.

- Model quantization.

Qualifications :

- 6-9 years of hands-on experience in ML/LLM engineering with a strong track record of shipping production ML systems.

- Comfortable working on-site in Pune.

- Strong software engineering fundamentals and experience collaborating across product and data teams.

Benefits & Culture Highlights :

- Opportunity to lead end-to-end LLM projects and shape AI product direction in a growth-stage engineering team.

- Collaborative, fast-paced environment with mentorship, tech ownership, and exposure to modern MLOps tooling.

- Competitive compensation, professional development budget, and on-site engineering culture in Pune.

Applicants should bring strong LLM production experience, demonstrable RAG implementations, and a bias for scalable, maintainable systems.

Join an engineering-first team building the next generation of AI-powered enterprise features.