
LenDenClub - LLMOps Engineer

LENDEN CLUB TECHSERVE PRIVATE LIMITED
Mumbai
5 - 10 Years

Posted on: 11/11/2025

Job Description

About LenDenClub :

LenDenClub is India's largest RBI-registered (NBFC-P2P) Peer-to-Peer (P2P) lending platform. We connect lenders seeking high interest with creditworthy borrowers, bridging the gap left by traditional credit institutions. With over 3 crore users and ₹16,000 crore+ in loans disbursed, we command more than 98% of India's P2P lending market. Our 4.4+ rating on the App Store reflects our commitment to offering a trustworthy and secure lending experience. Powered by cutting-edge technology and a user-first approach, we are setting new benchmarks in India's evolving fintech ecosystem. Our progressive approach to employee benefits has also been recognized: LenDenClub has been certified as a 'Great Place to Work' for four successive years by the Great Place to Work Institute, Inc.

About InstaMoney :

InstaMoney is our cutting-edge Loan Service Provider (LSP) platform, built to make borrowing fast, flexible, and fully digital for users across India. With over 30 million downloads, InstaMoney offers seamless credit access through a simple and intuitive mobile app. From Personal Loans to Merchant Loans, it provides short- to mid-tenure credit solutions.

Position : LLMOps Engineer

Location : Malad, Mumbai

Work Mode : Work from office, 6 days/week (alternate Saturdays off)

JD :

LLMOps Engineer (5+ years)

We are looking for a seasoned LLMOps Engineer to design, deploy, and optimize enterprise-grade LLM systems with a focus on scale, cost, and compliance.

Responsibilities :

- Manage full LLM lifecycle : data preprocessing, fine-tuning, evaluation, deployment, retraining.

- Deploy & scale large-weight LLMs with distributed inference, GPU optimization, quantization and caching.

- Build RAG pipelines with embedding generation, vector DBs (FAISS, Pinecone, Weaviate, Milvus) and retrieval strategies.

- Implement observability frameworks for latency, token usage, hallucination detection, guardrails, cost tracking.

- Integrate LLMs with APIs, microservices, and enterprise workflows (LangChain, LlamaIndex, Triton Inference Server).

- Continuously optimize GPU utilization, inference throughput and cost efficiency.
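To illustrate the RAG responsibility above, here is a minimal sketch of embedding generation plus retrieval. It is a toy, not production code: `embed()` is a hypothetical letter-frequency stand-in for a real embedding model, and `VectorIndex` is an in-memory stand-in for a vector DB such as FAISS, Pinecone, Weaviate, or Milvus.

```python
import math

def embed(text: str) -> list[float]:
    """Hypothetical embedding: normalized 26-dim letter-frequency vector.
    In production this would call a real embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # vectors are pre-normalized, so the dot product is cosine similarity
    return sum(x * y for x, y in zip(a, b))

class VectorIndex:
    """Minimal in-memory vector store (stands in for FAISS/Pinecone/etc.)."""
    def __init__(self) -> None:
        self.docs: list[tuple[str, list[float]]] = []

    def add(self, doc: str) -> None:
        self.docs.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

index = VectorIndex()
index.add("Personal loans are short-tenure credit products.")
index.add("GPU quantization reduces inference memory footprint.")
index.add("Merchant loans serve small business borrowers.")

# retrieved chunks are stuffed into the prompt as grounding context
context = index.retrieve("what credit products exist for borrowers?", k=2)
prompt = "Answer using context:\n" + "\n".join(context)
```

A real pipeline would swap in a proper tokenizer-based embedding model, an ANN index for sub-linear retrieval at scale, and a reranking step before prompt assembly.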

Requirements :

- Strong knowledge of LLM deployment, RAG architectures, embeddings, multimodal pipelines.

- Expertise in AWS GPU infra (EKS, Inferentia, Trainium), caching and distributed inference.

- Experience with monitoring frameworks.

- Proven track record of scaling enterprise LLM applications with compliance & observability frameworks.
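The observability and cost-tracking requirements above can be sketched as a thin wrapper around each LLM call that records latency, token counts, and estimated cost. All names here (`Tracker`, `track_call`, `PRICE_PER_1K_TOKENS`, the flat rate itself) are hypothetical; a production setup would use a real tokenizer (e.g. tiktoken) and export metrics to Prometheus or a tracing backend rather than a local list.

```python
import time
from dataclasses import dataclass, field

PRICE_PER_1K_TOKENS = 0.002  # assumed flat rate, for illustration only

@dataclass
class CallMetrics:
    latency_s: float
    prompt_tokens: int
    completion_tokens: int

    @property
    def cost(self) -> float:
        total = self.prompt_tokens + self.completion_tokens
        return total / 1000 * PRICE_PER_1K_TOKENS

@dataclass
class Tracker:
    calls: list = field(default_factory=list)

    def track_call(self, fn, prompt: str):
        """Time an LLM call and record per-request metrics."""
        start = time.perf_counter()
        completion = fn(prompt)
        latency = time.perf_counter() - start
        # crude whitespace "tokenizer" stands in for a real one
        m = CallMetrics(latency, len(prompt.split()), len(completion.split()))
        self.calls.append(m)
        return completion, m

def fake_llm(prompt: str) -> str:
    # mock model so the sketch runs without any API dependency
    return "stub completion from a mock model"

tracker = Tracker()
out, metrics = tracker.track_call(fake_llm, "summarize this loan policy")
```

Hallucination detection and guardrails would layer on top of the same hook, e.g. scoring each `completion` against retrieved context before it is returned.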

