Posted on: 21/04/2026
Description :
Job Title : AI Developer
Location : Mumbai, Chembur West (Work From Office Only)
Experience : 3 - 5 years
Role Summary :
We are seeking an experienced AI Developer to lead the fine-tuning, deployment, and optimization of the custom Proniti AI model based on the prevailing Ai models architecture (26B-A4B MoE / 31B Dense). You will be responsible for transforming the base model into a highly secure, autonomous reasoning engine capable of executing complex standard operating procedure (SOP) gap analyses and regulatory reporting.
Key Responsibilities :
- Model Fine-Tuning: Configure and execute Supervised Fine-Tuning (SFT) pipelines using Parameter-Efficient Fine-Tuning (PEFT) methodologies.
- Utilize Quantized Low-Rank Adaptation (QLoRA) with frameworks like Hugging Face TRL and Unsloth (using bitsandbytes nf4 quantization) to adapt the model without catastrophic forgetting.
- Sovereign Infrastructure Deployment: Manage the deployment of the model on sovereign Indian cloud infrastructure.
- Work directly with Dedicated infrastructure, NVIDIA H100 or L40S GPU clusters hosted in Mumbai-based Tier IV data centers to ensure data privacy and ultra-low latency.
- Inference Optimization : Deploy and configure the vLLM inference engine. You will optimize the server using flags like gpu-memory-utilization for long context management and enable Gemma 4's specific parsers (reasoning-parser gemma4).
- Agentic Tool Orchestration: Implement native tool-calling capabilities by mapping Proniti's backend APIs to the model's .
- Constrained Decoding : Implement structured JSON output generation via vLLM's guided decoding engine to guarantee that the AI generates perfectly structured data payloads for the Proniti Compliance Dashboard.
- Security Governance : Integrate the open-source Agent Governance Toolkit to provide deterministic, sub-millisecond policy enforcement, preventing risks like tool misuse or prompt injections.
Requirements :
- 3 to 5+ years of experience in Deep Learning, NLP, and AI Systems Engineering.
- Strong proficiency in Python, PyTorch, and the Hugging Face ecosystem.
- Proven hands-on experience with LLM/SLM fine-tuning techniques (LoRA, QLoRA) and quantization.
- Deep understanding of inference servers (specifically vLLM) and GPU memory optimization (KV caching, PagedAttention).
- Experience building autonomous AI agents and utilizing JSON schemas for strict output decoding.
Qualifications : BE IT / BSc IT / MSc IT / MCA / ME IT, M.Tech IT or equivalent
The job is for:
Did you find something suspicious?