Posted on: 13/11/2025
Description:
Role: Generative AI Engineer
Experience: 7-9 years
Location: Bangalore
About the role:
We're building large-scale generative AI products: LLM-based features, retrieval-augmented pipelines, and production-grade model deployments. We are seeking a hands-on engineer experienced in PyTorch/TensorFlow, Hugging Face stacks, fine-tuning (LoRA/QLoRA), and MLOps to lead experiments and productionize LLMs.
Key Responsibilities:
- Design, implement and fine-tune LLMs (transformer-based) for NLP and multimodal tasks.
- Build and maintain RAG systems, vector stores, and retrieval pipelines (e.g., LlamaIndex, LangChain).
- Implement fine-tuning workflows: LoRA, QLoRA, full fine-tuning, and RLHF experiments.
- Optimize for latency and cost: quantization, pruning, distillation, and model-serving strategies.
- Integrate and experiment with APIs (OpenAI, Anthropic, Google Gemini) and local LLM deployments.
- Create robust evaluation pipelines: automated metrics, A/B tests, and significance testing.
- Collaborate with engineers and researchers; mentor junior team members.
- Implement MLOps best practices: model/version tracking (MLflow/W&B), CI/CD for models, data pipelines (DVC).
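To give a feel for the retrieval step in the RAG pipelines mentioned above, here is a minimal sketch using toy pre-computed embeddings and cosine similarity. In production, an embedding model and a vector store (e.g., via LangChain or LlamaIndex) would replace both; the vectors, document texts, and query below are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is zero-length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, top_k=2):
    """Rank documents by similarity to the query vector, keep the top_k."""
    ranked = sorted(corpus,
                    key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy corpus with hand-made 3-dimensional "embeddings".
corpus = [
    {"text": "LoRA adds low-rank adapters to frozen weights.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Kubernetes schedules containers across nodes.",  "vec": [0.0, 0.2, 0.9]},
    {"text": "QLoRA fine-tunes quantized base models.",        "vec": [0.8, 0.3, 0.1]},
]

query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of "how does LoRA work?"
context = retrieve(query_vec, corpus)
# The retrieved passages would then be packed into the LLM prompt as context.
```

The same shape scales up directly: swap the list comprehension over `corpus` for an approximate-nearest-neighbor query against a vector database.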
Required Qualifications:
- B.Tech / M.Tech / MS in CS, Data Science, or related.
- 6+ years in IT; 3 years focused on AI/ML/Deep Learning.
- Strong Python expertise; comfortable with C++ or other programming languages.
- Hands-on with PyTorch and TensorFlow; experience with Hugging Face Transformers.
- Experience with LangChain, LlamaIndex, Pydantic, and RAG systems.
- Familiarity with converting R code to Python is a plus.
- Knowledge of SQL and working with relational DBs.
- Experience with model fine-tuning methods (LoRA, QLoRA, RLHF).
- Experience in MLOps: MLflow, W&B, DVC, and production deployments.
- Strong communication, collaboration, and project ownership skills.
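For candidates less familiar with the LoRA method named above, the core idea fits in a few lines: rather than updating a full weight matrix W (d_out x d_in), train two small matrices B (d_out x r) and A (r x d_in) with r much smaller than either dimension, and use W + B @ A at inference. The pure-Python toy below (tiny 2x2 example, hypothetical values) is a sketch of that arithmetic only; real workflows use libraries such as Hugging Face PEFT.

```python
def matmul(X, Y):
    # Plain nested-list matrix multiply, enough for this toy example.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_weight(W, B, A):
    """Effective weight W + B @ A: frozen base plus the low-rank update."""
    delta = matmul(B, A)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0],      # frozen 2x2 base weight
     [0.0, 1.0]]
B = [[0.5], [1.0]]    # d_out x r, with rank r = 1
A = [[2.0, 0.0]]      # r x d_in
W_eff = lora_weight(W, B, A)  # [[2.0, 0.0], [2.0, 1.0]]
```

The payoff is parameter count: for a 4096x4096 layer with r=8, the adapter holds 2 * 4096 * 8 parameters, under 0.4% of the full matrix.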
Preferred / Nice-to-have:
- Experience with GPU clusters (A100s), distributed training, and mixed-precision training.
- Background in latency optimization, ONNX, and model-quantization toolkits.
- Academic publications or open-source contributions in generative models.
- Familiarity with cloud infra (AWS/GCP) and containerized serving (K8s, TorchServe, Triton).