Posted on: 13/11/2025
Description:
Role: Generative AI Engineer
Experience: 7-9 years
Location: Bangalore
About the role:
We're building large-scale generative AI products: LLM-based features, retrieval-augmented pipelines, and production-grade model deployments. We are seeking a hands-on engineer experienced in PyTorch/TensorFlow, Hugging Face stacks, fine-tuning (LoRA/QLoRA), and MLOps to lead experiments and productionize LLMs.
Key Responsibilities:
- Design, implement and fine-tune LLMs (transformer-based) for NLP and multimodal tasks.
- Build and maintain RAG systems, vector stores, and retrieval pipelines (e.g., LlamaIndex, LangChain).
- Implement fine-tuning workflows: LoRA, QLoRA, full fine-tuning, and RLHF experiments.
- Optimize for latency and cost: quantization, pruning, distillation, and model-serving strategies.
- Integrate and experiment with APIs (OpenAI, Anthropic, Google Gemini) and local LLM deployments.
- Create robust evaluation pipelines: automated metrics, A/B tests, and significance testing.
- Collaborate with engineers and researchers; mentor junior team members.
- Implement MLOps best practices: model/version tracking (MLflow/W&B), CI/CD for models, data pipelines (DVC).
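To give a feel for the retrieval step in the RAG pipelines mentioned above, here is a minimal sketch using toy pre-computed embeddings and cosine similarity. In production, an embedding model and a vector store (e.g., via LangChain or LlamaIndex) would replace both; the vectors, document texts, and query below are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is zero-length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, top_k=2):
    """Rank documents by similarity to the query vector, keep the top_k."""
    ranked = sorted(corpus,
                    key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy corpus with hand-made 3-dimensional "embeddings".
corpus = [
    {"text": "LoRA adds low-rank adapters to frozen weights.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Kubernetes schedules containers across nodes.",  "vec": [0.0, 0.2, 0.9]},
    {"text": "QLoRA fine-tunes quantized base models.",        "vec": [0.8, 0.3, 0.1]},
]

query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of "how does LoRA work?"
context = retrieve(query_vec, corpus)
# The retrieved passages would then be packed into the LLM prompt as context.
```

The same shape scales up directly: swap the list comprehension over `corpus` for an approximate-nearest-neighbor query against a vector database.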
Required Qualifications:
- B.Tech / M.Tech / MS in CS, Data Science, or related.
- 6+ years in IT; 3 years focused on AI/ML/Deep Learning.
- Strong Python expertise; comfortable with C++ or other programming languages.
- Hands-on with PyTorch and TensorFlow; experience with Hugging Face Transformers.
- Experience with LangChain, LlamaIndex, Pydantic, and RAG systems.
- Familiarity with converting R code to Python is a plus.
- Knowledge of SQL and working with relational DBs.
- Experience with model fine-tuning methods (LoRA, QLoRA, RLHF).
- Experience in MLOps: MLflow, W&B, DVC, and production deployments.
- Strong communication, collaboration, and project ownership skills.
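For candidates less familiar with the LoRA method named above, the core idea fits in a few lines: rather than updating a full weight matrix W (d_out x d_in), train two small matrices B (d_out x r) and A (r x d_in) with r much smaller than either dimension, and use W + B @ A at inference. The pure-Python toy below (tiny 2x2 example, hypothetical values) is a sketch of that arithmetic only; real workflows use libraries such as Hugging Face PEFT.

```python
def matmul(X, Y):
    # Plain nested-list matrix multiply, enough for this toy example.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_weight(W, B, A):
    """Effective weight W + B @ A: frozen base plus the low-rank update."""
    delta = matmul(B, A)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0],      # frozen 2x2 base weight
     [0.0, 1.0]]
B = [[0.5], [1.0]]    # d_out x r, with rank r = 1
A = [[2.0, 0.0]]      # r x d_in
W_eff = lora_weight(W, B, A)  # [[2.0, 0.0], [2.0, 1.0]]
```

The payoff is parameter count: for a 4096x4096 layer with r=8, the adapter holds 2 * 4096 * 8 parameters, under 0.4% of the full matrix.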
Preferred / Nice-to-have:
- Experience with GPU clusters (A100s), distributed training, and mixed-precision training.
- Background in latency optimization, ONNX, and model-quantization toolkits.
- Academic publications or open-source contributions in generative models.
- Familiarity with cloud infra (AWS/GCP) and containerized serving (K8s, TorchServe, Triton).