Posted on: 03/03/2026
Description:
- Architecting and implementing RLHF (Reinforcement Learning from Human Feedback) frameworks.
- Training and fine-tuning open-source Vision-Language Models (VLMs).
- Deploying and scaling multimodal models to production, serving millions of requests.
Key Responsibilities:
Architect & Build RLHF Frameworks:
- Design end-to-end RLHF pipelines (SFT → reward modeling → PPO/DPO)
- Develop scalable human feedback collection systems
- Implement preference modeling and ranking pipelines
- Optimize reward models for multimodal outputs (image + text)
- Build automated evaluation frameworks
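As context for the PPO/DPO stage listed above, here is a minimal, illustrative sketch of the DPO preference loss for a single chosen/rejected pair. All log-probabilities and the beta value below are toy placeholders, not values from this posting.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the reference model, minus the same
    # quantity for the rejected response.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when the policy
    # already clearly prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the policy slightly favors the chosen response.
loss = dpo_loss(-12.0, -20.0, -14.0, -19.0, beta=0.1)
```

In a real pipeline these log-probabilities come from batched forward passes over preference data; the sketch only shows the scalar objective.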
Train & Fine-Tune OSS Vision-Language Models:
- Pretraining and instruction tuning of multimodal models
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Dataset curation & synthetic data generation
- Scaling training on multi-GPU / multi-node clusters
- Optimizing for alignment, hallucination reduction, and safety
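To make the LoRA item above concrete, this is a minimal pure-Python sketch of the low-rank update idea: a frozen weight matrix plus a trainable `(alpha / r) * A @ B` correction. The shapes, `alpha`, and `r` here are illustrative, not prescribed values.

```python
def matmul(X, Y):
    """Plain-Python matrix multiply (list-of-rows representation)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=2.0, r=1):
    """Compute y = x @ (W + (alpha / r) * A @ B).

    W is the frozen pretrained weight; only the small factors A
    (d_in x r) and B (r x d_out) would be trained. B starts at zero,
    so fine-tuning begins exactly at the pretrained behavior.
    """
    delta = matmul(A, B)          # low-rank update, rank r
    scale = alpha / r
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return matmul(x, W_eff)
```

QLoRA follows the same structure but keeps W in a quantized format; only the adapters stay in higher precision.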
Highly Scalable Deployment of VLM Systems:
- Model serving using vLLM and Triton Inference Server
- Optimize latency, throughput, and cost
- Implement batching, KV caching, quantization, tensor parallelism
- Deploy on Kubernetes-based infrastructure
- Build monitoring for drift, performance, and hallucinations
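As background for the KV-caching item above, this toy single-query attention sketch shows the idea serving engines rely on: keys and values for past tokens are stored once and reused at each decode step instead of being recomputed. The single-head setup and tiny vectors are illustrative only.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

class KVCache:
    """Append-only cache: each decode step adds one (key, value) pair."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Store the new token's key/value, then attend over the
        # full history without recomputing earlier projections.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Without the cache, step t would recompute keys/values for all t tokens; with it, each step does O(t) attention over stored tensors, which is what makes high-QPS decoding affordable.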
Multimodal AI System Design:
- Implement retrieval-augmented multimodal pipelines
- Design evaluation benchmarks for VQA, grounding, and reasoning
- Ensure model safety and guardrails
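The retrieval step in a multimodal RAG pipeline reduces to nearest-neighbor search over embeddings. The sketch below is a deliberately naive version; the embeddings are assumed to come from a shared image-text encoder (e.g., a CLIP-style model), and the tiny 2-D vectors are placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_emb, index, k=2):
    """Return ids of the top-k items closest to the query embedding.

    `index` maps item ids (images or text chunks) to embeddings.
    Production systems replace this linear scan with an ANN index.
    """
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_emb, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]
```

The retrieved items are then packed into the VLM's context alongside the user's image and question.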
Technical Leadership:
- Define technical roadmap for multimodal AI
- Review model architectures & code quality
- Collaborate with product and infrastructure teams
Qualifications:
- 2+ years working with large-scale LLM or VLM systems
- Strong hands-on experience building RLHF pipelines (not just using libraries)
- Deep PyTorch expertise
- Experience training models >7B parameters
- Experience with distributed training (DeepSpeed, FSDP)
- Production-grade deployment experience handling 10k+ QPS workloads
- Strong understanding of transformer architectures