Posted on: 19/03/2026
Description:
- Design, fine-tune, and validate LLMs for production use cases: Instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
- Implement retrieval-augmented generation (RAG) pipelines: Embeddings, vector search, chunking, and context assembly for high-recall responses (see the retrieval sketch after this list).
- Optimize inference for latency and cost: Quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
- Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
- Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
- Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.
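To make the RAG responsibility above concrete, here is a minimal retrieval sketch under stated assumptions: the embedding model, chunk size, corpus, and query below are placeholders chosen for illustration and are not specified by this posting. It covers the listed components only: chunking, embedding, FAISS vector search, and context assembly.

```python
# Minimal RAG retrieval sketch (illustrative only). The embedding model,
# chunking parameters, corpus, and query are placeholder assumptions.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text, size=512, overlap=64):
    """Split text into overlapping character windows (naive chunking)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

docs = ["...enterprise document text..."]          # placeholder corpus
chunks = [c for d in docs for c in chunk(d)]

# Embed chunks and build a cosine-similarity index
# (inner product over L2-normalized vectors).
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = encoder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(np.asarray(emb, dtype="float32"))

def retrieve(query, k=5):
    """Return the top-k chunks most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [chunks[i] for i in ids[0] if i != -1]

# Context assembly: concatenate retrieved chunks into the prompt sent to the LLM.
question = "How do I rotate an API key?"            # placeholder query
context = "\n\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In production, the flat index and naive character chunking would typically be swapped for a managed vector database and token-aware chunking, but the pipeline stages stay the same.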
Skills & Qualifications:
Must-Have:
- 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
- Proficiency in Python for ML engineering and model development.
- Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
- Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
- Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
- Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.
Preferred:
- Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving (a quantized-loading sketch follows this list).
- Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
- Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).
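As a concrete illustration of the quantization item above, the sketch below loads a model in 4-bit using Hugging Face Transformers with bitsandbytes. The checkpoint name, prompt, and configuration values are assumptions for illustration, not requirements of this role.

```python
# Illustrative 4-bit quantized model loading with Transformers + bitsandbytes.
# The checkpoint name, prompt, and config values are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/accuracy
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available GPUs
)

inputs = tokenizer("Summarize our refund policy.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```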
Benefits & Culture Highlights:
- Collaborative, engineering-driven culture with a strong focus on ownership and rapid iteration.
- Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
- On-site role with hands-on access to GPU infrastructure and cross-functional product teams.
Skills: pytorch, cuda, docker, python, agentic, llm