Posted on: 03/12/2025
Machine Learning Engineer - LLM & RAG (Remote, India)
About the Opportunity:
- Design and implement end-to-end RAG solutions: document ingestion, embedding generation, vector indexing, retriever design, and LLM-based response generation.
- Develop and maintain Python back-end services and APIs that integrate LLMs, LangChain/LlamaIndex workflows, and vector search for production use.
- Optimize LLM inference performance: model selection, batching, quantization, ONNX/Triton integration, and memory/GPU optimization to meet latency and cost SLAs.
- Integrate and tune vector search stacks (FAISS, Milvus, Weaviate, or hosted vector DBs) and design embedding strategies for robust retrieval.
- Deploy and operate scalable infrastructure using Docker and orchestration platforms; automate CI/CD, monitoring, and alerting for ML services.
- Collaborate with Data Scientists and product teams to productionize models, implement A/B experiments, monitor drift, and iterate on model quality and UX.
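The RAG pipeline described above (ingest, embed, index, retrieve, generate) can be sketched end to end in a few dozen lines. This is a toy illustration only: the hashing-based `embed` function stands in for a real embedding model, the in-memory matrix stands in for a vector index such as FAISS, and `generate` returns the assembled prompt rather than calling an LLM. All names here are illustrative, not part of any specific library.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedding -- a deterministic stand-in
    for a real sentence-embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        seed = int(hashlib.md5(token.encode()).hexdigest(), 16) % (2**32)
        vec += np.random.default_rng(seed).standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class ToyRAG:
    """Minimal RAG skeleton: ingest -> index -> retrieve -> generate."""

    def __init__(self):
        self.docs = []
        self.index = None  # (n_docs, dim) matrix of unit vectors

    def ingest(self, docs):
        """Embed and index a batch of documents."""
        self.docs.extend(docs)
        self.index = np.stack([embed(d) for d in self.docs])

    def retrieve(self, query: str, k: int = 2):
        """Return the top-k documents by cosine similarity."""
        scores = self.index @ embed(query)
        top = np.argsort(scores)[::-1][:k]
        return [self.docs[i] for i in top]

    def generate(self, query: str) -> str:
        # In production this prompt would be sent to an LLM; here we
        # return the stuffed prompt to show the shape of the call.
        context = "\n".join(self.retrieve(query))
        return f"Context:\n{context}\n\nQuestion: {query}"

rag = ToyRAG()
rag.ingest([
    "Paris is the capital of France.",
    "FAISS performs efficient vector similarity search.",
    "Docker packages services into containers.",
])
```

In a production system, each piece would be swapped for a real component: the embedding function for a transformer encoder, the matrix for a FAISS/Milvus/Weaviate index, and `generate` for an LLM call, but the data flow stays the same.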
Skills & Qualifications:
Must-Have:
- Strong software engineering skills in Python and experience building production back-end services.
- Experience with transformer frameworks and LLM tooling (Hugging Face Transformers, PyTorch).
- Practical experience building RAG pipelines and working with vector search (FAISS or similar).
- Proven experience deploying ML services with Docker in cloud environments (AWS/GCP/Azure).
- Knowledge of model optimization and serving techniques (quantization, ONNX, Triton, batching).
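To make the quantization requirement above concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization, the basic idea behind the weight-compression techniques this role mentions. This is a generic NumPy illustration, not the API of any particular serving framework:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by half the scale step
```

Real deployments typically use per-channel scales, calibration data, and framework-native tooling (e.g. ONNX Runtime or Triton backends) rather than hand-rolled code, but the float-to-int8 mapping is the same.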
Preferred:
- Familiarity with vector databases (Milvus, Weaviate) and managed vector DB services.
- Experience with MLOps and monitoring tools (MLflow, Prometheus, Grafana, model-drift tooling).
Benefits & Culture Highlights:
- Opportunity to work on cutting-edge LLM/RAG products and influence architecture and tooling choices.
- Collaborative, fast-paced engineering culture that values ownership, experimentation, and scalable design.
To apply, bring strong Python engineering, hands-on LLM/RAG experience, and a passion for shipping scalable AI systems.
This role is ideal for engineers who enjoy end-to-end ownership of production ML services and optimizing LLMs for real user impact.