Posted on: 03/12/2025
Machine Learning Engineer - LLM & RAG (Remote, India)
About the Opportunity:
- Design and implement end-to-end RAG solutions: document ingestion, embedding generation, vector indexing, retriever design, and LLM-based response generation.
- Develop and maintain Python back-end services and APIs that integrate LLMs, LangChain/LlamaIndex workflows, and vector search for production use.
- Optimize LLM inference performance: model selection, batching, quantization, ONNX/Triton integration, and memory/GPU optimization to meet latency and cost SLAs.
- Integrate and tune vector search stacks (FAISS, Milvus, Weaviate, or hosted vector DBs) and design embedding strategies for robust retrieval.
- Deploy and operate scalable infrastructure using Docker and orchestration platforms; automate CI/CD, monitoring, and alerting for ML services.
- Collaborate with Data Scientists and product teams to productionize models, implement A/B experiments, monitor drift, and iterate on model quality and UX.
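The RAG pipeline described above (ingest, embed, index, retrieve, generate) can be sketched end to end in a few dozen lines. This is a toy illustration only: the hashing-based `embed` function stands in for a real embedding model, the in-memory matrix stands in for a vector index such as FAISS, and `generate` returns the assembled prompt rather than calling an LLM. All names here are illustrative, not part of any specific library.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedding -- a deterministic stand-in
    for a real sentence-embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        seed = int(hashlib.md5(token.encode()).hexdigest(), 16) % (2**32)
        vec += np.random.default_rng(seed).standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class ToyRAG:
    """Minimal RAG skeleton: ingest -> index -> retrieve -> generate."""

    def __init__(self):
        self.docs = []
        self.index = None  # (n_docs, dim) matrix of unit vectors

    def ingest(self, docs):
        """Embed and index a batch of documents."""
        self.docs.extend(docs)
        self.index = np.stack([embed(d) for d in self.docs])

    def retrieve(self, query: str, k: int = 2):
        """Return the top-k documents by cosine similarity."""
        scores = self.index @ embed(query)
        top = np.argsort(scores)[::-1][:k]
        return [self.docs[i] for i in top]

    def generate(self, query: str) -> str:
        # In production this prompt would be sent to an LLM; here we
        # return the stuffed prompt to show the shape of the call.
        context = "\n".join(self.retrieve(query))
        return f"Context:\n{context}\n\nQuestion: {query}"

rag = ToyRAG()
rag.ingest([
    "Paris is the capital of France.",
    "FAISS performs efficient vector similarity search.",
    "Docker packages services into containers.",
])
```

In a production system, each piece would be swapped for a real component: the embedding function for a transformer encoder, the matrix for a FAISS/Milvus/Weaviate index, and `generate` for an LLM call, but the data flow stays the same.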
Skills & Qualifications:
Must-Have:
- Strong software engineering skills in Python and experience building production back-end services.
- Experience with transformer frameworks and LLM tooling (Hugging Face Transformers, PyTorch).
- Practical experience building RAG pipelines and working with vector search (FAISS or similar).
- Proven experience deploying ML services with Docker in cloud environments (AWS/GCP/Azure).
- Knowledge of model optimization and serving techniques (quantization, ONNX, Triton, batching).
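To make the quantization requirement above concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization, the basic idea behind the weight-compression techniques this role mentions. This is a generic NumPy illustration, not the API of any particular serving framework:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by half the scale step
```

Real deployments typically use per-channel scales, calibration data, and framework-native tooling (e.g. ONNX Runtime or Triton backends) rather than hand-rolled code, but the float-to-int8 mapping is the same.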
Preferred:
- Familiarity with vector databases (Milvus, Weaviate) and managed vector DB services.
- Experience with MLOps and monitoring tools (MLflow, Prometheus, Grafana, model-drift tooling).
Benefits & Culture Highlights:
- Opportunity to work on cutting-edge LLM/RAG products and influence architecture and tooling choices.
- Collaborative, fast-paced engineering culture that values ownership, experimentation, and scalable design.
To apply, bring strong Python engineering, hands-on LLM/RAG experience, and a passion for shipping scalable AI systems.
This role is ideal for engineers who enjoy end-to-end ownership of production ML services and optimizing LLMs for real user impact.