Posted on: 15/12/2025
Job Title : Senior AI Engineer - LLMs, RAG, and Vector Systems.
Experience : 5-10 years in AI/ML (including 3-4 end-to-end AI/LLM project implementations).
Location : Remote.
Employment Type : Full-Time.
Role Summary :
The Senior AI Engineer will lead the design and development of advanced Generative AI systems, including embeddings pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations.
The role requires deep expertise in transformer architectures, fine-tuning and optimizing LLMs, and implementing GPU-accelerated AI workloads using PyTorch, TensorFlow, and CUDA.
The engineer will collaborate with cross-functional teams to build scalable, secure, and highly performant AI platforms.
Key Responsibilities :
LLM & RAG Architecture :
- Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
- Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
- Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
- Create and optimize embedding generation workflows using transformer-based models.
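To make the RAG responsibilities above concrete, here is a minimal sketch of one piece of such a system, the context-assembly step that sits between the retriever and the generator. All names are hypothetical and no specific framework is assumed; a real system would count tokens rather than characters.

```python
# Minimal sketch of RAG context assembly: greedily pack ranked
# retrieved passages into a budget, then build the final prompt
# handed to the generative model. Helper names are illustrative.

def assemble_context(passages, budget_chars=500):
    """Greedily keep top-ranked passages until the budget is spent."""
    chosen, used = [], 0
    for p in passages:
        if used + len(p) > budget_chars:
            break
        chosen.append(p)
        used += len(p)
    return "\n---\n".join(chosen)

def build_prompt(question, passages, budget_chars=500):
    """Combine assembled context and the user question into a prompt."""
    context = assemble_context(passages, budget_chars)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "Paris is the capital of France.",
    "France is in Western Europe.",
]
prompt = build_prompt("What is the capital of France?", docs)
print(prompt)
```

The greedy budget-packing here stands in for the ranking, deduplication, and token-budgeting logic the role would actually own.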
Vector Database & Retrieval Systems :
- Architect high-performance vector search solutions using vector databases (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
- Implement indexing strategies, ANN algorithms, sharding, and scaling approaches for large embedding stores.
- Ensure latency optimization, relevance tuning, and reliability of retrieval pipelines.
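As a toy illustration of the retrieval contract this section describes, the sketch below does brute-force cosine-similarity search over a small in-memory embedding store with NumPy. A production system would replace this with an ANN index (FAISS, Milvus, etc.); the data and function names are assumptions for the example.

```python
import numpy as np

# Brute-force cosine-similarity retrieval over a tiny in-memory
# embedding store. Real deployments use ANN indexes for scale;
# this only shows the contract: query vector in, top-k ids out.

def top_k(query_vec, store, k=2):
    """Indices of the k rows of `store` most similar to `query_vec`."""
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = store_n @ q              # cosine similarity per row
    return np.argsort(-sims)[:k]   # highest similarity first

rng = np.random.default_rng(0)
store = rng.normal(size=(100, 8))  # 100 fake 8-dimensional embeddings
query = store[42] + 0.01 * rng.normal(size=8)  # near-duplicate of row 42
print(top_k(query, store, k=3))
```

The indexing, sharding, and latency work listed above is essentially about preserving this contract while replacing the exhaustive scan with an approximate index.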
Evaluation & Monitoring Pipelines :
- Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination detection, and latency.
- Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
- Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.
ML Engineering & Orchestration :
- Develop scalable embeddings and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
- Optimize model performance on GPUs leveraging CUDA kernels, mixed precision training, and distributed training techniques.
- Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.
Integration & Platform Engineering :
- Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
- Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
- Ensure security, compliance, observability, and uptime for all AI services.
Required Skills & Qualifications :
- 5-10 years of hands-on experience in AI/ML engineering.
- Minimum 3-4 full-cycle AI/LLM projects delivered in enterprise or production environments.
- Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
- Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
- Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
- Expertise in building embeddings pipelines, evaluation systems, and scalable ML workflows.
- Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
- Knowledge of data security, privacy, and governance considerations for enterprise AI.
- Bachelor's or Master's degree in Computer Science, AI/ML, Data Science, or a related technical field.
Preferred Qualifications :
- Experience with commercial LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
- Familiarity with distributed GPU serving and training tooling (NVIDIA Triton Inference Server, DeepSpeed, Hugging Face Accelerate).
- Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
- Contributions to open-source AI projects or research publications.
Success Criteria :
- Delivery of high-performing and reliable RAG/LLM systems at scale.
- Measurable reductions in latency and gains in retrieval quality and model performance.
- Strong cross-functional collaboration with engineering, product, and business stakeholders.
- Robust evaluation, monitoring, and continuous learning pipelines in production.