Posted on: 15/12/2025
Job Title : Senior AI Engineer - LLMs, RAG, and Vector Systems.
Experience : 5-10 years in AI/ML (including 3-4 end-to-end AI/LLM project implementations).
Location : Remote.
Employment Type : Full-Time.
Role Summary :
The Senior AI Engineer will lead the design and development of advanced Generative AI systems, including embeddings pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations.
The role requires deep expertise in transformer architectures, fine-tuning and optimizing LLMs, and implementing GPU-accelerated AI workloads using PyTorch, TensorFlow, and CUDA.
The engineer will collaborate with cross-functional teams to build scalable, secure, and highly performant AI platforms.
Key Responsibilities :
LLM & RAG Architecture :
- Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
- Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
- Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
- Create and optimize embedding generation workflows using transformer-based models.
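To make the RAG responsibilities above concrete, here is a minimal sketch of one piece of such a system, the context-assembly step that sits between the retriever and the generator. All names are hypothetical and no specific framework is assumed; a real system would count tokens rather than characters.

```python
# Minimal sketch of RAG context assembly: greedily pack ranked
# retrieved passages into a budget, then build the final prompt
# handed to the generative model. Helper names are illustrative.

def assemble_context(passages, budget_chars=500):
    """Greedily keep top-ranked passages until the budget is spent."""
    chosen, used = [], 0
    for p in passages:
        if used + len(p) > budget_chars:
            break
        chosen.append(p)
        used += len(p)
    return "\n---\n".join(chosen)

def build_prompt(question, passages, budget_chars=500):
    """Combine assembled context and the user question into a prompt."""
    context = assemble_context(passages, budget_chars)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "Paris is the capital of France.",
    "France is in Western Europe.",
]
prompt = build_prompt("What is the capital of France?", docs)
print(prompt)
```

The greedy budget-packing here stands in for the ranking, deduplication, and token-budgeting logic the role would actually own.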
Vector Database & Retrieval Systems :
- Architect high-performance vector search solutions using vector databases (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
- Implement indexing strategies, ANN algorithms, sharding, and scaling approaches for large embedding stores.
- Ensure latency optimization, relevance tuning, and reliability of retrieval pipelines.
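As a toy illustration of the retrieval contract this section describes, the sketch below does brute-force cosine-similarity search over a small in-memory embedding store with NumPy. A production system would replace this with an ANN index (FAISS, Milvus, etc.); the data and function names are assumptions for the example.

```python
import numpy as np

# Brute-force cosine-similarity retrieval over a tiny in-memory
# embedding store. Real deployments use ANN indexes for scale;
# this only shows the contract: query vector in, top-k ids out.

def top_k(query_vec, store, k=2):
    """Indices of the k rows of `store` most similar to `query_vec`."""
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = store_n @ q              # cosine similarity per row
    return np.argsort(-sims)[:k]   # highest similarity first

rng = np.random.default_rng(0)
store = rng.normal(size=(100, 8))  # 100 fake 8-dimensional embeddings
query = store[42] + 0.01 * rng.normal(size=8)  # near-duplicate of row 42
print(top_k(query, store, k=3))
```

The indexing, sharding, and latency work listed above is essentially about preserving this contract while replacing the exhaustive scan with an approximate index.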
Evaluation & Monitoring Pipelines :
- Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination detection, and latency.
- Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
- Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.
ML Engineering & Orchestration :
- Develop scalable embeddings and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
- Optimize model performance on GPUs leveraging CUDA kernels, mixed precision training, and distributed training techniques.
- Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.
Integration & Platform Engineering :
- Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
- Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
- Ensure security, compliance, observability, and uptime for all AI services.
Required Skills & Qualifications :
- 5-10 years of hands-on experience in AI/ML engineering.
- Minimum 3-4 full-cycle AI/LLM projects delivered in enterprise or production environments.
- Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
- Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
- Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
- Expertise in building embeddings pipelines, evaluation systems, and scalable ML workflows.
- Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
- Knowledge of data security, privacy, and governance considerations for enterprise AI.
- Bachelor's or Master's degree in Computer Science, AI/ML, Data Science, or a related technical field.
Preferred Qualifications :
- Experience with commercial LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
- Familiarity with distributed GPU serving and training tooling (NVIDIA Triton Inference Server, DeepSpeed, Hugging Face Accelerate).
- Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
- Contributions to open-source AI projects or research publications.
Success Criteria :
- Delivery of high-performing and reliable RAG/LLM systems at scale.
- Measurable reductions in latency and gains in retrieval quality and model performance.
- Strong cross-functional collaboration with engineering, product, and business stakeholders.
- Robust evaluation, monitoring, and continuous learning pipelines in production.