hirist

LLM Engineer/AI Engineer - RAG/LangChain

YO IT CONSULTING
Hyderabad
5 - 8 Years

Posted on: 09/03/2026

Job Description

Description : LLM Engineer / AI Engineer


Location : Hyderabad, India


Working Model : Hybrid (3 Days Work from Office)


Experience : 5 to 8 Years


Education : Bachelor of Engineering / Bachelor of Technology (B.E. / B.Tech) in Computer Science, Artificial Intelligence, Data Science, or a related technical field.


Role Overview :


We are looking for an experienced LLM Engineer/AI Engineer to design, develop, and optimize cutting-edge Large Language Model (LLM) based systems that power intelligent automation and advanced AI agents. The candidate will play a key role in building scalable AI architectures capable of integrating with internal knowledge bases, customer systems, and trading platform data.


The ideal candidate should have hands-on experience working with state-of-the-art LLMs, Retrieval-Augmented Generation (RAG) architectures, prompt engineering frameworks, and agent-based AI systems. You will collaborate closely with data scientists, backend engineers, product teams, and DevOps teams to deploy robust AI-powered solutions in production environments.


This role involves working on high-performance AI infrastructure, optimizing model performance, latency, and cost efficiency, and ensuring responsible AI practices through robust evaluation and monitoring frameworks.


Key Responsibilities :


1. Model Engineering :


- Design, implement, and optimize LLM-powered pipelines using both proprietary and open-source language models.


- Build scalable AI pipelines using models such as OpenAI GPT models, Anthropic Claude, Meta Llama, Mistral, and other open-source LLMs.


- Evaluate models based on performance, cost, latency, and accuracy to determine optimal deployment strategies.


- Implement model orchestration workflows for production-grade applications.


- Fine-tune or adapt models using parameter-efficient tuning techniques (LoRA, adapters, etc.) where applicable.


- Continuously research and integrate new advancements in LLM architectures and frameworks.
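As an illustration of the performance/cost/latency trade-off described above, here is a minimal sketch of scoring candidate models to pick a deployment target. The model names and metric values are purely hypothetical, not real benchmark numbers:

```python
# Hypothetical sketch: choosing a deployment model by weighing accuracy,
# latency, and cost. All names and numbers below are illustrative assumptions.

CANDIDATES = {
    "gpt-large":  {"accuracy": 0.92, "latency_ms": 1200, "cost_per_1k_tokens": 0.030},
    "claude-mid": {"accuracy": 0.90, "latency_ms": 900,  "cost_per_1k_tokens": 0.015},
    "llama-open": {"accuracy": 0.85, "latency_ms": 400,  "cost_per_1k_tokens": 0.002},
}

def score(metrics, w_acc=1.0, w_lat=0.3, w_cost=0.2):
    """Higher is better: reward accuracy, penalise (normalised) latency and cost."""
    return (w_acc * metrics["accuracy"]
            - w_lat * metrics["latency_ms"] / 1000.0
            - w_cost * metrics["cost_per_1k_tokens"] * 10.0)

def pick_model(candidates, **weights):
    """Return the candidate with the best weighted score."""
    return max(candidates, key=lambda name: score(candidates[name], **weights))

if __name__ == "__main__":
    # Raising w_lat shifts the choice toward the faster, cheaper open model.
    print(pick_model(CANDIDATES))
    print(pick_model(CANDIDATES, w_lat=2.0))
```

In practice the weights would come from product requirements (e.g., a chat surface weighs latency heavily; a batch pipeline weighs cost).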


2. Retrieval-Augmented Generation (RAG) Architecture :


- Design and implement sophisticated RAG pipelines to ensure AI agents have access to relevant and up-to-date information.


- Develop RAG systems that integrate internal documentation, customer interaction history, and trading platform data.


- Implement vector databases and semantic search systems to support retrieval workflows.


- Optimize document chunking, embedding strategies, and retrieval ranking.


- Work with tools such as Pinecone, Weaviate, FAISS, Chroma, or similar vector databases.


- Ensure data security, compliance, and proper access controls within retrieval pipelines.
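The retrieval workflow above can be sketched end to end: chunk documents, embed them, and rank chunks by similarity to the query. This toy version uses a bag-of-words "embedding" and in-memory search purely for illustration; a production system would use a learned embedding model and a vector database such as FAISS, Pinecone, Weaviate, or Chroma. The sample documents are invented:

```python
# Hypothetical sketch of a minimal RAG retrieval step.
import math
from collections import Counter

def chunk(text, size=50):
    """Split a document into fixed-size word chunks (naive chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the top-k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = [
    "Order settlement on the trading platform runs nightly at midnight UTC.",
    "Customers can reset their password from the account security page.",
]
chunks = [c for d in docs for c in chunk(d, size=8)]
print(retrieve("when does trade settlement run", chunks, k=1))
```

Chunk size, overlap, and the ranking function are exactly the levers the "optimize document chunking, embedding strategies, and retrieval ranking" bullet refers to.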


3. Prompt Engineering & Model Tuning :


- Develop advanced prompt strategies that maximize reasoning capability and output reliability.


- Design, test, and version-control complex prompt templates.


- Apply advanced prompting techniques where appropriate.


- Maintain prompt libraries and experiment frameworks.


- Continuously refine prompts to improve accuracy, reasoning ability, and context awareness.


- Work with frameworks such as LangChain, LlamaIndex, Semantic Kernel, or similar orchestration tools.
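One way to read the "design, test, and version-control complex prompt templates" responsibility is a prompt library that keeps every version addressable. The sketch below is a minimal in-memory illustration; template names and contents are invented, and a real system would persist versions in git or a tool such as LangSmith:

```python
# Hypothetical sketch: a version-controlled prompt library.

class PromptLibrary:
    def __init__(self):
        self._templates = {}  # name -> list of versions (index 0 is version 1)

    def register(self, name, template):
        """Add a new version of a template; returns its version number."""
        self._templates.setdefault(name, []).append(template)
        return len(self._templates[name])

    def render(self, name, version=None, **vars):
        """Render a specific version (latest by default) with variables filled in."""
        versions = self._templates[name]
        tmpl = versions[-1] if version is None else versions[version - 1]
        return tmpl.format(**vars)

lib = PromptLibrary()
lib.register("summarise", "Summarise the following text:\n{text}")
lib.register("summarise", "You are a careful analyst. Summarise, citing sources:\n{text}")
print(lib.render("summarise", text="Q3 trading volumes rose 12%."))
print(lib.render("summarise", version=1, text="Q3 trading volumes rose 12%."))
```

Keeping old versions addressable lets an experiment framework A/B-test a prompt change against the prior version before rolling it out.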


4. Evaluation & Testing :


- Build strong evaluation pipelines to ensure safe and reliable model performance.


- Develop LLM evaluation frameworks using automated metrics and human-in-the-loop feedback.


- Implement LLM-as-a-Judge systems for evaluating outputs.


- Measure key performance metrics such as accuracy, latency, cost, and hallucination rate.


- Create automated testing pipelines for pre-production validation.


- Establish benchmarking protocols for comparing models and system architectures.
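The LLM-as-a-Judge pattern mentioned above can be sketched as a small evaluation loop. `judge_model` is a stand-in callable; in production it would wrap a real LLM API call, and the 1-5 rubric and parsing logic here are illustrative assumptions:

```python
# Hypothetical sketch of an LLM-as-a-Judge evaluation loop.
import re

JUDGE_PROMPT = (
    "Rate the answer below for factual accuracy on a scale of 1-5.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Respond with only the number."
)

def judge_score(judge_model, question, answer):
    """Ask the judge model for a 1-5 score and parse the first digit it returns."""
    reply = judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return int(match.group())

def evaluate(judge_model, cases):
    """Average judge score over a list of (question, answer) pairs."""
    scores = [judge_score(judge_model, q, a) for q, a in cases]
    return sum(scores) / len(scores)

# Stub judge for demonstration only; a real judge would be a separate LLM.
stub_judge = lambda prompt: "Score: 4"
print(evaluate(stub_judge, [("What is RAG?", "Retrieval-Augmented Generation.")]))
```

In a real pipeline the automated judge scores would be spot-checked against human-in-the-loop labels, since judge models have their own biases.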


5. Tool-Use Logic & Agent Orchestration :


- Develop logic that enables autonomous agents to interact with external systems.


- Implement agent frameworks that allow models to use external tools and APIs.


- Ensure safe execution, error handling, and access control for agent actions.


- Implement planning and reasoning architectures for multi-step task execution.
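The "safe execution, error handling, and access control" bullet above boils down to a guarded dispatch layer between the model and external systems. The tool names and stub implementations below are invented; frameworks like LangChain layer richer schemas and planning loops on top of this pattern:

```python
# Hypothetical sketch of safe tool-use dispatch for an agent.

TOOLS = {
    "get_price": lambda symbol: {"symbol": symbol, "price": 101.5},  # stub API call
    "lookup_doc": lambda query: f"doc results for {query!r}",        # stub search
}

ALLOWED = {"get_price", "lookup_doc"}  # access control: only vetted tools may run

def execute_tool(name, *args):
    """Run a tool requested by the model, with allowlisting and error handling."""
    if name not in ALLOWED:
        return {"error": f"tool {name!r} is not permitted"}
    try:
        return {"result": TOOLS[name](*args)}
    except Exception as exc:  # never let a tool failure crash the agent loop
        return {"error": str(exc)}

print(execute_tool("get_price", "ACME"))
print(execute_tool("delete_account", "ACME"))  # blocked by the allowlist
```

Returning structured errors rather than raising keeps a multi-step agent loop alive: the model can observe the failure and re-plan instead of the whole request aborting.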


6. Latency Optimization & Performance Engineering :


- Ensure high-performance AI systems capable of responding in near real-time.


- Optimize inference pipelines to reduce response latency.


- Improve vector search speed and data retrieval performance.


- Implement caching strategies and request batching.


- Design scalable systems capable of handling high request volumes.


- Work closely with infrastructure teams to optimize GPU/CPU utilization.
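Two of the optimizations listed above, caching and request batching, combine naturally for embedding lookups. In this sketch `embed_batch` is a stub for a real embedding API that costs one round-trip per batch regardless of size; the placeholder vectors are illustrative:

```python
# Hypothetical sketch: cache repeated embedding lookups and batch cache misses.

_cache = {}
CALLS = {"batches": 0}  # instrument how many backend round-trips we make

def embed_batch(texts):
    """Stub batch embedding call; one round-trip regardless of batch size."""
    CALLS["batches"] += 1
    return {t: [float(len(t))] for t in texts}  # placeholder vectors

def embed_cached(texts):
    """Send only cache misses to the backend, in a single batched call."""
    misses = [t for t in texts if t not in _cache]
    if misses:
        _cache.update(embed_batch(misses))
    return [_cache[t] for t in texts]

vecs = embed_cached(["hello", "world", "hello"])  # "hello" is embedded only once
```

The same shape applies to LLM completions: dedupe identical requests, batch the rest, and the backend sees far fewer calls at high request volumes.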


7. DevOps for AI (LLMOps) :


- Support deployment, monitoring, and maintenance of AI systems in production environments.


- Implement LLMOps pipelines for continuous monitoring of model behavior.


- Track metrics such as model drift, hallucination trends, latency, and token usage.


- Set up logging, observability, and alerting systems.


- Collaborate with DevOps teams to ensure secure and scalable AI deployments.


- Use tools such as MLflow, Weights & Biases, LangSmith, Arize AI, or similar platforms.
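The monitoring responsibilities above, tracking latency, token usage, and hallucination trends with alerting, can be sketched as a small rolling-window tracker. The window size and alert threshold are assumptions, and in production the recorded metrics would ship to a platform like MLflow, LangSmith, or Arize rather than stay in memory:

```python
# Hypothetical sketch of lightweight LLMOps monitoring with a drift alert.
from collections import deque

class Monitor:
    def __init__(self, window=100, hallucination_threshold=0.1):
        self.flags = deque(maxlen=window)  # rolling hallucination flags
        self.threshold = hallucination_threshold
        self.latencies = []
        self.total_tokens = 0

    def record(self, latency_ms, tokens, hallucinated):
        """Log one request; in production this would also ship to an observability tool."""
        self.flags.append(hallucinated)
        self.latencies.append(latency_ms)
        self.total_tokens += tokens

    def hallucination_rate(self):
        """Fraction of recent requests flagged as hallucinations."""
        return sum(self.flags) / len(self.flags) if self.flags else 0.0

    def should_alert(self):
        """Fire when the rolling rate drifts above the configured threshold."""
        return self.hallucination_rate() > self.threshold

mon = Monitor(window=10, hallucination_threshold=0.2)
for flagged in [False] * 7 + [True] * 3:
    mon.record(latency_ms=850, tokens=120, hallucinated=flagged)
print(mon.hallucination_rate(), mon.should_alert())
```

The rolling window is what turns point failures into a trend signal, so alerts reflect drift rather than a single bad response.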


Required Skills & Technical Expertise :


- Strong experience with Python programming


- Experience with LLM frameworks such as LangChain, LlamaIndex, or Semantic Kernel


- Experience with vector databases and embeddings


- Hands-on knowledge of RAG architectures


- Understanding of transformer models and NLP concepts


- Experience working with REST APIs, microservices, and backend systems


- Familiarity with cloud platforms (AWS, Azure, or GCP)


- Experience with Docker, Kubernetes, and scalable infrastructure

