Description : LLM Engineer / AI Engineer
Location : Hyderabad, India
Working Model : Hybrid (3 Days Work from Office)
Experience : 5 to 8 Years
Education : Bachelor of Engineering / Bachelor of Technology (B.E. / B.Tech) in Computer Science, Artificial Intelligence, Data Science, or a related technical field.
Role Overview :
We are looking for an experienced LLM Engineer/AI Engineer to design, develop, and optimize cutting-edge Large Language Model (LLM) based systems that power intelligent automation and advanced AI agents. The candidate will play a key role in building scalable AI architectures capable of integrating with internal knowledge bases, customer systems, and trading platform data.
The ideal candidate should have hands-on experience working with state-of-the-art LLMs, Retrieval-Augmented Generation (RAG) architectures, prompt engineering frameworks, and agent-based AI systems. You will collaborate closely with data scientists, backend engineers, product teams, and DevOps teams to deploy robust AI-powered solutions in production environments.
This role involves working on high-performance AI infrastructure, optimizing model performance, latency, and cost efficiency, and ensuring responsible AI practices through robust evaluation and monitoring frameworks.
Key Responsibilities :
1. Model Engineering :
- Design, implement, and optimize LLM-powered pipelines using both proprietary and open-source language models.
- Build scalable AI pipelines using models such as OpenAI GPT models, Anthropic Claude, Meta Llama, Mistral, and other open-source LLMs.
- Evaluate models based on performance, cost, latency, and accuracy to determine optimal deployment strategies.
- Implement model orchestration workflows for production-grade applications.
- Fine-tune or adapt models using parameter-efficient tuning techniques (LoRA, adapters, etc.) where applicable.
- Continuously research and integrate new advancements in LLM architectures and frameworks.
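The orchestration responsibility above can be sketched as a cost-aware fallback loop: try the cheapest backend first and fall back on failure. This is a minimal illustration, not a prescribed design; the backend names and stub functions below are hypothetical stand-ins for real API clients.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelBackend:
    """One LLM backend with a cost profile used for ordering."""
    name: str
    call: Callable[[str], str]   # prompt -> completion
    cost_per_1k_tokens: float

class ModelOrchestrator:
    """Try backends cheapest-first; fall back to the next on any failure."""
    def __init__(self, backends):
        self.backends = sorted(backends, key=lambda b: b.cost_per_1k_tokens)

    def complete(self, prompt: str):
        last_error = None
        for backend in self.backends:
            try:
                return backend.name, backend.call(prompt)
            except Exception as exc:  # fall through to the next backend
                last_error = exc
        raise RuntimeError(f"all backends failed: {last_error}")

# Hypothetical stubs standing in for real provider clients.
def cheap_model(prompt):
    raise TimeoutError("simulated outage")

def premium_model(prompt):
    return f"answer to: {prompt}"

orchestrator = ModelOrchestrator([
    ModelBackend("premium", premium_model, cost_per_1k_tokens=0.06),
    ModelBackend("cheap", cheap_model, cost_per_1k_tokens=0.001),
])
```

A production version would add retries, timeouts, and per-route accuracy thresholds on top of this cost ordering.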
2. Retrieval-Augmented Generation (RAG) Architecture :
- Design and implement sophisticated RAG pipelines to ensure AI agents have access to relevant and up-to-date information.
- Develop RAG systems that integrate internal documentation, customer interaction history, and trading platform data.
- Implement vector databases and semantic search systems to support retrieval workflows.
- Optimize document chunking, embedding strategies, and retrieval ranking.
- Work with tools such as Pinecone, Weaviate, FAISS, Chroma, or similar vector databases.
- Ensure data security, compliance, and proper access controls within retrieval pipelines.
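The chunking/embedding/retrieval workflow above can be illustrated with a toy in-memory store. The bag-of-words "embedding" here is a deliberate simplification standing in for a real embedding model, and the fixed-size chunker stands in for tuned chunking strategies; the class and function names are illustrative only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(doc: str, size: int = 8) -> list:
    """Fixed-size word chunking; production systems tune size and overlap."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

class ToyVectorStore:
    """Minimal stand-in for Pinecone/Weaviate/FAISS-style semantic search."""
    def __init__(self):
        self.items = []  # (chunk_text, embedding)

    def add_document(self, doc: str):
        for c in chunk(doc):
            self.items.append((c, embed(c)))

    def retrieve(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Retrieved chunks would then be injected into the model's context window, with access-control filters applied before ranking.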
3. Prompt Engineering & Model Tuning :
- Develop advanced prompt strategies that maximize reasoning capability and output reliability.
- Design, test, and version-control complex prompt templates.
- Implement advanced prompting techniques (e.g., few-shot prompting, chain-of-thought).
- Maintain prompt libraries and experiment frameworks.
- Continuously refine prompts to improve accuracy, reasoning ability, and context awareness.
- Work with frameworks such as LangChain, LlamaIndex, Semantic Kernel, or similar orchestration tools.
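Versioned prompt templates, as described above, can be kept in a small registry so experiments can pin a version while production tracks the latest. This is a minimal sketch using Python's standard-library templating; the registry class and template names are assumptions for illustration.

```python
import string

class PromptLibrary:
    """Prompt templates keyed by (name, version) for controlled experiments."""
    def __init__(self):
        self._templates = {}

    def register(self, name, version, template):
        self._templates[(name, version)] = string.Template(template)

    def latest_version(self, name):
        versions = [v for (n, v) in self._templates if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def render(self, name, version=None, **fields):
        """Render a specific version, or the latest if none is pinned."""
        if version is None:
            version = self.latest_version(name)
        return self._templates[(name, version)].substitute(**fields)

lib = PromptLibrary()
lib.register("summarize", 1, "Summarize: $text")
lib.register("summarize", 2, "Summarize in one sentence, citing sources: $text")
```

In practice the registry would live in version control alongside evaluation results, so every prompt change is diffable and revertible.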
4. Evaluation & Testing :
- Build strong evaluation pipelines to ensure safe and reliable model performance.
- Develop LLM evaluation frameworks using automated metrics and human-in-the-loop feedback.
- Implement LLM-as-a-Judge systems for evaluating outputs.
- Measure key performance metrics such as accuracy, hallucination rate, latency, and token cost.
- Create automated testing pipelines for pre-production validation.
- Establish benchmarking protocols for comparing models and system architectures.
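An evaluation pipeline of the kind described above can be reduced to three pluggable pieces: test cases, a generation function, and a judge. The token-overlap judge below is a stand-in; in an LLM-as-a-Judge setup that callable would prompt a strong model to grade the output. All names here are illustrative.

```python
from statistics import mean

def evaluate(cases, generate, judge, threshold=0.7):
    """Run (prompt, reference) cases through the model, score each with the judge,
    and report whether the mean score clears the release threshold."""
    scores = []
    for prompt, reference in cases:
        output = generate(prompt)
        scores.append(judge(prompt, output, reference))
    avg = mean(scores)
    return {"mean_score": avg, "passed": avg >= threshold}

def overlap_judge(prompt, output, reference):
    """Stub judge: fraction of reference tokens present in the output."""
    out, ref = set(output.lower().split()), set(reference.lower().split())
    return len(out & ref) / len(ref) if ref else 0.0
```

The same harness accepts a human-in-the-loop judge (a callable that queues items for review), so automated and manual evaluation share one reporting path.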
5. Tool-Use Logic & Agent Orchestration :
- Develop logic that enables autonomous agents to interact with external systems.
- Implement agent frameworks that allow models to use external tools and APIs.
- Ensure safe execution, error handling, and access control for agent actions.
- Implement planning and reasoning architectures for multi-step task execution.
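The safe-execution and access-control requirements above are commonly handled with an allowlisted tool registry: the agent can only invoke tools that were explicitly registered, and every call returns a uniform result instead of raising into the agent loop. The tool names below are hypothetical examples.

```python
class ToolRegistry:
    """Allowlisted tools an agent may call, with uniform error handling."""
    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def execute(self, name, **kwargs):
        # Access control: only explicitly registered tools can run.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs)}
        except Exception as exc:
            # Errors are returned as data so the agent can re-plan.
            return {"ok": False, "error": str(exc)}

registry = ToolRegistry()
registry.register("get_quote", lambda symbol: {"symbol": symbol, "price": 101.5})
```

A planner layer would sit on top of this, deciding which registered tool to call at each step of a multi-step task.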
6. Latency Optimization & Performance Engineering :
- Ensure high-performance AI systems capable of responding in near real-time.
- Optimize inference pipelines to reduce response latency.
- Improve vector search speed and data retrieval performance.
- Implement caching strategies and request batching.
- Design scalable systems capable of handling high request volumes.
- Work closely with infrastructure teams to optimize GPU/CPU utilization.
7. DevOps for AI (LLMOps) :
- Support deployment, monitoring, and maintenance of AI systems in production environments.
- Implement LLMOps pipelines for continuous monitoring of model behavior.
- Track metrics such as model drift, hallucination trends, latency, and token usage.
- Set up logging, observability, and alerting systems.
- Collaborate with DevOps teams to ensure secure and scalable AI deployments.
- Use tools such as MLflow, Weights & Biases, LangSmith, Arize AI, or similar platforms.
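The monitoring metrics above (latency, token usage) can be tracked with a small in-memory recorder; in a real stack these records would be shipped to a platform like MLflow or LangSmith rather than held locally. The class below is an illustrative sketch, not any platform's API.

```python
import math

class LLMMonitor:
    """In-memory tracker for per-request latency and token usage."""
    def __init__(self):
        self.records = []

    def log(self, latency_ms, prompt_tokens, completion_tokens):
        self.records.append({
            "latency_ms": latency_ms,
            "total_tokens": prompt_tokens + completion_tokens,
        })

    def p95_latency(self):
        """Nearest-rank p95: the value below which 95% of requests fall."""
        lat = sorted(r["latency_ms"] for r in self.records)
        idx = min(len(lat) - 1, math.ceil(0.95 * len(lat)) - 1)
        return lat[idx]

    def total_tokens(self):
        return sum(r["total_tokens"] for r in self.records)

monitor = LLMMonitor()
monitor.log(120.0, 50, 30)
monitor.log(300.0, 40, 60)
```

Alerting would hang off these aggregates, e.g., paging when p95 latency or hallucination-flag rates cross a threshold.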
Required Skills & Technical Expertise :
- Strong experience with Python programming
- Experience with LLM frameworks such as LangChain, LlamaIndex, or Semantic Kernel
- Experience with vector databases and embeddings
- Hands-on knowledge of RAG architectures
- Understanding of transformer models and NLP concepts
- Experience working with REST APIs, microservices, and backend systems
- Familiarity with cloud platforms (AWS, Azure, or GCP)
- Experience with Docker, Kubernetes, and scalable infrastructure