
TraceLink - Senior Machine Learning Engineer - Generative AI

TraceLink
Pune
4 - 7 Years

Posted on: 05/02/2026

Job Description


Senior ML Engineer - GenAI & Agentic ML Systems


About the Role :


We are seeking a highly experienced Senior ML Engineer - GenAI & ML Systems to lead the design, architecture, and implementation of advanced agentic AI systems within our next-generation supply chain platforms.


This role is hands-on and execution-focused. You will design, build, deploy, and maintain large-scale multi-agent systems capable of reasoning, planning, and executing complex workflows in dynamic, non-deterministic environments. You will also own production concerns, including context management, knowledge orchestration, evaluation, observability, and system reliability.


This position is ideal for a strong ML Engineer or Software Engineer with deep practical exposure to GenAI, data science, and modern ML systems who is comfortable working end-to-end, from architecture through production deployment. Experience in life sciences supply chain or other regulated environments is a strong plus.


Key Responsibilities :


- Architect, implement, and operate large-scale agentic AI / GenAI systems that automate and coordinate complex supply chain workflows.


- Design and build multi-agent systems, including agent coordination, planning, tool execution, long-term memory, feedback loops, and supervision.


- Develop and maintain advanced context and knowledge management systems, including :


a. RAG and Advanced RAG pipelines


b. Hybrid retrieval, reranking, grounding, and citation strategies


c. Context window optimization and long-horizon task reliability


- Own the technical strategy for reliability and evaluation of non-deterministic AI systems, including :


a. Agent evaluation frameworks


b. Simulation-based testing


c. Regression testing for probabilistic outputs


d. Validation of agent decisions and outcomes


- Fine-tune and optimize LLMs/SLMs for domain performance, latency, cost efficiency, and task specialization (strong plus).


- Design and deploy scalable backend services using Python and Java, ensuring production-grade performance, security, and observability.


- Implement AI observability and feedback loops, including agent tracing, prompt/tool auditing, quality metrics, and continuous improvement pipelines.


- Apply and experiment with reinforcement learning or iterative improvement techniques within GenAI or agentic workflows where appropriate.


- Collaborate closely with product, data science, and domain experts to translate real-world supply chain requirements into intelligent automation solutions.


- Guide system architecture across distributed services, event-driven systems, and real-time data pipelines using cloud-native patterns.


- Mentor engineers, influence technical direction, and establish best practices for agentic AI and ML systems across teams.


Required Qualifications :


- 4+ years of experience building and operating cloud-native SaaS systems on AWS, GCP, or Azure (minimum 5 years with AWS).


- Strong ML Engineer / Software Engineer background with deep practical exposure to data science and GenAI systems.


- Expert-level, hands-on experience designing, deploying, and maintaining large multi-agent systems in production.


- Proven experience with advanced RAG and context management, including memory, state handling, tool grounding, and long-running workflows.


- 4+ years of hands-on Python experience delivering production-grade systems.


- Practical experience evaluating, monitoring, and improving non-deterministic AI behavior in real-world deployments.


- Hands-on experience with agent frameworks such as LangGraph, AutoGen, CrewAI, Semantic Kernel, or equivalent.


- Solid understanding of distributed systems, microservices, and production reliability best practices.


Big Plus / Preferred Qualifications :


- Hands-on experience fine-tuning LLMs or SLMs for domain-specific tasks (training, evaluation, deployment).


- Experience designing and deploying agentic systems in supply chain domains (logistics, manufacturing, planning, procurement).


- Strong command of knowledge organization techniques, including RAG, Advanced RAG, hybrid search, and reranking.


- Experience applying reinforcement learning, reward modeling, or iterative optimization in GenAI workflows.


- Familiarity with Java and JavaScript/ECMAScript.


- Experience deploying AI solutions in regulated or enterprise environments with governance, security, and compliance requirements.


- Knowledge of life sciences supply chain or regulated industry ecosystems.


Who You Are :


- A hands-on technical leader who moves seamlessly between architecture and implementation.


- A builder who values practical, production-ready solutions over prototypes.


- Comfortable designing systems with probabilistic and emergent behavior.


- Passionate about building GenAI systems that are reliable, observable, explainable, and scalable.


- A clear communicator who can align stakeholders and drive execution across teams.

