Technical Lead - RAG/LLM

Recruiting Bond
7 - 10 Years
Bangalore

Posted on: 10/04/2026

Job Description

The Role :

Tech Leads are the connective tissue of engineering delivery: the engineers who translate product intent into technical plans, manage cross-team dependencies, hold the quality bar sprint after sprint, and keep the team unblocked without becoming the bottleneck themselves.

- Own sprint-level technical planning: scope, dependencies, risk identification, and realistic estimation

- Hands-on : you write production code and lead code reviews; you are an engineer, not a coordinator

- First escalation path for technical blockers: you unblock the team, you don't re-route the blockage

- Cross-functional bridge: you speak both engineering and product fluently, so neither side feels lost in translation

Core Responsibilities :

Technical Planning & Execution :

- Lead sprint planning: translate product requirements into engineering tasks with clear acceptance criteria, estimated complexity, and identified dependencies

- Own the squad's technical design for delivery scope: API contracts, data model decisions, and service integration points, all documented before coding starts

- Track delivery progress: identify risks early (unclear requirements, infra dependencies, third-party API uncertainties) and escalate to Staff/Principal before they become blockers

- Manage tech debt visibility: log, prioritise, and negotiate tech debt sprints, ensuring the team is not indefinitely accumulating debt while shipping features

Hands On Engineering :

- Write production-grade backend code : APIs, event consumers, data pipelines, and integration adapters; your code is reviewed, approved, and deployed

- Lead code reviews : evaluate correctness, test coverage, error handling, observability, and design quality; your reviews are educational, not perfunctory

- Own the squad's service health: build runbooks, define alert thresholds, participate in on-call, and drive RCA after incidents

- Validate performance before release: load test new services, profile latency-sensitive paths, and confirm p99 SLAs are met before launch

AI-Powered Feature Delivery :

- Coordinate ML model integration delivery : work with Data Scientists to define API contracts, implement model endpoint calls, build fallback logic, and instrument prediction logging

- Deliver LLM-powered features : RAG API integration, conversational flow state management, output validation, and error surface handling

- Ship Voice AI features : coordinate between ASR/TTS services, intent API, and booking flow to ensure low-latency end-to-end spoken booking

- Own A/B experiment infrastructure delivery for your squad: feature flag integration, metric collection, and experiment configuration

The Hard Engineering Problems You'll Face :

- Across all six platforms, the engineering challenges are real, non-trivial, and consequential:

Cache Invalidation at Speed :

- Fare data has a 30-second freshness window.

- A stale cache hit in the booking flow means a pricing error, a failed checkout, or a lost trust signal.

- Multi-tier cache design (L1/L2/L3), TTL strategies, event-driven invalidation via Kafka, and cache stampede prevention are all live problems.
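Two of the ideas above, TTL expiry and cache stampede prevention, can be sketched for a single in-process (L1) tier. This is a minimal illustration under stated assumptions, not a description of the actual platform: the class name `FareCache`, the injected clock, and the loader function are all hypothetical, and the `invalidate` method stands in for whatever a Kafka-driven invalidation consumer would call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical L1 cache: TTL expiry plus single-flight loading per key. Values are assumed non-null. */
final class FareCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;
        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;      // injected so tests can control time
    private final Function<K, V> loader;   // stands in for the L2/L3 or source-of-truth lookup

    FareCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    private V freshOrNull(K key) {
        Entry<V> e = entries.get(key);
        boolean fresh = e != null && clock.getAsLong() - e.loadedAtMillis < ttlMillis;
        return fresh ? e.value : null;
    }

    V get(K key) {
        V cached = freshOrNull(key);
        if (cached != null) return cached;

        // Only one caller per key becomes the loader; the rest wait on its future.
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> leader = inFlight.putIfAbsent(key, mine);
        if (leader != null) return leader.join();
        try {
            V value = freshOrNull(key); // re-check: a previous leader may have just refreshed it
            if (value == null) {
                value = loader.apply(key);
                entries.put(key, new Entry<>(value, clock.getAsLong()));
            }
            mine.complete(value);
            return value;
        } catch (RuntimeException ex) {
            mine.completeExceptionally(ex);
            throw ex;
        } finally {
            inFlight.remove(key, mine);
        }
    }

    /** Hook for event-driven invalidation, e.g. called on a fare-change event. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

With a 30-second TTL, a burst of requests for the same expired key triggers one load instead of one load per request; an explicit `invalidate` call forces the next read to reload regardless of age.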

Distributed Concurrency :

- Train Tatkal opening: millions of concurrent writes for 72 berths per coach.

- Optimistic locking, distributed lease management, queue-based fairness, and atomic seat allocation without deadlock under pathological load.
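The optimistic approach to atomic seat allocation can be illustrated in a single process with a compare-and-set retry loop. This is only a sketch: the class `CoachInventory` is hypothetical, and an `AtomicInteger` stands in for what would more plausibly be a versioned database row or an atomic operation in a shared store in a distributed deployment.

```java
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-coach inventory; berths are handed out 1..capacity without locks. */
final class CoachInventory {
    private final int capacity;
    private final AtomicInteger allocated = new AtomicInteger(0);

    CoachInventory(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the allocated berth number, or empty if the coach is sold out. */
    OptionalInt allocate() {
        while (true) {
            int current = allocated.get();          // read the current "version"
            if (current >= capacity) {
                return OptionalInt.empty();         // sold out, never oversell
            }
            // Commit only if nobody else changed the count since we read it.
            if (allocated.compareAndSet(current, current + 1)) {
                return OptionalInt.of(current + 1);
            }
            // Lost the race: loop, re-read, and retry.
        }
    }
}
```

Because no thread ever holds a lock, there is nothing to deadlock on; contention shows up as retries instead. Fairness is not addressed here, which is where the queue-based approach mentioned above would come in.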

Event Ordering Guarantees :

- A booking event must arrive before its payment event.

- But Kafka doesn't guarantee cross-partition ordering.

- Building booking state machines with idempotency, deduplication, and out-of-order event tolerance is a continuous engineering challenge.
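A booking state machine that tolerates duplicates and a payment event arriving before its booking event might look roughly like the sketch below. The class, states, and event names are hypothetical, and the seen-ID set is kept in memory purely for illustration; a real consumer would need durable deduplication state.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-booking state machine: idempotent and tolerant of payment-before-booking. */
final class BookingStateMachine {
    enum State { NEW, BOOKED, PAID }
    enum EventType { BOOKING_CREATED, PAYMENT_CAPTURED }

    private final Set<String> seenEventIds = new HashSet<>();
    private State state = State.NEW;
    private boolean paymentParked = false; // payment arrived before the booking event

    synchronized State apply(String eventId, EventType type) {
        // Deduplication: a redelivered event must not change state a second time.
        if (!seenEventIds.add(eventId)) {
            return state;
        }
        switch (type) {
            case BOOKING_CREATED:
                if (state == State.NEW) {
                    // If the payment was parked earlier, catch up in one step.
                    state = paymentParked ? State.PAID : State.BOOKED;
                }
                break;
            case PAYMENT_CAPTURED:
                if (state == State.BOOKED) {
                    state = State.PAID;
                } else if (state == State.NEW) {
                    paymentParked = true; // out of order: remember it, do not drop it
                }
                break;
        }
        return state;
    }
}
```

Applying `PAYMENT_CAPTURED` and then `BOOKING_CREATED` ends in `PAID`, the same state as the in-order sequence, and replaying either event leaves the state unchanged.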

Multi-Tenancy Blast Radius :

- A B2B enterprise client's policy engine change must not affect the B2C booking flow.

- Multi-tenant isolation in shared infrastructure (API gateways, Kafka topics, DB schemas, cache namespaces) must be designed from day one.

AI Model Integration :

- Serving a ranking model in the search critical path at p99 <20ms requires GPU node management, model warmup, request batching, async inference patterns, and fallback to heuristic ranking when the model is unavailable.
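The fallback part of this problem can be sketched as a call that waits for the model only within a latency budget and otherwise uses a heuristic ranking. Everything named here is an assumption for illustration: `RankingWithFallback`, the `model` function returning a `CompletableFuture`, and the `heuristic` function are hypothetical, and GPU management, warmup, and batching are out of scope.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical helper: model ranking within a latency budget, heuristic ranking otherwise. */
final class RankingWithFallback {
    static <T> List<T> rank(List<T> candidates,
                            Function<List<T>, CompletableFuture<List<T>>> model,
                            Function<List<T>, List<T>> heuristic,
                            long budgetMillis) {
        CompletableFuture<List<T>> pending = null;
        try {
            pending = model.apply(candidates);
            // Wait for the model only as long as the latency budget allows.
            return pending.get(budgetMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException | RuntimeException e) {
            // Model too slow, failed, or unavailable: stop waiting and degrade gracefully.
            if (pending != null) {
                pending.cancel(true);
            }
            return heuristic.apply(candidates);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return heuristic.apply(candidates);
        }
    }
}
```

The budget passed in would be the slice of the overall p99 target allocated to model inference; the heuristic must itself be cheap enough to fit in what remains.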

AI-First Engineering Mandate :

- Platform Engineers at every level are responsible for building systems that AI and ML can run on and, increasingly, systems that are AI themselves.

ML Serving Infrastructure : your APIs must serve model predictions at p99 <20ms with graceful fallbacks; you design the latency budget allocation

Feature Pipeline Engineering : real-time feature computation (Kafka Streams, Flink) feeding the feature store at sub-second freshness

RAG Backend Systems : vector store integration, embedding generation pipelines, document chunking and indexing for knowledge retrieval

Agentic Workflow Infrastructure : durable execution systems (Temporal) for multi-step LLM agent workflows with retry and compensation logic

Voice AI Backend : ASR request routing, low-latency TTS pipelines, spoken intent API design for conversational booking flows

Recommendation API Design : serving infrastructure for collaborative filtering, session-based models, and personalised ranking endpoints

Price Intelligence Pipelines : real-time competitive price ingestion, fare change event streaming, lower-price guarantee trigger systems

A/B Experiment Infrastructure : feature flags, traffic splitting, metric collection, and experiment configuration systems

MCP Tool Orchestration : building the tool-use APIs that LLM agents call to execute booking, modify, and cancel operations safely
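As one concrete illustration of the RAG Backend Systems item in the list above, document chunking before embedding can be sketched as fixed-size windows with overlap. This is the simplest possible strategy, chosen for brevity; the class `Chunker` is hypothetical, sizes are counted in characters rather than tokens, and real pipelines often split on sentence or section boundaries instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size chunker with overlap, measured in characters. */
final class Chunker {
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the document
            }
        }
        return chunks;
    }
}
```

The overlap keeps text that straddles a window boundary retrievable from at least one chunk, at the cost of indexing some content twice.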

Who You Are :

- 7 to 10 years in backend engineering, with 1 to 2 years in a technical lead, delivery lead, or senior IC role with cross-team coordination experience

- Comfortable holding both technical depth (system design, code review) and delivery accountability (planning, risk management, timeline ownership)

- Strong written communicator: clear design specs, effective sprint retrospectives, honest status reports

- Strong in Java/Kotlin; familiar with distributed systems fundamentals, REST, Kafka, and Redis

- Tier-I institute preferred (IIT / IIIT / NIT / IISc / BITS CSE / ISE)

Technology Stack :

Backend : Java, Kotlin, Spring Boot, Ktor

Systems : Kafka, Redis, REST, gRPC, MySQL/DynamoDB

Cloud : AWS (EKS, EC2, S3, RDS), Kubernetes, Docker

Tooling : CI/CD pipelines, Feature flags, Monitoring dashboards, Load testing (k6)
