hirist

Lead Software Engineer - Java/Kotlin/Golang

Recruiting Bond
7 - 10 Years
Bangalore

Posted on: 10/04/2026

Job Description

Description :

Lead Software Engineer - Platform Engineering

- 7 to 10 years

- Delivery Ownership

- Hands-On Engineering

- API Design

- Team Technical Health

The Role :

- Lead Engineers make delivery happen reliably, with quality, at pace.

- You are the person who takes a complex requirement, breaks it into well-scoped engineering tasks, assigns and reviews the work, makes the design calls for your scope, and ships.


- You are hands-on: your code is in production.

- You are a multiplier: the engineers around you are faster and better because of your presence.

- Own technical delivery for a feature area or service cluster within a platform vertical

- Hands-on: 40-60% of your time is spent writing and reviewing production code

- Your API and data model decisions are sound enough that Staff engineers don't need to revisit them

- Engineers on your immediate team grow through your code reviews, pairing, and feedback

Core Responsibilities :

Technical Delivery :

- Own the technical design for your scope : API contracts, data models, service boundaries, caching strategies, and event schemas documented in ADRs or lightweight design docs

- Write production-grade code in Java/Kotlin/Go : clean, tested, observable, and deployable without heroics

- Lead code reviews for your team: catch logic errors and design issues, but also teach better patterns, so that your reviews reduce bugs in future PRs, not just the current one

- Break down complex requirements into well-scoped, estimated engineering tasks with clear acceptance criteria; no ambiguous user stories on your board

- Manage technical risk: identify dependencies, flag blockers early, coordinate with adjacent teams, and update timelines before they slip

Service Reliability :

- Own the reliability of your services: define SLOs with your tech lead, build monitoring dashboards, write runbooks, and participate in on-call rotation

- Conduct performance testing before major launches: load profiles, latency percentiles (p50/p95/p99), error rates under load, and saturation points

- Manage service-level technical debt: identify, quantify, schedule, and deliver tech debt reduction sprints without dropping feature velocity
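The p50/p95/p99 figures above can be computed from load-test samples with the nearest-rank method. Below is a minimal Go sketch. The function name and the sample data are illustrative assumptions. Load-testing tools often use interpolated percentiles instead, which can give slightly different values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// latency samples in milliseconds. The caller's slice is not modified.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	// Nearest-rank method: rank = ceil(p * N / 100), 1-indexed.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical load-test samples: 100ms down to 1ms (deliberately unsorted).
	samples := make([]float64, 0, 100)
	for i := 100; i >= 1; i-- {
		samples = append(samples, float64(i))
	}
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("p%.0f = %.0fms\n", p, percentile(samples, p))
	}
}
```

With 100 samples of 1-100 ms, this prints p50 = 50ms, p95 = 95ms and p99 = 99ms.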

AI/ML Feature Delivery :

- Deliver AI-powered features end-to-end: integrate ranking model APIs, build recommendation display endpoints, implement LLM response handling with timeout and fallback

- Build the instrumentation for ML model evaluation: log prediction inputs, outputs, and user feedback signals that feed offline model evaluation pipelines

- Implement feature flag controls for ML model rollout: gradual traffic splitting, kill-switch safety, and metric comparison dashboards

- Deliver Voice AI integrations: ASR API integration, spoken intent response handling, voice booking session state management

Platform Work Across All Six Verticals :

What You'll Own & Build :

Flights :

- Pricing API latency under GDS API variability

- Lead the fare pricing service: request fan-out to multiple GDS providers, parallel aggregation, timeout handling, and cached fallback pricing

Hotels :

- Supplier rate normalisation across 50+ feed formats

- Lead the rate normalisation service: schema mapping, validation, deduplication, and the transformation pipeline feeding the unified rate store
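The schema mapping, validation and deduplication steps can be sketched in Go. The example uses two made-up feed formats standing in for the real ones. All type and field names are hypothetical, and so is the "keep the first duplicate" rule. A real pipeline might prefer the freshest or cheapest row instead.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// UnifiedRate is a hypothetical row in the unified rate store.
type UnifiedRate struct {
	HotelID, RoomCode, Date, Currency string
	AmountMinor                       int64 // price in minor units (paise, cents)
}

// Two hypothetical supplier feed formats.
type feedA struct {
	Property, Room, StayDate, Cur string
	Price                         float64 // major units, e.g. 4999.00
}
type feedB struct {
	HotelCode, RoomType, Night, CurrencyISO string
	PriceMinor                              int64
}

// Schema mapping: one small function per feed format.
func fromA(r feedA) UnifiedRate {
	return UnifiedRate{r.Property, r.Room, r.StayDate, strings.ToUpper(r.Cur), int64(math.Round(r.Price * 100))}
}

func fromB(r feedB) UnifiedRate {
	return UnifiedRate{r.HotelCode, r.RoomType, r.Night, strings.ToUpper(r.CurrencyISO), r.PriceMinor}
}

// valid applies basic validation rules before a rate reaches the store.
func valid(u UnifiedRate) bool {
	return u.HotelID != "" && u.RoomCode != "" && u.Date != "" &&
		len(u.Currency) == 3 && u.AmountMinor > 0
}

// normalise drops invalid rows and de-duplicates on (hotel, room, date,
// currency), keeping the first occurrence.
func normalise(in []UnifiedRate) []UnifiedRate {
	seen := make(map[string]bool)
	var out []UnifiedRate
	for _, u := range in {
		key := strings.Join([]string{u.HotelID, u.RoomCode, u.Date, u.Currency}, "|")
		if !valid(u) || seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, u)
	}
	return out
}

func main() {
	rows := []UnifiedRate{
		fromA(feedA{"H1", "DLX", "2026-04-10", "inr", 4999}),
		fromB(feedB{"H1", "DLX", "2026-04-10", "INR", 499900}), // duplicate of the row above
		fromB(feedB{"H2", "STD", "2026-04-10", "IN", 100}),     // invalid currency code
	}
	fmt.Println(normalise(rows)) // [{H1 DLX 2026-04-10 INR 499900}]
}
```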

Bus :

- Seat map real-time consistency across aggregators

- Lead the seat availability service: real-time seat map fetching, TTL-based staleness management, partial availability handling, and booking lockout on seat conflict

Train :

- PNR lifecycle state machine reliability

- Lead the PNR management service: booking confirmation, modification, cancellation, and refund state transitions with idempotent IRCTC API calls

B2B :

- Approval workflow SLA under asynchronous delegation

- Lead the approval engine: configurable routing rules, escalation triggers, delegation management, SLA countdown, and notification delivery guarantees

Core :

- Feature flag rollout safety for platform-wide configs

- Lead the feature flag service: gradual rollout controls, audience targeting, kill-switch safety, and experiment integration with the A/B platform

The Hard Engineering Problems You'll Face :

Across all six verticals, the engineering challenges are real, non-trivial, and consequential:

Cache Invalidation at Speed :

- Fare data has a 30-second freshness window.

- A stale cache hit in the booking flow means a pricing error, a failed checkout, or a lost trust signal.

- Multi-tier cache design (L1/L2/L3), TTL strategies, event-driven invalidation via Kafka, and cache stampede prevention are all live problems.

Distributed Concurrency :

- Train Tatkal opening : millions of concurrent writes for 72 berths per coach.

- Handling it calls for optimistic locking, distributed lease management, queue-based fairness, and atomic seat allocation without deadlock under pathological load.

Event Ordering Guarantees :

- A booking event must arrive before its payment event.

- But Kafka doesn't guarantee cross-partition ordering.

- Building booking state machines with idempotency, deduplication, and out-of-order event tolerance is a continuous engineering challenge.
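Out-of-order tolerance and deduplication can be sketched in Go by parking a payment event until its booking exists and ignoring redelivered event IDs. The event type names and the `Processor` type are hypothetical, and state is kept in memory only. Keying Kafka messages by booking ID keeps one booking's events on one partition, and so in order relative to each other. Events arriving on different topics still need logic like this.

```go
package main

import "fmt"

// Event is a hypothetical envelope; in Kafka the ID could come from a header.
type Event struct {
	ID        string // unique per event, used for de-duplication
	BookingID string
	Type      string // "BOOKING_CREATED" or "PAYMENT_CAPTURED"
}

type Booking struct {
	Payments int // number of payment events applied
}

// Processor applies events idempotently and parks payment events that arrive
// before their booking, replaying them once the booking exists.
type Processor struct {
	seen     map[string]bool
	bookings map[string]*Booking
	parked   map[string][]Event
}

func NewProcessor() *Processor {
	return &Processor{seen: map[string]bool{}, bookings: map[string]*Booking{}, parked: map[string][]Event{}}
}

func (p *Processor) Handle(e Event) {
	if p.seen[e.ID] {
		return // duplicate delivery
	}
	switch e.Type {
	case "BOOKING_CREATED":
		p.seen[e.ID] = true
		if _, exists := p.bookings[e.BookingID]; !exists {
			p.bookings[e.BookingID] = &Booking{}
		}
		pending := p.parked[e.BookingID]
		delete(p.parked, e.BookingID)
		for _, pe := range pending {
			p.Handle(pe) // replay in arrival order
		}
	case "PAYMENT_CAPTURED":
		b, ok := p.bookings[e.BookingID]
		if !ok {
			p.parked[e.BookingID] = append(p.parked[e.BookingID], e)
			return // not marked seen yet: it has not been applied
		}
		p.seen[e.ID] = true
		b.Payments++
	}
}

func main() {
	p := NewProcessor()
	p.Handle(Event{"pay-1", "B42", "PAYMENT_CAPTURED"}) // arrives early
	p.Handle(Event{"pay-1", "B42", "PAYMENT_CAPTURED"}) // redelivered
	p.Handle(Event{"bk-1", "B42", "BOOKING_CREATED"})
	fmt.Println(p.bookings["B42"].Payments) // 1
}
```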

Observability Gaps :

- 200+ services.

- A booking fails.

- The trace crosses 8 service boundaries.

- Without distributed tracing (OpenTelemetry/Jaeger), structured logging (correlation IDs, trace context propagation), and SLO dashboards (Prometheus + Grafana), debugging is archaeology, not engineering.

Multi-Tenancy Blast Radius :

- A B2B enterprise client's policy engine change must not affect the B2C booking flow.

- Multi-tenant isolation in shared infrastructure (API gateways, Kafka topics, DB schemas, cache namespaces) must be designed from day one.

AI Model Integration :

- Serving a ranking model in the search critical path at p99 < 20ms requires GPU node management, model warmup, request batching, async inference patterns, and fallback to heuristic ranking when the model is unavailable.

AI-First Engineering Mandate :

- Platform Engineers at every level are responsible for building systems that AI and ML can run on and, increasingly, systems that are AI themselves.

- ML Serving Infrastructure: your APIs must serve model predictions at p99 < 20ms with graceful fallbacks; you design the latency budget allocation

- Feature Pipeline Engineering: real-time feature computation (Kafka Streams, Flink) feeding the feature store at sub-second freshness

- RAG Backend Systems: vector store integration, embedding generation pipelines, document chunking and indexing for knowledge retrieval

- Agentic Workflow Infrastructure: durable execution systems (Temporal) for multi-step LLM agent workflows with retry and compensation logic

- Voice AI Backend: ASR request routing, low-latency TTS pipelines, spoken intent API design for conversational booking flows

- Recommendation API Design: serving infrastructure for collaborative filtering, session-based models, and personalised ranking endpoints

- Price Intelligence Pipelines: real-time competitive price ingestion, fare change event streaming, lower-price guarantee trigger systems

- A/B Experiment Infrastructure: feature flags, traffic splitting, metric collection, and experiment configuration systems

- MCP Tool Orchestration: building the tool-use APIs that LLM agents call to execute booking, modification, and cancellation operations safely

Who You Are :

- 7 to 10 years in backend engineering; you have delivered at least one major feature or service from design through production monitoring

- Strong in Java or Kotlin; comfortable with Spring Boot, microservices, REST, and async messaging

- Experience with Kafka, Redis, and at least one cloud platform (AWS preferred)

- Natural collaborator: you align engineers, product managers, and QA without creating overhead or slowing the team

- You write design documents that are read before coding starts, not filed after the fact

- Tier-I institute preferred (IIT / IIIT / NIT / IISc / BITS CSE / ISE)

Technology Stack :

Backend : Java, Kotlin, Spring Boot, Ktor, Go (exposure)

Messaging : Apache Kafka, gRPC, REST APIs, Avro/Protobuf

Storage : Redis, MySQL/Aurora, DynamoDB, Elasticsearch

AI Integration : REST/gRPC model endpoints, Feature flag SDKs, OpenTelemetry tracing

DevOps : Kubernetes, Docker, Helm, CI/CD (GitHub Actions/Jenkins), Terraform (awareness)
