- You spot the distributed systems edge case before it becomes a 2am incident.

- You add observability before you open the PR, not after the alert fires.

- You own features from design through production, not just the implementation ticket.

- You write clean, tested, documented code that your colleagues can extend without asking you for context.

- You handle async failure modes, idempotency edge cases, and cache invalidation timing before they break

- Your code reviews are educational: you catch the bug and explain the distributed systems principle behind it

The Platform You're Joining :

- You are not joining a travel app.

- You are joining the engineering team building India's most intelligent travel commerce operating system: a billion-dollar marketplace that connects 100M+ travellers to flights, hotels, buses, trains, and corporate travel every year.

Core Responsibilities :

Feature Engineering :

- Design and implement backend services and APIs: from API contract, data model, and service interaction design through to production deployment

- Write unit tests, integration tests, and consumer-driven contract tests covering not just happy paths but failure modes, timeouts, and malformed inputs

- Participate actively in design reviews: contribute specific, evidence-backed objections; offer design alternatives; challenge assumptions constructively

- Document your systems: API semantics, data model decisions, known failure modes, and retraining/cache-refresh schedules for any ML components you integrate

Distributed Systems Craft :

- Build services that handle partial failures gracefully: timeouts with sensible defaults, retries with exponential backoff and jitter, circuit breaker integration

- Implement idempotent operations: define idempotency key semantics, implement deduplication logic, test concurrent duplicate request scenarios

- Use Kafka correctly: appropriate partitioning, offset commit strategies, consumer group sizing, and handling of deserialization failures

- Use Redis correctly: key naming conventions, TTL strategy, atomic operations via Lua scripting where needed, and connection pool configuration
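To make the partial-failure bullet concrete, here is a minimal sketch of retries with exponential backoff and full jitter. The class name, method names, and parameter values are illustrative assumptions, not part of any internal library; a production version would also distinguish retryable from non-retryable errors and sit behind a circuit breaker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only.
class RetryWithBackoff {

    /** Upper bound on the delay before retry `attempt` (0-based): base * 2^attempt, capped. */
    static long backoffCeilingMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the shift so a large attempt number cannot overflow (assumes a small base).
        long exponential = baseMillis << Math.min(attempt, 20);
        return Math.min(capMillis, exponential);
    }

    /** Full jitter: a uniformly random delay in [0, ceiling] spreads out retrying clients. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long ceiling = backoffCeilingMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Runs `call` up to maxAttempts times, sleeping a jittered delay between failures. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) break; // no sleep after the final failure
                Thread.sleep(jitteredDelayMillis(attempt, baseMillis, capMillis));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient failure");
            return "booked";
        }, 5, 50, 1_000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The cap keeps the worst-case wait bounded, and the random delay avoids many clients retrying in lockstep after a shared outage.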

AI/ML Feature Integration :

- Integrate ranking model APIs: build the prediction request/response handling, implement timeout with heuristic fallback, add latency and error rate metrics

- Build recommendation display endpoints: retrieve personalised results from the recommendation service, assemble display payloads, handle partial failures

- Implement LLM-powered features: fare rule Q&A endpoints, hotel policy summarisation, complaint routing classifiers with output validation and hallucination guardrails

- Deliver Voice AI backend features: integrate ASR webhooks, parse spoken intent responses, manage conversational session state for multi-turn booking flows

- Instrument ML evaluation logging: capture prediction inputs, model outputs, and downstream user actions for offline model evaluation pipelines
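As an illustration of "timeout with heuristic fallback", the sketch below wraps a model call in a deadline and returns a heuristic score if the call is slow or fails. `RankingClient`, `scoreWithFallback`, and the `fallbackCount` counter are hypothetical names; the counter stands in for a real metrics client, and `orTimeout` requires Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical client for illustration only.
class RankingClient {
    // Stand-in for a real metrics counter (for example a Prometheus counter).
    final AtomicLong fallbackCount = new AtomicLong();

    /** Calls the model asynchronously; returns heuristicScore on timeout or error. */
    double scoreWithFallback(Supplier<Double> modelCall, double heuristicScore, long timeoutMillis) {
        return CompletableFuture.supplyAsync(modelCall)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(err -> {
                    // Reached for both TimeoutException and failures thrown by the model call.
                    fallbackCount.incrementAndGet();
                    return heuristicScore;
                })
                .join();
    }

    public static void main(String[] args) {
        RankingClient client = new RankingClient();
        double score = client.scoreWithFallback(() -> {
            throw new IllegalStateException("model unavailable");
        }, 0.5, 20);
        System.out.println("score=" + score + " fallbacks=" + client.fallbackCount.get());
    }
}
```

Note that the timeout here only stops the caller from waiting; it does not cancel the underlying work, so a real integration would also set a deadline on the HTTP or gRPC call itself.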

Platform Work Across All Six Verticals :

For each platform: the engineering problem, followed by what you'll own and build.

Flights :

- Itinerary pricing consistency across connection points

- Build the itinerary assembly service: fare combination rules, baggage policy aggregation, connection time validation, and price consistency checks

Hotels :

- Property detail cache refresh at checkout time

- Build the property cache refresh service: TTL management, pre-fetch triggers on search events, and cache warm-up for trending properties

Bus :

- Boarding point geo-matching for route resolution

- Build the boarding/dropping point matching service: geo-proximity scoring, alias resolution, and route-stop normalisation across operator data models

Train :

- Berth preference allocation optimisation

- Build the berth preference assignment service: constraint-based allocation (lower berth preference, co-traveller grouping), quota-aware assignment, and preference conflict resolution

B2B :

- GST invoice line-item accuracy at high volume

- Build the invoice line-item service: booking-level GST computation, multi-GSTIN routing, credit note generation, and reconciliation against payment gateway settlements

Core :

- Distributed rate limiter for API protection

- Build the rate limiting service: sliding window counters (Redis), per-client quotas, burst allowance, and graceful backpressure to upstream callers
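For a sense of what a sliding window counter involves, here is a minimal in-process sketch. It is an assumption-laden stand-in: the posting names Redis as the backing store, whereas this version keeps counters in a local map and takes the clock as a parameter so the behaviour is easy to test. `SlidingWindowLimiter` and `tryAcquire` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory limiter; a distributed version would keep these counters in Redis.
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // Per client: [current window index, count in current window, count in previous window]
    private final Map<String, long[]> state = new HashMap<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is admitted. nowMillis is assumed to be non-negative. */
    synchronized boolean tryAcquire(String clientId, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] s = state.computeIfAbsent(clientId, k -> new long[] {window, 0, 0});
        if (s[0] != window) {
            // Roll forward; the old count only matters if it belongs to the adjacent window.
            s[2] = (s[0] == window - 1) ? s[1] : 0;
            s[1] = 0;
            s[0] = window;
        }
        // Weight the previous window by the share of it still inside the sliding window.
        double elapsedFraction = (double) (nowMillis % windowMillis) / windowMillis;
        double estimated = s[2] * (1.0 - elapsedFraction) + s[1];
        if (estimated + 1 > limit) return false;
        s[1]++;
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(2, 1_000);
        for (long t : new long[] {0, 100, 200, 1_500}) {
            System.out.println("t=" + t + " admitted=" + limiter.tryAcquire("client-a", t));
        }
    }
}
```

The weighted estimate smooths the burst that a plain fixed-window counter allows at window boundaries, at the cost of being an approximation rather than an exact count.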

What Senior Quality Looks Like Here :

- Design before code: you write a lightweight ADR or sequence diagram before the first line of code, even for 'small' features

- Observability from day one: you add metrics, traces, and structured log lines in the same PR as the feature, not as a follow-up ticket

- You handle the async failure: what happens if the Kafka message arrives twice? If Redis is down? If the ML model times out? Your code has answers

- Your PR description explains why, not just what, so reviewers learn something rather than just approve something

- You raise concerns about a flawed design in the review, not in the post-mortem: 'I noticed this pattern could cause a thundering herd under these conditions'

The Hard Engineering Problems You'll Face :

Across all six platforms, the engineering challenges are real, non-trivial, and consequential :

Cache Invalidation at Speed :

- Fare data has a 30-second freshness window.

- A stale cache hit in the booking flow means a pricing error, a failed checkout, or a lost trust signal.

- Multi-tier cache design (L1/L2/L3), TTL strategies, event-driven invalidation via Kafka, and cache stampede prevention are all live problems.

Distributed Concurrency :

- Train Tatkal opening: millions of concurrent writes for 72 berths per coach.

- The work involves optimistic locking, distributed lease management, queue-based fairness, and atomic seat allocation without deadlock under pathological load.

Event Ordering Guarantees :

- A booking event must arrive before its payment event, but Kafka guarantees ordering only within a partition, not across partitions.

- Building booking state machines with idempotency, deduplication, and out-of-order event tolerance is a continuous engineering challenge.
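The event ordering problem can be sketched as a small state machine that ignores duplicate deliveries and parks a payment that arrives before its booking. Everything here is a simplified assumption: the event types, state names, and in-memory sets are hypothetical, and a real service would persist the seen-event IDs and parked events durably.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical state machine for illustration only.
class BookingStateMachine {
    enum State { UNKNOWN, CREATED, PAID }
    enum EventType { BOOKING_CREATED, PAYMENT_CAPTURED }

    private final Map<String, State> states = new HashMap<>();
    private final Set<String> seenEventIds = new HashSet<>();   // deduplication by event ID
    private final Set<String> parkedPayments = new HashSet<>(); // payments that arrived first

    synchronized State stateOf(String bookingId) {
        return states.getOrDefault(bookingId, State.UNKNOWN);
    }

    /** Returns false only when the event ID was already processed (duplicate delivery). */
    synchronized boolean onEvent(String eventId, String bookingId, EventType type) {
        if (!seenEventIds.add(eventId)) return false;
        State current = states.getOrDefault(bookingId, State.UNKNOWN);
        switch (type) {
            case BOOKING_CREATED:
                if (current == State.UNKNOWN) {
                    // If the payment overtook the booking, apply it now.
                    boolean paymentWaiting = parkedPayments.remove(bookingId);
                    states.put(bookingId, paymentWaiting ? State.PAID : State.CREATED);
                }
                break;
            case PAYMENT_CAPTURED:
                if (current == State.CREATED) {
                    states.put(bookingId, State.PAID);
                } else if (current == State.UNKNOWN) {
                    parkedPayments.add(bookingId); // out of order: wait for the booking event
                }
                break;
        }
        return true;
    }

    public static void main(String[] args) {
        BookingStateMachine machine = new BookingStateMachine();
        machine.onEvent("evt-2", "booking-1", EventType.PAYMENT_CAPTURED); // arrives early
        machine.onEvent("evt-1", "booking-1", EventType.BOOKING_CREATED);
        System.out.println(machine.stateOf("booking-1"));
    }
}
```

Keying both event types by booking ID on the same topic would place them in one partition and avoid most reordering, but the parking logic still matters when events come from different topics or producers.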

AI-First Engineering Mandate :

- Platform Engineers at every level are responsible for building systems that AI and ML can run on and, increasingly, systems that are AI themselves.

- ML Serving Infrastructure: your APIs must serve model predictions at p99 < 20 ms with graceful fallbacks; you design the latency budget allocation

- Feature Pipeline Engineering: real-time feature computation (Kafka Streams, Flink) feeding the feature store at sub-second freshness

- RAG Backend Systems: vector store integration, embedding generation pipelines, document chunking and indexing for knowledge retrieval

- Agentic Workflow Infrastructure: durable execution systems (Temporal) for multi-step LLM agent workflows with retry and compensation logic

- Voice AI Backend: ASR request routing, low-latency TTS pipelines, spoken intent API design for conversational booking flows

- Recommendation API Design: serving infrastructure for collaborative filtering, session-based models, and personalised ranking endpoints

- Price Intelligence Pipelines: real-time competitive price ingestion, fare change event streaming, lower-price guarantee trigger systems

- A/B Experiment Infrastructure: feature flags, traffic splitting, metric collection, and experiment configuration systems

- MCP Tool Orchestration: building the tool-use APIs that LLM agents call to execute booking, modification, and cancellation operations safely

Who You Are :

- 4 to 7 years in backend software engineering; production experience with microservices, REST APIs, and async messaging

- Strong in Java or Kotlin, comfortable with Spring Boot/Ktor, dependency injection, and service integration patterns

- Working knowledge of Kafka, Redis, and at least one cloud platform (AWS preferred)

- Curious about distributed systems: you read post-mortems, you understand why idempotency matters, you know what a split brain is

- ML-aware: you understand how to call a model endpoint safely, how to handle latency variance, and how to instrument prediction logging

- Degree in CS, Engineering, or equivalent; Tier-I institute is a plus

Technology Stack :

Backend : Java/Kotlin (preferred), Spring Boot, Ktor, Go (exposure welcomed)

Messaging : Apache Kafka, gRPC, REST APIs, Avro/Protobuf schemas

Storage : Redis, MySQL/Aurora, DynamoDB, Elasticsearch (basic)

AI Integration : REST/gRPC model endpoints, Feast SDK (online reads), Feature flag SDKs

Cloud/DevOps : AWS (EC2, EKS, S3, RDS), Docker, Kubernetes (basic), CI/CD pipelines

Observability : OpenTelemetry (SDK integration), Prometheus client, Structured logging (SLF4J)

Why This Matters :

The system you are building serves real people: a migrant worker booking a train home for Diwali, a startup founder booking her team's quarterly offsite, a student finding the cheapest bus to college.

The scale means your code decisions matter: a 5ms latency regression in the search API is felt by 50M users, and a well-placed caching improvement reverses it for all of them.

The engineering culture is one where Senior is genuinely a respected craft role, not a stepping stone that is immediately forgotten when you hit Staff.

You will grow fast here.

The problems are real, the feedback is direct, and the peers around you are some of the best engineers working on distributed systems in India today.