hirist

Job Description

Position : ML Engineer

Experience : 4-7 Years

Location : Bangalore, India


Job Summary :


We are seeking a highly skilled ML Engineer with 4-7 years of experience in building production systems that handle significant scale. The ideal candidate will have a deep, hands-on understanding of asynchronous and event-driven architectures and a proven track record of scaling AI/ML inference in production. This role requires a professional who can design and implement resilient, low-latency systems that manage AI workloads, integrate multiple models, and ensure a seamless user experience. You will be responsible for owning the end-to-end performance of critical, revenue-generating AI conversation flows.


Key Responsibilities :


System Design & Implementation : Design and implement robust, asynchronous multi-agent orchestration systems. Build resilient inference pipelines that can gracefully degrade under heavy load.
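
For illustration only, a minimal sketch of the kind of asynchronous orchestration with graceful degradation this responsibility describes (the agent names, concurrency limit, and fallback reply are assumptions, not details of the role):

```python
# Sketch: async multi-agent fan-out with load shedding and a degraded reply path.
import asyncio

MAX_CONCURRENT_INFERENCES = 100          # hypothetical capacity limit
inference_slots = asyncio.Semaphore(MAX_CONCURRENT_INFERENCES)

async def call_agent(agent_name: str, message: str) -> str:
    """Stand-in for a real model/agent call."""
    await asyncio.sleep(0.05)            # simulated inference latency
    return f"{agent_name}: reply to {message!r}"

async def orchestrate(message: str) -> str:
    """Fan out to several agents, but degrade gracefully under load."""
    try:
        async with inference_slots:      # shed load instead of queueing forever
            results = await asyncio.wait_for(
                asyncio.gather(
                    call_agent("intent", message),
                    call_agent("retrieval", message),
                    call_agent("responder", message),
                ),
                timeout=2.0,             # end-to-end latency budget
            )
            return results[-1]
    except asyncio.TimeoutError:
        return "Sorry, things are busy right now. Please try again."  # degraded path

if __name__ == "__main__":
    print(asyncio.run(orchestrate("What is my current balance?")))
```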


Latency & Performance Optimization : Own the end-to-end latency from a user's message to an AI response. Implement intelligent request routing and load balancing to optimize AI workloads. Optimize credit data retrieval and caching strategies to enhance system speed and efficiency.
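
As an illustrative example, a minimal read-through caching sketch for credit data retrieval (the fetch_credit_profile() backend and the TTL value are hypothetical; a production setup would typically back this with Redis or another shared cache):

```python
# Sketch: read-through cache so repeated credit lookups skip the slow upstream call.
import time
from typing import Any, Dict, Tuple

_cache: Dict[str, Tuple[float, Any]] = {}
TTL_SECONDS = 60.0                        # illustrative freshness window

def fetch_credit_profile(user_id: str) -> dict:
    """Stand-in for a slow upstream call (database or bureau API)."""
    time.sleep(0.2)                       # simulated retrieval latency
    return {"user_id": user_id, "score": 742}

def get_credit_profile(user_id: str) -> dict:
    """Return cached data when fresh, otherwise refresh from the source."""
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                     # cache hit: no upstream latency
    profile = fetch_credit_profile(user_id)
    _cache[user_id] = (now, profile)
    return profile

if __name__ == "__main__":
    get_credit_profile("u123")            # cold call, pays full latency
    get_credit_profile("u123")            # warm call, served from cache
```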


Resilience & Reliability : Design and implement circuit breakers and fallback strategies for AI model failures. Migrate critical AI conversation flows from monolithic architectures to dedicated microservices to improve resilience and scalability.
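
For context, a minimal circuit-breaker-with-fallback sketch of the pattern named here (the thresholds, the primary_model() call, and the canned fallback reply are assumptions for illustration):

```python
# Sketch: trip a breaker after repeated model failures and serve a fallback reply.
import time
from typing import Optional

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Block calls while the breaker is open, then allow a trial call."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            self.opened_at = None         # half-open: let one request probe recovery
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()   # trip the breaker

breaker = CircuitBreaker()

def primary_model(message: str) -> str:
    raise RuntimeError("model backend unavailable")   # simulate an outage

def respond(message: str) -> str:
    if breaker.allow():
        try:
            reply = primary_model(message)
            breaker.record(success=True)
            return reply
        except Exception:
            breaker.record(success=False)
    return "We're having trouble right now; a simpler reply path is being used."

if __name__ == "__main__":
    for _ in range(7):
        print(respond("hello"))
```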

Real-time Communication & Observability : Implement WebSocket/streaming infrastructure for real-time chat and other communication needs. Build comprehensive observability systems to monitor and analyze AI system performance.
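
As a rough sketch of the streaming infrastructure this responsibility refers to, a minimal WebSocket chat endpoint using FastAPI as one possible framework (the token generator stands in for real model output and is not specified by the role):

```python
# Sketch: stream tokens to the client as they are produced to keep perceived latency low.
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_tokens(prompt: str):
    """Stand-in for a streaming model response."""
    for token in f"Echoing: {prompt}".split():
        await asyncio.sleep(0.05)         # simulated per-token latency
        yield token

@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async for token in generate_tokens(prompt):
                await websocket.send_text(token)      # send each token immediately
            await websocket.send_text("[done]")
    except WebSocketDisconnect:
        pass                              # client closed the connection

# Run with: uvicorn chat_ws:app --reload   (module name is an assumption)
```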

Technical Leadership : Debug production issues under high AI inference load. Make technical decisions that directly affect revenue-generating conversations and customer subscription retention.


Required Skills & Qualifications :


Core Experience :

- 4+ years of experience building production systems that handle over 10k concurrent users.

- Proven, hands-on experience scaling ML/AI inference in production.

Mandatory Technical Skills :

- Proven experience with async/event-driven architectures, not just traditional REST APIs.

- Deep understanding of caching strategies using technologies like Redis, in-memory caches, or CDNs.

- Experience with message queues and real-time communication protocols.

- Proven experience building systems that integrate multiple LLM/AI models in production.

- Knowledge of AI model serving frameworks like TensorFlow Serving or Triton.

Professional Attributes :

- Strong problem-solving skills and experience debugging complex production issues under high load.

- A deep understanding of conversation state management and context handling.

- A mindset of ownership and a clear focus on technical decisions that drive business outcomes.


Preferred Skills :


- Exposure to cutting-edge AI infrastructure challenges.


- Direct experience with optimization techniques like batching, caching, and model quantization.

- Prior experience with AI-powered conversational platforms.

