hirist

Job Description

Position : ML Engineer

Experience : 4-7 Years

Location : Bangalore, India


Job Summary :


We are seeking a highly skilled ML Engineer with 4-7 years of experience in building production systems that handle significant scale. The ideal candidate will have a deep, hands-on understanding of asynchronous and event-driven architectures and a proven track record of scaling AI/ML inference in production. This role requires a professional who can design and implement resilient, low-latency systems that manage AI workloads, integrate multiple models, and ensure a seamless user experience. You will be responsible for owning the end-to-end performance of critical, revenue-generating AI conversation flows.


Key Responsibilities :


System Design & Implementation : Design and implement robust, asynchronous multi-agent orchestration systems. Build resilient inference pipelines that can gracefully degrade under heavy load.
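
For illustration only, a minimal sketch of the kind of asynchronous orchestration with graceful degradation this responsibility describes (the agent names, concurrency limit, and fallback reply are assumptions, not details of the role):

```python
# Sketch: async multi-agent fan-out with load shedding and a degraded reply path.
import asyncio

MAX_CONCURRENT_INFERENCES = 100          # hypothetical capacity limit
inference_slots = asyncio.Semaphore(MAX_CONCURRENT_INFERENCES)

async def call_agent(agent_name: str, message: str) -> str:
    """Stand-in for a real model/agent call."""
    await asyncio.sleep(0.05)            # simulated inference latency
    return f"{agent_name}: reply to {message!r}"

async def orchestrate(message: str) -> str:
    """Fan out to several agents, but degrade gracefully under load."""
    try:
        async with inference_slots:      # shed load instead of queueing forever
            results = await asyncio.wait_for(
                asyncio.gather(
                    call_agent("intent", message),
                    call_agent("retrieval", message),
                    call_agent("responder", message),
                ),
                timeout=2.0,             # end-to-end latency budget
            )
            return results[-1]
    except asyncio.TimeoutError:
        return "Sorry, things are busy right now. Please try again."  # degraded path

if __name__ == "__main__":
    print(asyncio.run(orchestrate("What is my current balance?")))
```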


Latency & Performance Optimization : Own the end-to-end latency from a user's message to an AI response. Implement intelligent request routing and load balancing to optimize AI workloads. Optimize credit data retrieval and caching strategies to enhance system speed and efficiency.
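
As an illustrative example, a minimal read-through caching sketch for credit data retrieval (the fetch_credit_profile() backend and the TTL value are hypothetical; a production setup would typically back this with Redis or another shared cache):

```python
# Sketch: read-through cache so repeated credit lookups skip the slow upstream call.
import time
from typing import Any, Dict, Tuple

_cache: Dict[str, Tuple[float, Any]] = {}
TTL_SECONDS = 60.0                        # illustrative freshness window

def fetch_credit_profile(user_id: str) -> dict:
    """Stand-in for a slow upstream call (database or bureau API)."""
    time.sleep(0.2)                       # simulated retrieval latency
    return {"user_id": user_id, "score": 742}

def get_credit_profile(user_id: str) -> dict:
    """Return cached data when fresh, otherwise refresh from the source."""
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                     # cache hit: no upstream latency
    profile = fetch_credit_profile(user_id)
    _cache[user_id] = (now, profile)
    return profile

if __name__ == "__main__":
    get_credit_profile("u123")            # cold call, pays full latency
    get_credit_profile("u123")            # warm call, served from cache
```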


Resilience & Reliability : Design and implement circuit breakers and fallback strategies for AI model failures. Migrate critical AI conversation flows from monolithic architectures to dedicated microservices to improve resilience and scalability.
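
For context, a minimal circuit-breaker-with-fallback sketch of the pattern named here (the thresholds, the primary_model() call, and the canned fallback reply are assumptions for illustration):

```python
# Sketch: trip a breaker after repeated model failures and serve a fallback reply.
import time
from typing import Optional

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Block calls while the breaker is open, then allow a trial call."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            self.opened_at = None         # half-open: let one request probe recovery
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()   # trip the breaker

breaker = CircuitBreaker()

def primary_model(message: str) -> str:
    raise RuntimeError("model backend unavailable")   # simulate an outage

def respond(message: str) -> str:
    if breaker.allow():
        try:
            reply = primary_model(message)
            breaker.record(success=True)
            return reply
        except Exception:
            breaker.record(success=False)
    return "We're having trouble right now; a simpler reply path is being used."

if __name__ == "__main__":
    for _ in range(7):
        print(respond("hello"))
```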

Real-time Communication & Observability : Implement WebSocket/streaming infrastructure for real-time chat and other communication needs. Build comprehensive observability systems to monitor and analyze AI system performance.
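
As a rough sketch of the streaming infrastructure this responsibility refers to, a minimal WebSocket chat endpoint using FastAPI as one possible framework (the token generator stands in for real model output and is not specified by the role):

```python
# Sketch: stream tokens to the client as they are produced to keep perceived latency low.
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_tokens(prompt: str):
    """Stand-in for a streaming model response."""
    for token in f"Echoing: {prompt}".split():
        await asyncio.sleep(0.05)         # simulated per-token latency
        yield token

@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async for token in generate_tokens(prompt):
                await websocket.send_text(token)      # send each token immediately
            await websocket.send_text("[done]")
    except WebSocketDisconnect:
        pass                              # client closed the connection

# Run with: uvicorn chat_ws:app --reload   (module name is an assumption)
```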

Technical Leadership : Debug production issues under high AI inference load. Make technical decisions that directly affect revenue-generating conversations and customer subscription retention.


Required Skills & Qualifications :


Core Experience :

- 4+ years of experience building production systems that handle over 10k concurrent users.

- Proven, hands-on experience scaling ML/AI inference in production.

Mandatory Technical Skills :

- Proven experience with async/event-driven architectures, not just traditional REST APIs.

- Deep understanding of caching strategies using technologies like Redis, in-memory caches, or CDNs.

- Experience with message queues and real-time communication protocols.

- Proven experience building systems that integrate multiple LLM/AI models in production.

- Knowledge of AI model serving frameworks like TensorFlow Serving or Triton.

Professional Attributes :

- Strong problem-solving skills and experience debugging complex production issues under high load.

- A deep understanding of conversation state management and context handling.

- A mindset of ownership and a clear focus on technical decisions that drive business outcomes.


Preferred Skills :


- Exposure to cutting-edge AI infrastructure challenges.


- Direct experience with optimization techniques like batching, caching, and model quantization.

- Prior experience with AI-powered conversational platforms.

