Posted on: 20/08/2025
AI Infrastructure Architecture:
- Design and implement asynchronous multi-agent orchestration
- Own end-to-end latency from user message to AI response
- Build resilient inference pipelines that gracefully degrade under load
- Implement intelligent request routing and load balancing for AI workloads
- Migrate critical AI conversation flow from monolith to dedicated services
- Implement WebSocket/streaming infrastructure for real-time chat
- Design circuit breakers and fallback strategies for AI model failures
- Build comprehensive observability for AI system performance
- Optimize credit data retrieval and caching strategies
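As an illustration of the circuit-breaker and fallback responsibility above, here is a minimal sketch of the pattern; the class name, thresholds, and `primary`/`fallback` callables are illustrative, not part of the role's actual stack:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures,
    half-opens after a cooldown, and serves a fallback while open."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        # While open, short-circuit to the fallback until the cooldown passes.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial request through
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0  # success resets the failure count
        return result
```

In an AI chat context, `fallback` would typically return a cached or canned response rather than failing the user's message outright.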
Technical Requirements:
Must-Have Experience:
- Proven experience with async/event-driven architectures (not just REST APIs)
- Hands-on experience scaling ML/AI inference in production
- Deep understanding of caching strategies (Redis, in-memory, CDN)
- Experience with message queues and real-time communication protocols
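The caching-strategy requirement above spans several tiers (Redis, in-memory, CDN); the in-memory tier can be sketched as a tiny TTL cache — class and parameter names here are illustrative assumptions, not a prescribed design:

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry, e.g. for data
    lookups that tolerate slightly stale reads."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)
```

A production setup would layer this in front of Redis, with the TTL tuned to how quickly the underlying data changes.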
AI-Specific Expertise:
- Experience with AI model serving frameworks (TensorFlow Serving, Triton, etc.)
- Understanding of AI inference optimization (batching, caching, model quantization)
- Knowledge of conversation state management and context handling
- Experience debugging production issues under high AI inference load
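The inference-optimization point above mentions batching; a common form is dynamic batching, where pending requests are grouped into one model call. A minimal asyncio sketch follows — the function names, queue item shape `(input, future)`, and window parameters are all illustrative assumptions:

```python
import asyncio

async def batch_worker(queue, model_fn, max_batch=8, max_wait=0.01):
    """Collect pending requests into one batch so model_fn is invoked
    once per group rather than once per request (dynamic batching)."""
    while True:
        item = await queue.get()
        batch = [item]
        deadline = asyncio.get_running_loop().time() + max_wait
        # Gather more requests until the batch fills or the window closes.
        while len(batch) < max_batch:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        inputs = [inp for inp, _ in batch]
        outputs = model_fn(inputs)  # one call for the whole batch
        for (_, future), out in zip(batch, outputs):
            future.set_result(out)
```

Serving frameworks such as Triton provide this behavior out of the box; the sketch only shows the core trade-off between batch size and added latency (`max_wait`).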