Posted on: 18/07/2025
We're building a next-generation voice intelligence platform, and we're looking for a Founding ML Engineer to lead real-time, ultra-low-latency voice modeling, emotion-aware synthesis, and ML infrastructure.
You'll work shoulder-to-shoulder with the founding team (ex-FAANG, YC, and top AI labs), shaping everything from model design and inference pipelines to deployment and SDK integrations. You won't just contribute; you'll own the end-to-end ML system that powers high-impact, consumer-facing voice applications.
Key Responsibilities:
- Optimize transformer-based models for real-time, streaming voice inference with ultra-low latency.
- Fine-tune and evaluate models for emotion detection, voice synthesis, and intonation control.
- Apply quantization, pruning, distillation, and custom CUDA kernels to reduce model overhead in production.
- Architect and implement high-performance streaming ML pipelines and real-time inference engines.
- Design and build SDKs and lightweight APIs to integrate voice ML into consumer-facing applications.
- Continuously profile, monitor, and eliminate bottlenecks in training and inference pipelines.
- Work directly with founders on system architecture, deployment pipelines, and strategic priorities.
- Collaborate across disciplines (design, product, backend) to incorporate customer feedback and ship features fast.
- Lead end-to-end ML development, from experimentation and data curation to deployment and monitoring.
Your Background:
- 4-8 years of experience in machine learning, deep learning, or ML infrastructure roles.
- Strong proficiency in PyTorch, CUDA, and model optimization techniques.
- Prior experience with vLLM, SGLang, or similar inference engines for large models.
- Deep understanding of real-time streaming systems and working with audio/voice data.
- Experience building developer SDKs/APIs or ML tools that ship in production environments.
- Familiarity with Docker, Kubernetes, and cloud-native ML deployment.
- Comfortable wearing multiple hats: building infra, writing model code, profiling kernels, etc.