Posted on: 10/07/2025
Job Title : AI/ML Engineer - Voice Synthesis & Large Language Models
Location : Gurugram (Work from office only)
Experience Level : 3-5 years
Employment Type : Full-time
Company Overview :
We are a cutting-edge AI startup specializing in advanced voice synthesis and large language model applications. Our focus includes neural voice synthesis, text-to-speech systems, voice cloning, and LLM-powered conversational AI. We are driven by innovation and a passion for creating impactful AI solutions that transform human-computer interaction through natural voice and language technologies.
Role Overview :
We are seeking an experienced AI/ML Engineer with specialized expertise in voice synthesis models, large language models, and phonetic processing systems. The ideal candidate has proven experience in fine-tuning voice models, developing custom phonemizers, and working with phonetic representations for speech synthesis. This role requires a deep understanding of both the acoustic and linguistic aspects of voice generation technologies.
Key Responsibilities :
- Voice Synthesis Development: Design, implement, and optimize neural voice synthesis models including WaveNet, Tacotron, FastSpeech, and other state-of-the-art TTS architectures.
- Voice Model Fine-tuning: Lead fine-tuning initiatives for pre-trained voice synthesis models, including speaker adaptation, domain-specific customization, and multi-speaker model development.
- Phonemizer Development: Design and implement custom phonemizers for various languages, develop phonetic alignment algorithms, and optimize phoneme-to-speech mapping for improved synthesis quality.
- LLM Integration: Develop and deploy large language model solutions, fine-tune pre-trained models, and integrate LLMs with voice synthesis pipelines for conversational AI applications.
- Phonetic Processing: Develop and optimize text-to-phoneme conversion systems, implement grapheme-to-phoneme models, and work with linguistic features for enhanced voice synthesis quality.
- Model Optimization: Implement model compression techniques, quantization, and optimization strategies for real-time voice synthesis and LLM inference.
- Research & Development: Stay current with the latest advancements in voice synthesis research, phonetic modeling, transformer architectures, and emerging LLM techniques. Contribute to R&D initiatives in neural audio generation.
- Data Pipeline Management: Design and maintain data preprocessing pipelines for speech datasets, phonetic annotations, text corpora, and multimodal training data.
- Performance Engineering: Ensure low-latency, high-quality voice synthesis in production environments and optimize LLM response times for real-time applications.
- Cross-functional Collaboration: Work closely with product teams to integrate voice synthesis and LLM capabilities into user-facing applications.
Requirements :
Core Technical Skills :
- 3-5 years of experience in AI/ML development with a specific focus on voice synthesis models and/or large language models
- Proven experience in fine-tuning voice synthesis models, including speaker adaptation, voice cloning, and domain-specific customization
- Hands-on experience with phonemizers and phonetic processing, including grapheme-to-phoneme conversion, phonetic alignment, and linguistic feature engineering
- Strong proficiency in Python and deep learning frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Deep understanding of neural voice synthesis architectures (WaveNet, Tacotron, FastSpeech, VITS, StyleTTS, or similar)
- Experience with large language models (GPT, BERT, T5, LLaMA, or similar) including fine-tuning and deployment
- Knowledge of transformer architectures and attention mechanisms
Specialized Experience :
- Experience with phonetic processing tools (espeak-ng, Festival, MaryTTS, Phonemizer) and phonetic alphabets (IPA, ARPABET, X-SAMPA)
- Voice model fine-tuning expertise including transfer learning, few-shot speaker adaptation, and multi-speaker model training
- Knowledge of linguistic processing including morphology, phonology, and prosodic modeling
- Experience with speech processing libraries and tools (librosa, torchaudio, SpeechBrain, Praat)
- Knowledge of audio signal processing, spectral analysis, and acoustic modeling
- Experience with voice cloning, speaker verification, and speaker adaptation techniques
- Familiarity with prompt engineering and LLM optimization techniques
- Understanding of neural vocoder architectures and audio generation models
- Experience with phonetic alignment tools (MFA, P2FA, Gentle) and forced alignment techniques
Technical Infrastructure :
- Experience with cloud platforms (AWS, Google Cloud, Azure) for ML model deployment
- Knowledge of model serving frameworks (TensorFlow Serving, TorchServe, FastAPI)
- Experience with containerization (Docker, Kubernetes) for ML workflows
- Proficiency with version control systems (Git) and MLOps practices
Soft Skills :
- Excellent problem-solving skills with the ability to debug complex neural network architectures
- Strong communication skills for presenting technical concepts to diverse audiences
- Ability to work in a fast-paced environment and manage multiple research and development projects
- Collaborative mindset for working in cross-functional teams
Preferred Qualifications :
- Advanced experience with custom phonemizer development for low-resource or multilingual applications
- Publications or contributions in voice synthesis, phonetic modeling, or speech processing research
- Experience with real-time audio processing and streaming applications
- Knowledge of multimodal AI systems combining text, speech, and other modalities
- Experience with distributed training and large-scale model deployment
- Contributions to open-source voice synthesis, phonetic processing, or LLM projects
- Understanding of ethical AI practices in voice synthesis and language models
- Experience with multilingual phonetic processing and cross-lingual voice synthesis
- Knowledge of prosodic modeling and emotion/style transfer in voice synthesis
Why Join Us :
- Opportunity to work on cutting-edge voice synthesis and LLM technologies that are shaping the future of human-computer interaction
- Access to state-of-the-art computing resources and research opportunities
- Collaborative environment with leading experts in voice AI and language modeling
- Direct impact on product development and user experience innovation
- Professional growth opportunities in emerging AI technologies