Senior AI Engineer - Speech Recognition

SingleInterface
5 - 7 Years
Delhi NCR

Posted on: 14/04/2026

Job Description

About the Role :

We are hiring a Senior AI Engineer, Voice AI, to build state-of-the-art voice AI systems for real-world customer conversations. The ideal candidate has deep experience across the speech and language layers, including ASR, diarization, multilingual voice pipelines, LLM prompting and optimization, summarization, extraction, and low-latency production deployment.

This is not a pure integration role. We are looking for someone who can operate close to the model and systems layer, make strong architectural trade-offs, and build reliable conversational AI under real-world constraints such as noisy telephony audio, multilingual users, interruptions, latency sensitivity, and business-critical workflows.

You will work on production voice agents and conversational intelligence systems that power customer interactions at scale, across live conversations, post-call analytics, and automation workflows.

Key Responsibilities :

- Design, build, and improve real-time Voice AI systems for customer conversations.

- Develop and optimize systems across the voice stack, including :

1. Automatic speech recognition (ASR)

2. Speaker diarization

3. Voice activity detection (VAD)

4. Language identification

5. LLM reasoning and orchestration

6. Response generation

7. Text-to-speech (TTS)

8. Interruption handling and turn-taking

- Build low-latency, production-grade inference pipelines for live voice interactions.

- Improve multilingual and code-switched conversational performance for real-world telephony environments.

- Design prompt workflows, structured outputs, tool usage, and fallback logic to improve accuracy, control, and resolution rates.

- Build and refine systems for :

- Summarization

- Information extraction

- Classification

- Sentiment and emotion analysis

- Call dispositioning

- Topic and intent detection

- Quality monitoring

- Define and implement evaluation frameworks for conversational quality, latency, hallucination, containment, task completion, extraction accuracy, and business outcomes.

- Fine-tune and optimize LLMs and speech models for domain-specific performance where needed.

- Work closely with product, engineering, and operations teams to translate ambiguous business problems into scalable AI systems.

- Troubleshoot production failures across speech, orchestration, and model behavior, and drive measurable improvements.

Required Qualifications :

- 5+ years of experience in AI/ML/NLP, with strong hands-on experience in Voice AI, Speech AI, or Conversational AI.

- Proven experience building or improving production voice systems used in real customer interactions.

- Strong understanding of the speech pipeline, including one or more of :

1. ASR

2. Diarization

3. VAD

4. Language identification

5. TTS

6. Telephony audio handling

- Strong experience with LLMs in production, including prompting, orchestration, evaluation, summarization, extraction, or agent workflows.

- Strong programming skills in Python and experience with frameworks such as PyTorch, TensorFlow, or similar.

- Experience with model serving, inference optimization, and production performance tuning.

- Strong grasp of system design trade-offs across latency, quality, reliability, and cost.

- Ability to work hands-on across experimentation, shipping, debugging, and iterative improvement.

Preferred Qualifications :

- Experience working in contact center AI, conversational analytics, or enterprise customer interaction products.

- Experience with multilingual speech systems, especially for Indian or other non-English language environments.

- Familiarity with tools, frameworks, or models such as :

- Whisper

- Kaldi

- Hugging Face

- Streaming ASR pipelines

- Neural TTS systems

- Experience with :

- Prompt optimization

- Fine-tuning

- PEFT / LoRA

- Distillation

- Evaluation harnesses

- Structured generation

- Experience designing systems for :

- Call summarization

- QA automation

- Compliance monitoring

- Agent assist

- Sentiment/emotion detection

- Conversation intelligence

- Familiarity with production infrastructure such as Docker, Kubernetes, Redis, message queues, observability tooling, and scalable API services.

