Observe.AI - Senior Machine Learning Engineer - Deep Learning

Posted on: 14/07/2025

Job Description

About Observe.AI :

Observe.AI is redefining customer service with software-first, AI-driven voice solutions. Our platform combines advanced Speech AI, Generative AI, and automation to empower enterprises to scale high-quality support, reduce operational costs, and unlock actionable insights from every interaction.

Trusted by industry leaders such as Accolade, Prudential, Cox Automotive, and Concentrix, our voice AI platform powers millions of interactions daily with enterprise-grade performance, security, and reliability.


Role Overview :


As a Senior Machine Learning Engineer - Speech, you will build and scale production-grade software systems for speech recognition, synthesis, and enhancement. You will focus on optimizing deep learning models for real-time streaming, robustness in noisy environments, and seamless integration into Observe.AI's distributed platform. You'll work at the intersection of applied research, ML engineering, and distributed systems to deliver intelligent, scalable speech capabilities to our customers.


Key Responsibilities :

- Design, train, and deploy ASR, TTS, and speech enhancement models into real-time inference pipelines.

- Build software modules to integrate speech models with production services using gRPC, REST, or streaming protocols.

- Own the end-to-end lifecycle of ML models: data curation, training pipelines, model evaluation, optimization, deployment, and monitoring.

- Implement signal processing algorithms to improve speech clarity, noise suppression, and prosody control.

- Benchmark model performance (latency, memory, WER, MOS) and optimize for speed and cost using quantization, pruning, and distillation.

- Collaborate with infrastructure teams to scale ML systems on cloud-native architectures (Kubernetes, Docker, microservices).

- Stay current with the latest speech research and prototype new architectures (e.g., Conformer, Transducer, Whisper, NeMo) for production testing.

- Build internal tooling for model experimentation, evaluation, A/B testing, and error analysis.


Technical Skillset :


Programming & Frameworks :


- Proficient in Python for ML development and scripting

- Experience with PyTorch, TensorFlow, or JAX for deep learning

- Strong software engineering fundamentals with C++ or Rust (for optimized audio modules)

Speech & Audio Technologies :

- Experience with ASR (e.g., wav2vec 2.0, Whisper, NeMo), TTS (e.g., Tacotron2, FastSpeech2, HiFi-GAN), and speech enhancement

- Familiarity with Kaldi, ESPnet, DeepSpeech, and WebRTC (for VAD/denoising)

- Audio preprocessing using Librosa, SoX, SoundFile, NumPy, SciPy

Model Optimization & Serving :

- Knowledge of ONNX, TensorRT, TorchScript for inference acceleration

- Experience with quantization, pruning, knowledge distillation

- Model deployment using Triton Inference Server, FastAPI, gRPC, Flask

Infrastructure & Deployment :

- CI/CD for ML pipelines using MLflow, Kubeflow, Weights & Biases

- Containerization and orchestration: Docker, Kubernetes

- Scalable training and serving on AWS, GCP, or Azure

Data & Monitoring :

- Handling large-scale audio datasets; audio augmentation (SpecAugment, speed perturbation)

- Model performance monitoring in production (latency, drift detection, SNR, MOS)


What You Bring :

- 3+ years of hands-on experience building and shipping ML-driven speech products.

- Deep understanding of audio signal processing and neural architectures for voice.

- Experience with production ML systems, model lifecycle management, and observability.

- Ability to write clean, testable, modular code following software engineering best practices.

- Comfort working in agile, cross-functional teams and contributing to scalable architecture decisions.

