Posted on: 10/07/2025
Role : Lead Data Scientist (Multilingual AI/ML)
Location : Gurgaon, Haryana
Job Summary :
We are seeking an experienced and visionary Lead Data Scientist to spearhead our multilingual Artificial Intelligence and Machine Learning initiatives. This pivotal role focuses on advanced natural language processing, cutting-edge speech technologies, and the development of highly specialized, domain-specific AI/ML models. The ideal candidate will possess extensive hands-on experience in training and fine-tuning state-of-the-art transformer-based models, developing robust speech recognition and synthesis systems, and adapting large language models for complex, specialized applications. You will provide technical leadership, drive innovation, and mentor a talented team to deliver high-impact AI solutions.
Job Responsibilities :
- Lead the end-to-end development of multilingual transformer models, with a focus on machine translation using frameworks like OpenNMT, and on Text-to-Speech (TTS) and Speech-to-Text (STT) systems using tools like Coqui TTS.
- Fine-tune Whisper models for accurate, specialized speech-to-text transcription across diverse use cases and accents.
- Fine-tune Large Language Models (LLMs) to adapt them to specific domains, particularly fintech and other specialized business applications.
- Architect and build agentic chatbots and complex conversational AI systems using orchestration frameworks such as LangChain and LlamaIndex.
- Design and implement robust synthetic data pipelines to generate high-quality training data, particularly for low-resource languages or specialized domains.
- Develop and execute cross-lingual training strategies to enhance model performance and generalization across multiple languages.
- Mentor and guide junior and mid-level data scientists and ML engineers, fostering a culture of technical excellence, continuous learning, and best practices in AI/ML development.
- Collaborate closely with engineering teams to ensure efficient and scalable model deployment, monitoring, and MLOps practices in production environments.
- Stay abreast of the latest research and advancements in multilingual NLP, speech AI, and LLM technologies, integrating innovative approaches into our product strategy.
Desired Profile & Required Skills And Qualifications :
- Bachelor's or Master's degree in Computer Science, Machine Learning, Computational Linguistics, Statistics, or a related quantitative field.
- 5+ years of progressive experience in Machine Learning and Natural Language Processing (ML/NLP), with a proven track record of successful project delivery.
- 2+ years of demonstrated leadership experience in a data science or ML engineering team, including mentoring and guiding technical teams.
- Strong proficiency in Python for data manipulation, model development, and scripting.
- Expertise with leading deep learning frameworks such as PyTorch and/or TensorFlow.
- In-depth hands-on experience with the Hugging Face ecosystem, including Transformers, Tokenizers, and Datasets libraries.
- Proven experience with MLOps practices and principles, including model versioning, pipeline automation, and monitoring.
- Familiarity with containerization technologies such as Docker and orchestration platforms like Kubernetes for scalable ML deployments.
- Experience working with major cloud platforms (AWS, Google Cloud Platform, or Azure) and their respective AI/ML services.
- Strong understanding of data privacy and compliance requirements, especially within sensitive contexts like the fintech industry.
- Excellent analytical, problem-solving, and communication skills, with the ability to articulate complex technical concepts to diverse stakeholders.
Preferred Skills :
- Experience with other multilingual NLP toolkits or platforms.
- Familiarity with distributed training of large models.
- Publications in relevant AI/ML conferences or journals.
- Experience with other speech AI frameworks or technologies.
- Knowledge of graph databases or knowledge graph construction.