Process Nine Technologies - Lead Data Scientist - Artificial Intelligence/Machine Learning

Posted on: 10/07/2025

Job Description

Role : Lead Data Scientist (Multilingual AI/ML)

Location : Gurgaon, Haryana

Job Summary :


We are seeking an experienced and visionary Lead Data Scientist to spearhead our multilingual Artificial Intelligence and Machine Learning initiatives. This pivotal role focuses on advanced natural language processing, cutting-edge speech technologies, and the development of highly specialized, domain-specific AI/ML models. The ideal candidate will possess extensive hands-on experience in training and fine-tuning state-of-the-art transformer-based models, developing robust speech recognition and synthesis systems, and adapting large language models for complex, specialized applications. You will provide technical leadership, drive innovation, and mentor a talented team to deliver high-impact AI solutions.

Job Responsibilities :


- Lead the end-to-end development of multilingual transformer models for machine translation, using frameworks such as OpenNMT, and for Text-to-Speech (TTS) and Speech-to-Text (STT) systems, using tools such as Coqui TTS.

- Conduct advanced fine-tuning of Whisper models for accurate, specialized speech-to-text transcription across diverse use cases and accents.

- Fine-tune Large Language Models (LLMs) to adapt them to specific domains, particularly the fintech sector, and to other specialized business applications.

- Architect and build agentic chatbots and complex conversational AI systems using orchestration frameworks such as LangChain and LlamaIndex.

- Design and implement robust synthetic data pipelines to generate high-quality training data, particularly for low-resource languages or specialized domains.

- Develop and execute cross-lingual training strategies to enhance model performance and generalization across multiple languages.

- Mentor and guide junior and mid-level data scientists and ML engineers, fostering a culture of technical excellence, continuous learning, and best practices in AI/ML development.

- Collaborate closely with engineering teams to ensure efficient and scalable model deployment, monitoring, and MLOps practices in production environments.

- Stay abreast of the latest research and advancements in multilingual NLP, speech AI, and LLM technologies, integrating innovative approaches into our product strategy.

Desired Profile & Required Skills And Qualifications :


- Bachelor's or Master's degree in Computer Science, Machine Learning, Computational Linguistics, Statistics, or a related quantitative field.

- 5+ years of progressive experience in Machine Learning and Natural Language Processing (ML/NLP), with a proven track record of successful project delivery.

- 2+ years of demonstrated leadership experience in a data science or ML engineering team, including mentoring and guiding technical teams.

- Strong proficiency in Python for data manipulation, model development, and scripting.

- Expertise with leading deep learning frameworks such as PyTorch and/or TensorFlow.

- In-depth hands-on experience with the Hugging Face ecosystem, including Transformers, Tokenizers, and Datasets libraries.

- Proven experience with MLOps practices and principles, including model versioning, pipeline automation, and monitoring.

- Familiarity with containerization technologies such as Docker and orchestration platforms like Kubernetes for scalable ML deployments.

- Experience working with major cloud platforms (AWS, Google Cloud Platform, or Azure) and their respective AI/ML services.

- Strong understanding of data privacy and compliance requirements, especially within sensitive contexts like the fintech industry.

- Excellent analytical, problem-solving, and communication skills, with the ability to articulate complex technical concepts to diverse stakeholders.

Preferred Skills :


- Experience with other multilingual NLP toolkits or platforms.

- Familiarity with distributed training of large models.

- Publications in relevant AI/ML conferences or journals.

- Experience with other speech AI frameworks or technologies.

- Knowledge of graph databases or knowledge graph construction.

