Posted on: 15/08/2025
Role Overview:
As a Data Scientist, you will be at the forefront of designing, implementing, and optimizing Next Best Action (NBA) strategies using machine learning and state-of-the-art NLP techniques.
You will work closely with cross-functional teams, including data engineers, domain SMEs, and business stakeholders, to deliver actionable, data-driven insights and intelligent automation solutions.
Key Responsibilities:
- Build, deploy, and maintain machine learning models for predictive analytics, classification, clustering, and recommendation systems.
- Perform exploratory data analysis (EDA) on structured and unstructured datasets to uncover trends and behavioral patterns.
- Design and manage A/B testing frameworks to evaluate model performance and business impact.
- Develop pipelines for text vectorization and embeddings using Word2Vec, BERT, SBERT, or other transformer-based models.
- Implement Retrieval-Augmented Generation (RAG) workflows by integrating internal and external knowledge sources to enhance AI recommendations (a minimal retrieval sketch follows this list).
- Use frameworks such as LangChain or Haystack, or custom code, to build intelligent chatbot/assistant pipelines.
- Collaborate with data engineers to build data pipelines and ETL processes for model training and inference.
- Work with business teams to define Next Best Action strategies driven by ML/NLP outcomes.
- Present results and data-driven recommendations to technical and non-technical stakeholders.
- Continuously monitor and improve model performance through retraining, feedback loops, and new feature engineering.
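To make the embedding and RAG responsibilities above concrete, here is a minimal retrieval sketch. It assumes the sentence-transformers and faiss-cpu packages are installed; the model name, the knowledge snippets, and the query are illustrative placeholders, and in a full RAG workflow the retrieved passages would be passed to an LLM as grounding context.

```python
# Minimal RAG-style retrieval sketch: embed documents with SBERT,
# index them in FAISS, and fetch the top-k passages for a query.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# Hypothetical internal knowledge snippets used to ground recommendations.
docs = [
    "Customers on the basic plan often upgrade after a usage alert.",
    "Churn risk drops when onboarding emails are opened within 7 days.",
    "Premium support is the most cited reason for plan upgrades.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact SBERT model
doc_vecs = model.encode(docs, normalize_embeddings=True)

# Inner product on normalized vectors is equivalent to cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "Which action should we recommend for a high-usage basic-plan customer?"
q_vec = model.encode([query], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q_vec, dtype="float32"), k=2)

# In a full pipeline, these passages would become the LLM's context.
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i]}")
```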
Required Skills & Experience:
- Strong programming skills in Python and libraries such as pandas, NumPy, scikit-learn, and seaborn.
- Experience with machine learning and statistical modeling techniques including linear regression, random forests, and XGBoost (see the sketch at the end of this posting).
- Practical experience with NLP tasks such as text classification, Named Entity Recognition (NER), and topic modeling.
- Hands-on experience with embedding models like Word2Vec, BERT, SBERT, and transformer architectures.
- Knowledge of prompt engineering and experience working with Large Language Models (LLMs) from providers such as OpenAI, Cohere, or similar.
- Proficiency in RAG (Retrieval-Augmented Generation) pipeline design and tools like LangChain, Haystack, or similar frameworks.
- Experience working with vector databases such as FAISS, Pinecone, Weaviate, or PostgreSQL with pgvector.
- Understanding of ETL/ELT pipelines, data transformation, and data cleaning techniques.
- Familiarity with SQL and data modeling.
- Exposure to cloud platforms such as AWS, GCP, or Azure.
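As an illustration of the modeling requirement above, here is a minimal propensity-model sketch of the kind that often underpins Next Best Action ranking. The dataset, feature names, and binary accepted_offer target are entirely hypothetical; any tabular dataset with a binary outcome fits the same shape.

```python
# Minimal sketch of a propensity model for ranking next-best actions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical customer data; accepted_offer is the binary target.
df = pd.DataFrame({
    "tenure_months":   [3, 24, 12, 48, 6, 36, 18, 60],
    "monthly_usage":   [120, 80, 200, 40, 150, 95, 60, 30],
    "support_tickets": [1, 0, 3, 0, 2, 1, 0, 0],
    "accepted_offer":  [1, 0, 1, 0, 1, 0, 0, 0],
})

X = df.drop(columns="accepted_offer")
y = df["accepted_offer"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X_train, y_train)

# Predicted acceptance probabilities rank candidate actions; AUC checks lift.
proba = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
```

Scores from a model like this can be computed per customer for each candidate action, and the action with the highest predicted acceptance probability becomes the next best action.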