Posted on: 09/01/2026
Description:
We are seeking an experienced and highly skilled Senior Data Scientist to join our team in Bengaluru. This role focuses on driving innovative, large-scale solutions using cutting-edge Classical Machine Learning, PySpark, Spark SQL, and Generative AI. The ideal candidate will possess a blend of deep technical expertise, strong business acumen, effective communication skills, and a sense of ownership. We require a proven track record of designing, developing, and deploying scalable ML/DL pipelines and LLM Agents in real time, in a fast-paced, collaborative environment.
Responsibilities:
- Efficiently handle and model billions of data points using multi-cluster data processing frameworks (PySpark, Spark SQL).
- Databricks expertise is a must-have: the ability to design, write, scale, and monitor end-to-end ML pipelines on Databricks.
- Proven ability to run and manage Databricks data pipelines in real time for low-latency decision-making.
- Design and implement high-performance APIs using FastAPI in Python to expose real-time and batch ML pipelines.
- Design, implement, and deploy end-to-end ML/DL, GenAI solutions, writing modular, scalable, and production-ready code.
- Develop and implement scalable deployment pipelines using Docker and AWS services (ECR, Lambda, Step Functions).
- Design and implement custom models and loss functions to address data nuances and specific labelling challenges.
- Apply specialised modelling for marketing scenarios (Targeting, Budget optimisation, Churn) and data limitations (Sparse/incomplete labels, Single class learning).
- Leverage in-depth understanding of Transformer architectures and the principles of Large and Small Language Models.
- Practical experience in building LLM-ready Data Management layers for large-scale structured and unstructured data.
- Apply foundational understanding of LLM Agents and multi-agent systems (e.g., Agent-Critique, ReACT, Agent Collaboration), advanced prompting, LLM evaluation, confidence grading, and Human-in-the-Loop systems.
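To make the custom-loss responsibility above concrete: one common remedy for sparse or under-reported positive labels is to up-weight the positive term of the binary cross-entropy. The sketch below is a minimal, hypothetical illustration in plain NumPy; the function name and default weight are illustrative assumptions, not part of this role's codebase.

```python
import numpy as np

def weighted_bce(y_true, y_prob, pos_weight=5.0, eps=1e-7):
    """Binary cross-entropy with a weight on the positive term.

    Illustrative sketch: when positive labels are sparse, pos_weight > 1
    penalises missed positives more heavily than false alarms.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(y_prob)
             + (1.0 - y_true) * np.log(1.0 - y_prob))
    return loss.mean()
```

In a framework such as PyTorch or Keras the same idea is expressed through the loss's built-in class-weighting options; the NumPy version is only meant to show the mechanics.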
Requirements:
- Proficiency in Python and its data science ecosystem (NumPy, Pandas, Dask, PySpark) for large-scale data processing.
- Expert, hands-on experience with Databricks for MLOps, pipeline orchestration, and real-time deployment.
- Ability to perform effective feature engineering by understanding complex business objectives.
- In-depth knowledge of ANN, 1D/2D/3D Convolutional Neural Networks (ConvNets), LSTMs, and Transformer models.
- Strong proficiency in PU learning, single-class learning, and representation learning, alongside traditional ML approaches.
- Advanced understanding and application of model explainability techniques (e.g., SHAP, LIME).
- Hands-on experience with ML/DL libraries such as Scikit-learn, TensorFlow/Keras, and PyTorch.
- Experience utilising large-scale language models (GPT-4, Mistral, Llama, Claude) through prompt engineering and custom fine-tuning.
- Awareness of best software design practices and backend frameworks like Flask.
- Knowledge of Recommender Systems and advanced learning techniques (representation learning, PU learning).
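To illustrate the PU-learning requirement above: a standard approach is the Elkan–Noto correction, which trains a classifier to separate labelled positives from unlabelled points, estimates the label frequency c = p(s=1 | y=1) on held-out positives, and rescales scores. The sketch below uses scikit-learn; the function name and data-split choices are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_correct_probabilities(X_pos, X_unlabeled, held_out_frac=0.2, seed=0):
    """Illustrative Elkan-Noto PU correction.

    Train on labelled positives (s=1) vs. unlabelled (s=0), estimate
    c = p(s=1 | y=1) on held-out positives, then rescale so that
    p(y=1 | x) ~= p(s=1 | x) / c for the unlabelled points.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_pos))
    n_hold = max(1, int(held_out_frac * len(X_pos)))
    hold, train_pos = X_pos[idx[:n_hold]], X_pos[idx[n_hold:]]

    X = np.vstack([train_pos, X_unlabeled])
    s = np.concatenate([np.ones(len(train_pos)), np.zeros(len(X_unlabeled))])
    clf = LogisticRegression(max_iter=1000).fit(X, s)

    c = clf.predict_proba(hold)[:, 1].mean()        # label-frequency estimate
    p_y = clf.predict_proba(X_unlabeled)[:, 1] / c  # corrected positive prob.
    return np.clip(p_y, 0.0, 1.0)
```

On synthetic data where half the unlabelled set is drawn from the positive cluster, the corrected scores for those points should sit well above the scores for the true negatives.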