Data Scientist - Generative AI

InnovationM
Multiple Locations
2 - 6 Years
Rating: 4 (266+ Reviews)

Posted on: 03/11/2025

Job Description

Mandatory Skills :


- Generative AI (LLMs, RAG, Fine-tuning)

- Open-Source LLMs

- Pandas

- Python

- Natural Language Processing (NLP)

- LangChain

- Vector Databases (e.g., Pinecone)

- Microservices

- CrewAI / Agentic AI

- Topic Modeling / Clustering



Role and Responsibilities :


- Develop and fine-tune LLMs using techniques like RAG, transfer learning, or domain-specific adaptation


- Build AI agents using frameworks like LangChain or CrewAI to manage dynamic and multi-step workflows

- Work with vector databases (e.g., Pinecone) to enable semantic search and retrieval

- Design and maintain ETL pipelines and ensure smooth data preprocessing and transformation

- Implement NLP solutions for tasks like intent detection, sentiment analysis, and content generation

- Develop and integrate AI Voice Agents capable of handling natural, conversational interactions including voice input/output, real-time speech synthesis, and intent recognition using LLM-based frameworks


- Develop backend APIs and services using Python frameworks like FastAPI or Flask

- Contribute to scalable microservice-based architectures.



Requirements :


- Bachelor's degree in Computer Science, Information Technology, or a related field.

- 3 to 5 years of experience in AI/ML development and backend systems

- Machine Learning Fundamentals : Strong grasp of algorithms, model training, evaluation, and tuning


- Generative AI Models : Experience working with LLMs, RAG architecture, and fine-tuning techniques


- LangChain or Similar Frameworks : Hands-on experience building AI workflows using toolkits like LangChain

- Natural Language Processing (NLP) : Proficiency in text analytics, classification, tokenization, embeddings

- Vector Databases : Practical use of tools like Pinecone, FAISS, or similar for retrieval-augmented generation

- Big Data Handling : Ability to work with large datasets and to optimize storage and processing pipelines

- SQL/NoSQL : Experience in querying and managing structured and unstructured data

- Python & API Development : Proficiency in Python and frameworks like FastAPI or Flask

- ETL & Data Preprocessing : Strong understanding of building pipelines for clean and efficient data processing

- Soft Skills : Strong problem-solving, communication, and collaboration abilities.



Good-to-Have Skills :


- Agentic AI Tools : Exposure to CrewAI or similar platforms for orchestrating multi-agent interactions

- Content Structuring : Experience in clustering, topic modeling, or organizing unstructured data

- ETL Enhancements : Advanced optimization techniques for faster and more efficient pipelines

- Domain Exposure : Prior work on projects involving customer insights, chat summarization, or sentiment analysis

- AI Voice Agent Development : Hands-on experience with speech-to-text (ASR), text-to-speech (TTS), and conversational voice interfaces leveraging frameworks like OpenAI Whisper, SpeechBrain, or AWS Lex/Polly

