Job Description:

Function: Data Science and Analysis - Data Science / Machine Learning

LLMs, NLP, Machine Learning, Generative AI, Python

As a Senior ML Engineer at Demandbase, you'll play a strategic role in building cutting-edge, production-level machine learning systems that drive deep technographic intelligence and high-impact business decisions. This role goes beyond conventional ML development: you will architect and scale LLM-powered solutions that detect, classify, and map technographic signals from diverse sources (e.g., LinkedIn, SSL certs, blogs, and subdomains) to Demandbase's structured product catalog. You'll combine traditional ML, deep learning, and Large Language Models (LLMs) such as LLaMA-3, Gemma, Mistral, and GPT to transform unstructured signals into actionable intelligence, pushing the boundaries of what's possible in entity resolution, product discovery, and dynamic catalog enrichment.

Responsibilities:

- Build scalable, production-ready ML + LLM hybrid systems for:

1. Technographic signal extraction.

2. Entity resolution from noisy data (e.g., LinkedIn job descriptions, SSL certs).

3. Product and category mapping via instruction-tuned LLMs and vector DBs.

- Use foundation models like LLaMA and Gemma for deep contextual understanding, enabling semantic inference of products, categories, and subcategories.

- Leverage RAG pipelines and prompt engineering to improve product detection and catalog alignment with minimal labeled data.

- Translate complex, ambiguous business needs into ML and LLM solution frameworks.

- Lead the model lifecycle end to end: data ingestion, feature design, model development (ML/LLM), fine-tuning, deployment, and monitoring.

- Implement human-in-the-loop systems and catalog-aware feedback loops for continuous catalog enrichment and model refinement.

- Conduct multi-modal experiments using traditional ML and transformer-based LLMs for hybrid architectures.

- Optimize model performance using best-in-class techniques for:

1. Few-shot and zero-shot learning.

2. Approximate nearest neighbor (ANN) search.

3. LLM-based reranking and metadata generation.

- Automate categorization of unknown/new tools (e.g., Turso, Appsmith) using LLM-based generalization and taxonomy mapping.

- Ensure robust, production-grade model deployment using tools such as:

1. Ray, Dask, DeepSpeed (for scalable inference).

2. MLflow, Airflow, Feast, Kubeflow (for lifecycle management and monitoring).

3. Vector DBs (e.g., FAISS, Weaviate) for embedding-based matching.

- Integrate LLM-based services into high-throughput pipelines, ensuring low latency, scalability, and fault tolerance.

- Stay up to date with LLM advancements and experiment with new architectures (e.g., RAG, LLM agents, Toolformer-style models).

- Contribute to Demandbase's AI strategy by identifying opportunities where LLMs can create product and customer impact.

- Prototype and evangelize the use of LLM-based reasoning and categorization across product teams.

- Guide and mentor junior ML engineers and data scientists on LLM-centric design patterns, fine-tuning strategies, and deployment frameworks.

- Foster collaboration with product, engineering, and analytics to deliver unified, data-driven solutions.

Requirements:

- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.

- 8-12 years of experience in data science/ML, including at least 2 years in LLM applications or GenAI projects.

- Demonstrated ability to design and productionize scalable ML systems that incorporate both deep learning and foundation models.

- LLM Expertise: Familiarity with LLaMA, Gemma, GPT-3.5/4, and Mistral, as well as instruction tuning or prompt engineering.

- ML Stack: TensorFlow, PyTorch, scikit-learn, Pandas, NumPy.

- Cloud & MLOps: AWS/GCP, Docker, Kubernetes, MLflow, Airflow, CI/CD.

- Data Handling: SQL, Spark, Dask, feature stores (e.g., Feast).

- Vector Matching: FAISS, Weaviate, embedding-based retrieval methods.

- Structured thinking and strong communication skills to influence cross-functional stakeholders.

- Curiosity and initiative to push boundaries using LLMs in innovative ways.

- Experience with entity resolution, technographic detection, or catalog generation.

- Hands-on experience with retrieval-augmented generation (RAG) pipelines and vector databases.

- Background in graph-based inference or taxonomic classification using LLMs.