We are seeking a high-calibre Applied Scientist to drive innovation in our core information retrieval capabilities.
This critical role demands deep expertise in building high-performance Machine Learning systems that enhance search relevance, retrieval efficiency, and user experience.
You will lead the implementation of vector search, hybrid search, and LLM-powered RAG systems at scale.
Key Responsibilities & Strategic Deliverables :
Search & Ranking System Development :
- ML Pipeline Ownership: Design, build, and deploy robust, production-ready ML pipelines specifically for large-scale search and ranking applications.
- Advanced Retrieval: Lead the implementation and optimization of vector search, hybrid search, and advanced Learning-to-Rank (LTR) systems to maximize relevance and precision.
- Embedding Management: Drive the entire embedding lifecycle, from generating embeddings with state-of-the-art models (e.g., BERT, Sentence Transformers) to managing large-scale embedding indexes with efficient libraries such as FAISS, ScaNN, or Annoy.
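
To give candidates a concrete sense of the hybrid search work described above, here is a minimal sketch of one common way to combine a lexical ranking with a vector ranking: reciprocal rank fusion (RRF). It is an illustration only and does not describe our production system; the function name, the document IDs, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) in every list where it appears;
    the scores are summed across lists. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (e.g. BM25) query and a vector query.
lexical_results = ["d1", "d2", "d3"]
vector_results = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([lexical_results, vector_results])
print(fused)
```

Documents that rank well in both lists ("d1" and "d3" here) rise to the top, which is the basic intuition behind hybrid retrieval.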
Generative AI & Cloud Deployment :
- LLM Integration (RAG): Design and implement systems leveraging Large Language Models (LLMs) for advanced Retrieval-Augmented Generation (RAG), enabling more nuanced and conversational search results.
- Cloud Deployment: Deploy and manage scalable, low-latency solutions in a production environment using modern cloud services such as Vertex AI, Google Cloud Run, or Cloud Functions.
Evaluation, Testing & Optimization :
- Metrics & Evaluation: Define evaluation criteria and rigorously evaluate models using industry-standard search relevance metrics such as Precision@K, Recall, nDCG, and Mean Average Precision (MAP), and validate improvements through A/B testing.
- MLOps: Practice robust MLOps principles, including the management of CI/CD pipelines, model versioning, and continuous LLM optimization for cost and performance.
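
For reference, the relevance metrics named above can be sketched in a few lines. This is a simplified illustration under stated assumptions (binary relevance for Precision@K and average precision, graded gains with a log2 discount for nDCG); the function names and the sample data are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    MAP is the mean of this value over a set of queries.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked, gains, k):
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: four results returned, two of them relevant.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p_at_2 = precision_at_k(ranked, relevant, k=2)
ap = average_precision(ranked, relevant)
ndcg = ndcg_at_k(ranked, {"a": 1, "c": 1}, k=4)
```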
Required Skills & Technical Expertise :
- Programming & Big Data: Strong proficiency in Python, SQL, BigQuery, and PySpark for data manipulation and pipeline construction.
- Cloud & MLOps: Hands-on experience with Google Cloud Platform services, including Vertex AI, Matching Engine, and Dataproc.
- Search Infrastructure: Deep practical experience with enterprise search engines such as Elasticsearch and OpenSearch, and with managing vector databases/stores.
Core Fundamentals (Mandatory) :
- Strong understanding of Vector Databases, Approximate Nearest Neighbor (ANN) algorithms, and core Search Relevance Metrics.
- Practical knowledge of transformer-based models (BERT, Sentence Transformers) and fine-tuning techniques.