- LLM-based Auto-schematization : Develop and refine LLM-based models and techniques for automatically inferring schemas from diverse unstructured and semi-structured public datasets and mapping them to a standardized vocabulary.

- Entity Resolution & ID Generation AI : Design and implement AI models for highly accurate entity resolution, matching new entities with existing IDs and generating unique, standardized IDs for newly identified entities.

- Automated Data Profiling & Schema Detection : Develop AI/ML accelerators for automated data profiling, pattern detection, and schema detection to understand data structure and quality at scale.

- Anomaly Detection & Smart Imputation : Create AI-powered solutions for identifying outliers, inconsistencies, and corrupt records, and for intelligently filling missing values using machine learning algorithms.

- Multilingual Data Integration AI : Develop AI assets for accurately interpreting, translating (leveraging automated tools with human-in-the-loop validation), and semantically mapping data from diverse linguistic sources, preserving meaning and context.

- Validation Automation & Error Pattern Recognition : Build AI agents that run comprehensive checks with data validation tools, identify common error types, suggest fixes, and automatically correct recurring errors.

- Knowledge Graph RAG/RIG Integration : Integrate Retrieval-Augmented Generation (RAG) and Retrieval-Interleaved Generation (RIG) techniques to enhance querying capabilities and facilitate consistency checks within the Knowledge Graph.

- MLOps Implementation : Implement and maintain MLOps practices for the lifecycle management of AI models, including versioning, deployment, monitoring, and retraining on a relevant AI platform.

- Code Generation & Documentation Automation : Develop AI tools for generating reusable scripts, templates, and comprehensive import documentation to streamline development.

- Continuous Improvement Systems : Design and build learning systems, feedback loops, and error analytics mechanisms to continuously improve the accuracy and efficiency of AI-powered automation over time.

Required Skills and Qualifications :

- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related quantitative field.

- Proven experience (e.g., 3+ years) as an AI/ML Engineer, with a strong portfolio of deployed AI solutions.

- Strong expertise in Natural Language Processing (NLP), including experience with Large Language Models (LLMs) and their applications in data processing.

- Proficiency in Python and relevant AI/ML libraries (e.g., TensorFlow, PyTorch, scikit-learn).

- Hands-on experience with cloud AI/ML services.

- Understanding of knowledge representation, ontologies and vocabularies (e.g., Schema.org), data models such as RDF, and knowledge graphs.

- Experience with data quality, validation, and anomaly detection techniques.

- Familiarity with MLOps principles and practices for model deployment and lifecycle management.

- Strong problem-solving skills and an ability to translate complex data challenges into AI solutions.

- Excellent communication and collaboration skills.

Preferred Qualifications :

- Experience with data integration projects, particularly with large-scale public datasets.

- Familiarity with knowledge graph initiatives.

- Experience with multilingual data processing and AI.

- Contributions to open-source AI/ML projects.

- Experience in an Agile development environment.

Benefits :

- Opportunity to work on a high-impact project at the forefront of AI and data integration.

- Opportunity to help solidify a leading data initiative's role as a foundational source for grounding Large Models.

- Access to cutting-edge cloud AI technologies.

- Collaborative, innovative, and fast-paced work environment.

- Significant impact on data quality and operational efficiency.