Posted on: 26/11/2025
Description :
Key Responsibilities :
- LLM-based Auto-schematization : Develop and refine LLM-based models and techniques for automatically inferring schemas from diverse unstructured and semi-structured public datasets and mapping them to a standardized vocabulary.
- Entity Resolution & ID Generation AI : Design and implement AI models for highly accurate entity resolution, matching new entities with existing IDs and generating unique, standardized IDs for newly identified entities.
- Automated Data Profiling & Schema Detection : Develop AI/ML accelerators for automated data profiling, pattern detection, and schema detection to understand data structure and quality at scale.
- Anomaly Detection & Smart Imputation : Create AI-powered solutions for identifying outliers, inconsistencies, and corrupt records, and for intelligently filling missing values using machine learning algorithms.
- Multilingual Data Integration AI : Develop AI assets for accurately interpreting, translating (leveraging automated tools with human-in-the-loop validation), and semantically mapping data from diverse linguistic sources, preserving meaning and context.
- Validation Automation & Error Pattern Recognition : Build AI agents that run comprehensive data validation checks, identify common error types, suggest fixes, and automatically correct recurring errors.
- Knowledge Graph RAG/RIG Integration : Integrate Retrieval Augmented Generation (RAG) and Retrieval Interleaved Generation (RIG) techniques to enhance querying capabilities and facilitate consistency checks within the Knowledge Graph.
- MLOps Implementation : Implement and maintain MLOps practices for the lifecycle management of AI models, including versioning, deployment, monitoring, and retraining on a relevant AI platform.
- Code Generation & Documentation Automation : Develop AI tools for generating reusable scripts, templates, and comprehensive import documentation to streamline development.
- Continuous Improvement Systems : Design and build learning systems, feedback loops, and error analytics mechanisms to continuously improve the accuracy and efficiency of AI-powered automation over time.
Required Skills and Qualifications :
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related quantitative field.
- Proven experience (e.g., 3+ years) as an AI/ML Engineer, with a strong portfolio of deployed AI solutions.
- Strong expertise in Natural Language Processing (NLP), including experience with Large Language Models (LLMs) and their applications in data processing.
- Proficiency in Python and relevant AI/ML libraries (e.g., TensorFlow, PyTorch, scikit-learn).
- Hands-on experience with cloud AI/ML services.
- Understanding of knowledge representation, ontologies and related standards (e.g., Schema.org, RDF), and knowledge graphs.
- Experience with data quality, validation, and anomaly detection techniques.
- Familiarity with MLOps principles and practices for model deployment and lifecycle management.
- Strong problem-solving skills and an ability to translate complex data challenges into AI solutions.
- Excellent communication and collaboration skills.
Preferred Qualifications :
- Experience with data integration projects, particularly with large-scale public datasets.
- Familiarity with knowledge graph initiatives.
- Experience with multilingual data processing and AI.
- Contributions to open-source AI/ML projects.
- Experience in an Agile development environment.
Benefits :
- Opportunity to work on a high-impact project at the forefront of AI and data integration.
- Contribute to solidifying a leading data initiative's role as a foundational source for grounding Large Models.
- Access to cutting-edge cloud AI technologies.
- Collaborative, innovative, and fast-paced work environment.
- Significant impact on data quality and operational efficiency.