
Data Engineer/Knowledge Graph Specialist - ETL Pipeline

SutraHR
Anywhere in India/Multiple Locations
3 - 10 Years

Posted on: 29/01/2026

Job Description

Role Overview :

The Data Engineer/Knowledge Graph Specialist is a foundational role responsible for building and maintaining the proprietary Causal Knowledge Graph, the memory and context core of the Hierarchical Task Planning (HTP) agentic framework. The role involves turning messy, heterogeneous data sourced from acquired portfolio companies (APCs) into the structured, relational knowledge model required for the Strategic Agent's complex reasoning via Monte Carlo Tree Search (MCTS).

Key Responsibilities :

1. Data Ingestion & ETL Pipeline Development

- Pipeline Construction: Design, build, and optimize robust and auditable ETL/ELT data pipelines (using tools like Airflow or Python libraries) to ingest financial reports, operational logs (ERP/CRM), and external data sources from APCs (a minimal sketch follows this list).

- Data Cleansing & Validation: Implement standardized cleansing rules and validation checks to ensure data accuracy and prepare raw data for knowledge graph insertion.
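
The sketch below illustrates the kind of pipeline this role owns, assuming Airflow 2.x; the DAG name, task names, and stub helper functions are hypothetical, not an existing pipeline.

```python
# A minimal sketch of an auditable ingest -> cleanse -> load pipeline for one
# APC data source. All names and paths are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_financials(**context):
    """Pull the latest financial report export for the APC (source is illustrative)."""
    ...


def validate_and_cleanse(**context):
    """Apply standardized cleansing rules and validation checks before graph insertion."""
    ...


def load_to_staging(**context):
    """Write cleansed records to a staging store for knowledge-graph population."""
    ...


with DAG(
    dag_id="apc_financials_ingest",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["apc", "etl"],
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_financials)
    cleanse = PythonOperator(task_id="cleanse", python_callable=validate_and_cleanse)
    load = PythonOperator(task_id="load", python_callable=load_to_staging)

    # Linear dependency chain keeps each run auditable step by step.
    extract >> cleanse >> load
```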

2. Knowledge Graph Modeling and Population

- Graph Schema Design: Design and iterate on the optimal graph schema (nodes, relationships, and properties) to represent the Causal Model linking operational levers (e.g., Inventory Turns) to financial outcomes (e.g., COGS).

- Graph Population: Write efficient code to transform flat files and relational data (SQL tables, spreadsheets) into interconnected graph structures using graph query languages (see the sketch after this section).

- Maintenance: Monitor the Knowledge Graph's performance and manage versioning.
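
As a rough illustration of graph population, the sketch below upserts a causal edge linking an operational lever to a financial outcome, assuming Neo4j via its Python driver; the node labels, relationship type, properties, and credentials are hypothetical.

```python
# A minimal sketch of writing one causal relationship into the knowledge graph.
# Labels, relationship names, and connection details are illustrative only.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CAUSAL_EDGE = """
MERGE (l:OperationalLever {name: $lever, company: $company})
MERGE (o:FinancialOutcome {name: $outcome, company: $company})
MERGE (l)-[r:INFLUENCES]->(o)
SET r.direction = $direction, r.source = $source
"""


def upsert_causal_edge(lever, outcome, company, direction, source):
    # MERGE keeps the load idempotent, so re-running a pipeline does not duplicate nodes.
    with driver.session() as session:
        session.run(CAUSAL_EDGE, lever=lever, outcome=outcome,
                    company=company, direction=direction, source=source)


# Example row produced by the ETL stage (values illustrative).
upsert_causal_edge("Inventory Turns", "COGS", "APC-001", "inverse", "erp_export_q1")

driver.close()
```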

3. Cross-Functional Collaboration

- Work closely with the Senior AI Engineer to define the data structures and formats required by the MCTS.

- Provide clean, structured, and secure data access to the rest of the team for Agent actions, ML model fine-tuning, and evaluations.

Required Skills and Qualifications :

- Graph Databases: Expert-level proficiency in at least one major Graph Database technology (e.g., Neo4j, TigerGraph, AWS Neptune), including graph data modeling and query languages (e.g., Cypher).

- Core Data Engineering: 3+ years of experience building and managing production-level data pipelines (Python, SQL).

- Cloud & Tools: Proficiency in one major cloud platform (AWS/GCP/Azure) and experience with containerization (Docker); familiarity with workflow orchestrators (e.g., Apache Airflow, Prefect).

- Data Transformation: Strong skills in transforming heterogeneous and unstructured data (PDFs, JSON, complex Excel reports) into structured formats.

- Programming: Mastery of Python and proficiency with advanced data engineering libraries.

Desirable Domain Experience :

- Financial Data: Familiarity with standard financial document structures (P&L, Balance Sheets, Trial Balance) is a strong plus.

- ETL in Acquisitions: Experience working with messy, legacy data systems typical of M&A or acquired companies.
