
Lead Data Engineer - Snowflake DB/Data Vault

Sheryl Strategic Solutions Pvt. Ltd.
Multiple Locations
5 - 7 Years

Posted on: 27/11/2025

Job Description

Lead Data Engineer, Snowflake & Data Vault 2.0 (AI/ML & Snowpark Containers)

Job Summary :

We are seeking a highly experienced Data Engineer (5+ years) with mandatory expertise in Snowflake architecture and Data Vault 2.0 implementation. The role centres on designing and optimizing IoT data ingestion pipelines (Streams, Snowpipe, Tasks) and deploying ML models directly in Snowflake via Snowpark. Key responsibilities include building a Data Vault 2.0 structure from the ground up, developing CI/CD-native containerized applications on Snowpark Container Services (SPCS), ensuring reliable replication across environments (Dev/Prod), and integrating the technical components that deliver the final AI and Generative AI functionality. This is a hands-on coding role that goes beyond traditional ETL, focusing on complex pipeline orchestration and aligning data structures for machine learning.

Key Responsibilities and Technical Deliverables :

Data Vault 2.0 Architecture & Development :

- Design and build the data structure from the ground up using the Data Vault 2.0 methodology (Hubs, Links, Satellites), leveraging Snowflake's architecture (an illustrative sketch follows this list).

- Design and develop specific data pipeline components in Snowflake, utilizing advanced features like Snowpipe, Snowflake Streams, and Tasks for automated ingestion and transformation.

- Establish a robust replication approach across environments (development, production) and for data structure changes, in support of continuous deployment.
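
For illustration, a minimal Snowpark Python sketch of a Hub/Satellite pair and an insert-only Hub load. All database, schema, table, and column names here (RAW_VAULT, HUB_DEVICE, SAT_DEVICE_TELEMETRY, STAGING.STG_DEVICES) are hypothetical placeholders, not this role's actual schema:

from snowflake.snowpark import Session

# Hypothetical connection parameters; later sketches reuse this session.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "role": "DATA_ENGINEER", "warehouse": "ETL_WH", "database": "ANALYTICS",
}).create()

# Hub: one row per business key, keyed by a SHA-256 hash of that key.
session.sql("""
    CREATE TABLE IF NOT EXISTS RAW_VAULT.HUB_DEVICE (
        DEVICE_HK  BINARY(32)    NOT NULL,   -- hash key
        DEVICE_ID  VARCHAR       NOT NULL,   -- business key
        LOAD_DTS   TIMESTAMP_NTZ NOT NULL,
        RECORD_SRC VARCHAR       NOT NULL,
        CONSTRAINT PK_HUB_DEVICE PRIMARY KEY (DEVICE_HK)
    )
""").collect()

# Satellite: descriptive attributes, versioned by load timestamp;
# HASH_DIFF supports change detection between loads.
session.sql("""
    CREATE TABLE IF NOT EXISTS RAW_VAULT.SAT_DEVICE_TELEMETRY (
        DEVICE_HK   BINARY(32)    NOT NULL,
        LOAD_DTS    TIMESTAMP_NTZ NOT NULL,
        HASH_DIFF   BINARY(32)    NOT NULL,
        TEMPERATURE FLOAT,
        FUEL_LEVEL  FLOAT,
        RECORD_SRC  VARCHAR       NOT NULL,
        CONSTRAINT PK_SAT_DEVICE PRIMARY KEY (DEVICE_HK, LOAD_DTS)
    )
""").collect()

# Insert-only Hub load: add only business keys not yet present.
session.sql("""
    INSERT INTO RAW_VAULT.HUB_DEVICE
    SELECT SHA2_BINARY(s.DEVICE_ID, 256), s.DEVICE_ID,
           CURRENT_TIMESTAMP(), 'IOT_FEED'
    FROM STAGING.STG_DEVICES s
    LEFT JOIN RAW_VAULT.HUB_DEVICE h
           ON h.DEVICE_HK = SHA2_BINARY(s.DEVICE_ID, 256)
    WHERE h.DEVICE_HK IS NULL
""").collect()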

Snowflake IoT Data Pipelines (Batch + Streaming) :

- Design, implement, and optimize IoT data ingestion pipelines for both raw sensor telemetry and third-party API integration using Snowflake Streams, Snowpipe, and Tasks, with OpenFlow orchestration where appropriate (see the sketch after this list).

- Implement and support SCD (Slowly Changing Dimension) Type 1, Type 2, and Type 3 pipelines within the Data Vault structure for incremental and historical data processing.
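
A hedged sketch of the Snowpipe → Stream → Task pattern named above, reusing the session from the previous sketch. Pipe, stage, and table names are hypothetical; a full SCD Type 2 load would also re-insert a new current row for each changed key (typically a second statement), omitted here for brevity:

# Landing table for raw JSON telemetry.
session.sql("""
    CREATE TABLE IF NOT EXISTS IOT.RAW_TELEMETRY (
        PAYLOAD VARIANT, LOADED_AT TIMESTAMP_NTZ
    )
""").collect()

# Snowpipe: auto-ingest from an external stage (AUTO_INGEST additionally
# requires a cloud notification integration, not shown).
session.sql("""
    CREATE PIPE IF NOT EXISTS IOT.TELEMETRY_PIPE AUTO_INGEST = TRUE AS
    COPY INTO IOT.RAW_TELEMETRY (PAYLOAD, LOADED_AT)
    FROM (SELECT $1, CURRENT_TIMESTAMP() FROM @IOT.SENSOR_STAGE)
    FILE_FORMAT = (TYPE = JSON)
""").collect()

# Stream: captures only the delta since the last consuming DML.
session.sql("""
    CREATE STREAM IF NOT EXISTS IOT.RAW_TELEMETRY_STREAM
    ON TABLE IOT.RAW_TELEMETRY
""").collect()

# Task: runs only when the stream has data; end-dates changed current rows
# and inserts rows for brand-new keys (SCD Type 2 style end-dating).
# CURATED.DEVICE_STATE(DEVICE_ID, FUEL_LEVEL, VALID_FROM, VALID_TO,
# IS_CURRENT) is assumed to exist.
session.sql("""
    CREATE TASK IF NOT EXISTS IOT.MERGE_TELEMETRY
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('IOT.RAW_TELEMETRY_STREAM')
    AS
    MERGE INTO CURATED.DEVICE_STATE t
    USING (
        SELECT PAYLOAD:device_id::STRING AS DEVICE_ID,
               PAYLOAD:fuel_level::FLOAT AS FUEL_LEVEL,
               LOADED_AT
        FROM IOT.RAW_TELEMETRY_STREAM
    ) s
    ON t.DEVICE_ID = s.DEVICE_ID AND t.IS_CURRENT
    WHEN MATCHED AND t.FUEL_LEVEL <> s.FUEL_LEVEL THEN
        UPDATE SET t.VALID_TO = s.LOADED_AT, t.IS_CURRENT = FALSE
    WHEN NOT MATCHED THEN
        INSERT (DEVICE_ID, FUEL_LEVEL, VALID_FROM, VALID_TO, IS_CURRENT)
        VALUES (s.DEVICE_ID, s.FUEL_LEVEL, s.LOADED_AT, NULL, TRUE)
""").collect()

# Tasks are created suspended; resume to start the schedule.
session.sql("ALTER TASK IOT.MERGE_TELEMETRY RESUME").collect()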

Machine Learning Operations (Snowpark + Cortex AI Integration) :

- Deploy, monitor, and optimize ML models (e.g., fouling detection, fuel prediction, itinerary optimization) directly in the database using Snowpark, ensuring model inference and scoring are performed efficiently (see the sketch after this list).

- Assist with Cortex AI services for advanced analytics, semantic enrichment, and leveraging native Generative AI functionality in the data flow.

- Integrate components and write code across the stack as needed to ensure the end-to-end AI functionality is delivered.
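
A hedged sketch of in-database scoring with a permanent Snowpark Python UDF, plus a Cortex call, reusing the session from the first sketch. The model artifact (fouling_model.pkl on @ML_MODELS), the stage names, the feature columns, and the 'mistral-large' model choice are all assumptions for illustration:

import snowflake.snowpark.functions as F

# Ship a pre-trained scikit-learn pickle alongside the UDF.
session.add_import("@ML_MODELS/fouling_model.pkl")
session.add_packages("scikit-learn", "joblib")

@F.udf(name="SCORE_FOULING", is_permanent=True,
       stage_location="@ML_UDFS", replace=True)
def score_fouling(speed: float, draft: float) -> float:
    # Production code would cache this load (e.g., cachetools) rather than
    # reloading the model on every call.
    import sys, os, joblib
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(os.path.join(import_dir, "fouling_model.pkl"))
    return float(model.predict([[speed, draft]])[0])

# Score in place: no data leaves Snowflake.
scored = (session.table("CURATED.VESSEL_FEATURES")
          .with_column("FOULING_SCORE",
                       F.call_udf("SCORE_FOULING",
                                  F.col("SPEED"), F.col("DRAFT"))))
scored.write.save_as_table("ANALYTICS.FOULING_SCORES", mode="overwrite")

# Cortex LLM functions are called like any other SQL function.
session.sql("""
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        'Summarize these fouling anomalies: ' || ANOMALY_TEXT)
    FROM ANALYTICS.ANOMALY_DIGEST LIMIT 1
""").collect()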

Containerized Applications on Snowflake SPCS :

- Build and deploy CI/CD-native containerized applications (e.g., custom APIs, routing optimizers) on Snowpark Container Services (SPCS); a sketch follows this list.

- Manage the application lifecycle, including packaging, scaling, monitoring, and troubleshooting containerized services integrated with Snowflake data assets.
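
A hedged sketch of standing up a containerized service on SPCS, reusing the session from the first sketch. The compute pool, image repository path, and service name are hypothetical, and the image is assumed to have already been pushed to the account's image repository by the CI/CD pipeline:

# Compute pool: the managed nodes the service containers run on.
session.sql("""
    CREATE COMPUTE POOL IF NOT EXISTS ROUTING_POOL
      MIN_NODES = 1 MAX_NODES = 2
      INSTANCE_FAMILY = CPU_X64_XS
""").collect()

# Service from an inline specification (in CI/CD this spec would live in
# version control and be templated per environment).
session.sql("""
    CREATE SERVICE IF NOT EXISTS APPS.ROUTING_OPTIMIZER
      IN COMPUTE POOL ROUTING_POOL
      FROM SPECIFICATION $$
        spec:
          containers:
            - name: optimizer
              image: /analytics/apps/repo/routing-optimizer:latest
          endpoints:
            - name: api
              port: 8080
              public: false
      $$
""").collect()

# Monitoring and troubleshooting: rollout status and container logs.
session.sql(
    "SELECT SYSTEM$GET_SERVICE_STATUS('APPS.ROUTING_OPTIMIZER')").collect()
session.sql(
    "SELECT SYSTEM$GET_SERVICE_LOGS('APPS.ROUTING_OPTIMIZER', 0, 'optimizer', 50)"
).collect()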

Governance and Alignment :

- Document processes and provide reproducible, automated deployment standards across all environments (Dev/Prod).

- Manage technical permissions, roles, and cross-account access provisioning in collaboration with governance teams (EDM) and cloud vendors (see the sketch after this list).

- Troubleshoot issues raised by end users of ML- or Cortex-based applications and ensure source data access remains aligned.
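
For the permissions and replication work above, a hedged sketch of a common grant pattern plus cross-account database replication, reusing the session from the first sketch. Role, schema, and account identifiers are placeholders for whatever the governance (EDM) team mandates:

# Least-privilege functional role, with future grants so new tables stay
# aligned without manual re-provisioning.
for stmt in [
    "CREATE ROLE IF NOT EXISTS ML_APP_ROLE",
    "GRANT USAGE ON DATABASE ANALYTICS TO ROLE ML_APP_ROLE",
    "GRANT USAGE ON SCHEMA ANALYTICS.CURATED TO ROLE ML_APP_ROLE",
    "GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.CURATED TO ROLE ML_APP_ROLE",
    "GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.CURATED TO ROLE ML_APP_ROLE",
    "GRANT ROLE ML_APP_ROLE TO ROLE SYSADMIN",
]:
    session.sql(stmt).collect()

# Dev -> Prod replication: enable on the source account, then (from the
# target account) create the replica and refresh it on a schedule.
session.sql(
    "ALTER DATABASE ANALYTICS ENABLE REPLICATION TO ACCOUNTS myorg.prod_acct"
).collect()
# Run in the target (prod) account:
#   CREATE DATABASE ANALYTICS AS REPLICA OF myorg.dev_acct.ANALYTICS;
#   ALTER DATABASE ANALYTICS REFRESH;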

Mandatory Skills & Qualifications (Inferred) :

- Core Expertise : Deep, hands-on experience in Snowflake (architecture, performance, features).

- Data Modeling : Mandatory expertise in Data Vault 2.0 methodology and implementation.

- Snowflake Tools : Proficiency with Snowpipe, Snowflake Streams, Tasks, and Snowpark.

- ML/AI Integration : Experience deploying and integrating Machine Learning models directly within the database environment.

- Development : Strong ability to integrate and code complex logic (SQL, stored procedures, UDFs, Python/Scala via Snowpark).
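
As one concrete (hypothetical) example of that kind of logic, a hedged Snowpark stored-procedure sketch that a Task could schedule, reusing the session and table names from the earlier sketches:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sproc

@sproc(name="REFRESH_CURRENT_STATE", is_permanent=True,
       stage_location="@ML_UDFS", replace=True,
       packages=["snowflake-snowpark-python"])
def refresh_current_state(session: Session) -> str:
    # Materialize the current SCD2 rows into a reporting table.
    current = session.table("CURATED.DEVICE_STATE").filter(col("IS_CURRENT"))
    current.write.save_as_table("ANALYTICS.DEVICE_STATE_CURRENT",
                                mode="overwrite")
    return f"materialized {current.count()} current rows"

# Callable from Python or from SQL (CALL REFRESH_CURRENT_STATE()).
print(session.call("REFRESH_CURRENT_STATE"))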

Preferred Skills :

- Hands-on experience with Snowpark Container Services (SPCS) and container orchestration/CI/CD (e.g., Docker, Kubernetes).

- Knowledge of Cortex AI services or other Generative AI integration techniques.

- Experience with cloud orchestration tools (e.g., OpenFlow, Airflow) for managing pipelines.

- Experience implementing SCD Type 2/3 and data replication strategies across environments.

- Strong background in IoT data ingestion and processing of raw sensor telemetry.
