hirist

Data Architect - AI & ML Products

Skillflix Consultancy India Private Limited
6 - 12 Years
Mumbai

Posted on: 04/04/2026

Job Description


Role Summary :

We are hiring a Product Data Architect to own the data architecture and data foundations for a portfolio of strategic AI/ML products. This role focuses on designing and building product-grade data pipelines, curated data layers, and fit-for-purpose data stores that power analytics and AI/ML use cases.


You will define data models, data contracts, quality and governance controls, and access patterns to ensure product teams can reliably consume high-quality data at scale.

Key Responsibilities :

- Design the end-to-end data architecture for assigned products: ingestion → transformation → curated layers → serving/consumption.

- Build and oversee data pipelines (batch/stream where needed), including orchestration, error handling, recovery, and performance optimization.

- Define product-level data models (conceptual/logical/physical), including dimensional models, canonical entities, and domain schemas.

- Establish data contracts with upstream/downstream systems and product services (schemas, SLAs, validation rules, versioning).

- Implement and enforce data quality and observability: checks, anomaly detection, freshness/completeness, reconciliation, and alerting.

- Define master/reference data needs and harmonization approaches for product-specific domains.

- Ensure secure and compliant data handling: access control, PII masking/redaction, encryption standards, retention, and auditability.

- Partner with Data Engineering, ML/AI teams, and Product/Tech leads to enable use cases such as forecasting, pricing optimization, RAG/knowledge bases, and experimentation.

- Evaluate and recommend data tooling choices at the product layer (e.g., transformations, orchestration, streaming, serving stores) aligned to scalability and cost.

Key Skills :

1) Product Data Architecture & Modelling

- Strong experience designing product-oriented data architectures: domain boundaries, source-to-consumption flows, and curated layers.

- Expertise in data modelling (dimensional, normalized, hybrid) and defining canonical datasets for product use cases.

- Ability to design data products: clearly defined datasets with ownership, contracts, documentation, and usage SLAs.

2) Data Pipeline & Lake/Lakehouse Design

- Hands-on architecture of data pipelines (batch and near-real-time): ingestion, transformation, orchestration, and serving.

- Strong understanding of data lake/lakehouse patterns: bronze/silver/gold, CDC-based ingestion, incremental processing, partitioning, and compaction strategies.

- Ability to define scalable approaches for data integration from enterprise systems (ERP/CRM/MarTech/R&D/LIMS/manufacturing systems, files, APIs, event streams).

3) Data Quality, Governance & Observability

- Proven capability to implement data quality frameworks: validation rules, anomaly detection, reconciliation, and completeness/accuracy checks.

- Strong understanding of metadata, lineage, and cataloging, and how to make data discoverable and trustworthy.

- Experience defining and enforcing data access controls: classification, role-based access, masking/tokenization, auditability.

4) Performance, Reliability & Cost-Aware Design

- Expertise in designing performant datasets and pipelines: partitioning, clustering, indexing, query optimization, and workload management.

- Ability to define operational standards for pipelines: retries, idempotency, backfills, monitoring, alerting, and incident response.

- Cost/performance tradeoff thinking for storage and compute (especially for large-scale transformation workloads).

5) Integration with AI/Analytics Consumption

- Strong understanding of downstream needs for BI/analytics, ML feature engineering, and AI applications (including GenAI/RAG where relevant).

- Ability to shape datasets for consumption: feature-ready tables, semantic layers, and curated marts for product teams.

6) Cross-Functional Delivery & Stakeholder Management

- Ability to work closely with product teams, data engineers, platform teams, and security/compliance to deliver on product timelines.

- Strong documentation and communication: data lineage, source mapping, data dictionaries, and pipeline runbooks.

Skills Required :

- Experience with modern orchestration and transformation tooling (e.g., Airflow/Prefect, dbt, or equivalents).

- Familiarity with one or more ecosystems commonly used in enterprise data platforms (e.g., Spark/Databricks, Snowflake/BigQuery, Delta/Iceberg/Hudi).

- Exposure to master data management, reference data management, and consent/PII governance programs.

- Domain exposure in CPG/FMCG, pricing/revenue management, marketing/media analytics, supply chain forecasting, or R&D systems.

Qualifications :

- Bachelor's/Master's in Computer Science, Engineering, or a related discipline (or equivalent practical experience).

- 6-12 years of experience in roles such as Data Architect, Analytics Architect, Data Engineering Lead, or Data Platform Architect, with demonstrable ownership of data models and pipeline architectures for business-critical products.

