
Innova Solutions - Senior Data Architect - Open Source & Cloud Data Engineering

INNOVA SOLUTIONS PRIVATE LIMITED
10 - 16 Years
Multiple Locations

Posted on: 06/02/2026

Job Description

Role : Senior Data Architect - Open Source & Cloud Data Engineering


Location : Hyderabad/Chennai/Bangalore/Noida


Experience : 10-16 years


Type : Full-time


About the Role :


We are seeking a highly experienced Senior Data Architect with a proven track record of designing and implementing Medallion Architecture (Bronze/Silver/Gold) from the ground up using modern open-source and cloud-based data technologies.


This individual will be responsible for architecting large-scale data ingestion, streaming, transformation, and storage solutions utilizing Azure Data Lake Storage Gen2, Apache Kafka/Event Hubs (Kafka protocol), Schema Registry, Debezium/Kafka Connect, Spark Structured Streaming, and industry-grade reliability patterns.


The ideal candidate combines deep hands-on expertise with strong architectural vision to build scalable, secure, and high-performance data ecosystems that serve analytics, AI/ML, and operational workloads.


Key Responsibilities :


1. Data Architecture & Platform Ownership :


- Architect and deliver enterprise-grade Medallion Architecture (Bronze → Silver → Gold) from scratch, ensuring modularity, scalability, data quality pipelines, and lineage across layers.


- Define standards for data ingestion, curation, cleansing, enrichment, consumption, and metadata management.


- Develop logical, physical, and canonical data models for structured, semi-structured, and streaming use cases.


- Establish robust patterns for ACID-compliant transformations and incremental processing across layers.
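The Bronze → Silver → Gold flow described above can be sketched in miniature. This is an illustrative pure-Python model, not a production implementation (in practice these layers would be Spark/Delta tables on ADLS Gen2); the `order_id`, `customer`, and `amount` fields are hypothetical examples:

```python
from datetime import datetime, timezone

def to_bronze(raw_records, source):
    """Bronze: land records as-is, stamped with ingestion metadata."""
    ts = datetime.now(timezone.utc).isoformat()
    return [dict(r, _source=source, _ingested_at=ts) for r in raw_records]

def to_silver(bronze_records):
    """Silver: cleanse and conform - reject malformed rows, cast types,
    deduplicate on the business key."""
    seen, silver = set(), []
    for r in bronze_records:
        if r.get("order_id") is None or r.get("amount") is None:
            continue  # quality gate: drop incomplete rows
        if r["order_id"] in seen:
            continue  # deduplicate on business key
        seen.add(r["order_id"])
        silver.append({"order_id": r["order_id"],
                       "customer": r.get("customer", "unknown"),
                       "amount": float(r["amount"])})
    return silver

def to_gold(silver_records):
    """Gold: business-level aggregate (here, revenue per customer)."""
    totals = {}
    for r in silver_records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals
```

The key architectural point the sketch encodes is that each layer only reads from the one before it, so quality rules and lineage stay localized per layer.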


2. Data Lake & Storage Architecture (ADLS Gen2) :


- Design and manage complex data lake zones with hierarchical namespaces, folder strategies, and partitioning standards.


- Implement data governance (RBAC/ACLs), encryption, lifecycle policies, and cost-optimized storage tiers.
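A folder and partitioning standard of the kind described above might be codified as a path builder. This is a hedged sketch: the `abfss://` URI shape is standard for ADLS Gen2, but the storage account name `mydatalake`, container `lake`, and zone/source/dataset layout are illustrative assumptions:

```python
from datetime import date

ZONES = ("bronze", "silver", "gold")

def lake_path(zone: str, source: str, dataset: str, d: date) -> str:
    """Build a governed ADLS Gen2 path: zone/source/dataset with
    Hive-style date partitions so engines can prune by date.
    Account/container names below are placeholders."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"abfss://lake@mydatalake.dfs.core.windows.net/"
            f"{zone}/{source}/{dataset}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")
```

Centralizing path construction like this is one way to make RBAC/ACL scoping and lifecycle rules (e.g. tiering old Bronze partitions to cool storage) mechanical rather than ad hoc.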


3. Streaming Architecture & Real-time Data Pipelines :


- Architect and optimize Apache Kafka / Event Hubs (Kafka protocol) clusters, including topic strategy, retention, compaction, partitions, consumer/producer design, and offset management.


- Implement CDC pipelines using Debezium and Kafka Connect, enabling near-real-time ingestion from RDBMS and other source systems.


- Use Schema Registry (Avro/Protobuf/JSON Schema) to enforce schema evolution, compatibility, and governance.


- Define Dead Letter Queue (DLQ) patterns, retry handling, poison message strategies, and high reliability delivery.


- Implement exactly-once and idempotent processing patterns across producers, consumers, and downstream processors.
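The DLQ, retry, and idempotency patterns listed above fit together in a consumer loop. The sketch below is a simplified in-memory model under stated assumptions: messages are `(key, payload)` pairs, and the processed-key set stands in for what would be a durable store (e.g. a keyed Delta table or transactional offset commit) in a real Kafka consumer:

```python
def consume(messages, handler, max_retries=3):
    """Idempotent consume loop with bounded retries and dead-letter
    routing. Duplicate keys are skipped, so replaying the stream after
    a restart does not double-apply effects."""
    processed_keys = set()   # in production: a durable, transactional store
    dlq = []
    for key, payload in messages:
        if key in processed_keys:
            continue  # idempotency: this key was already applied
        for attempt in range(1, max_retries + 1):
            try:
                handler(payload)
                processed_keys.add(key)
                break
            except Exception as exc:
                if attempt == max_retries:
                    # poison message: park it with its error for replay
                    dlq.append({"key": key, "payload": payload,
                                "error": str(exc)})
    return processed_keys, dlq
```

The design choice worth noting: retries are bounded so a poison message cannot stall the partition, and the DLQ record keeps enough context (key, payload, error) for later replay once the defect is fixed.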


4. Big Data Processing & Distributed Compute (Apache Spark) :


- Lead design and optimization of Spark batch and streaming pipelines using Spark SQL, DataFrames, and Structured Streaming.


- Implement efficient transformations adhering to best practices like predicate pushdown, partition pruning, broadcast joins, caching, and minimizing shuffles.


- Tune cluster configuration, autoscaling, executor/driver memory, and shuffle behavior for cost-effective performance.
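One of the tuning decisions above, broadcast join versus shuffle join, follows a simple size rule that can be modeled directly. This is a deliberately simplified model of the planner's behavior, not Spark's actual code; the 10 MB figure is Spark's default `spark.sql.autoBroadcastJoinThreshold`:

```python
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # Spark default: 10 MB

def choose_join_strategy(left_bytes: int, right_bytes: int) -> str:
    """Simplified planner rule: if either side fits under the broadcast
    threshold, ship that side to every executor and avoid a shuffle;
    otherwise fall back to a shuffle-based (sort-merge) join."""
    if min(left_bytes, right_bytes) <= AUTO_BROADCAST_THRESHOLD:
        return "broadcast"
    return "shuffle"
```

Minimizing shuffles in this way is usually the single largest lever on Spark job cost, which is why the posting calls out broadcast joins alongside predicate pushdown and partition pruning.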


5. Integration, APIs & Enterprise Patterns :


- Integrate lakehouse architecture with upstream source systems, downstream consumption systems, BI platforms, and ML pipelines.


- Establish pipelines for metadata capture, data lineage, quality checks, SCD handling, and anomaly detection.


- Drive API-based ingestion/extraction models, including REST, gRPC, and streaming APIs.
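The SCD handling mentioned above most often means Slowly Changing Dimension Type 2: close the open version of a changed row and append a new current one. A minimal pure-Python sketch of that merge, with illustrative column names (`start_date`, `end_date`, `is_current` are a common convention, not mandated by the source):

```python
def apply_scd2(current_rows, incoming, key, tracked, load_date):
    """SCD Type 2 merge: for each incoming row, if tracked attributes
    changed, expire the open version and append a new current version;
    unchanged rows pass through untouched."""
    open_by_key = {r[key]: r for r in current_rows if r["is_current"]}
    out = [dict(r) for r in current_rows]
    for new in incoming:
        old = open_by_key.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # no attribute change: nothing to do
        if old is not None:
            for r in out:  # expire the previously current version
                if r[key] == new[key] and r["is_current"]:
                    r["is_current"] = False
                    r["end_date"] = load_date
        out.append({key: new[key], **{c: new[c] for c in tracked},
                    "start_date": load_date, "end_date": None,
                    "is_current": True})
    return out
```

On a lakehouse this would typically be expressed as a Delta/Iceberg `MERGE`, but the versioning logic is the same.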


6. Security, Governance & Observability :


- Define and implement robust data governance : data classification, cataloging, masking, and access policies.


- Implement observability across data pipelines : logging, metrics, tracing, alerts, SLA/SLO tracking, and incident response playbooks.


- Ensure compliance with enterprise security policies and regulatory standards.


7. Leadership, Collaboration & Strategy :


- Act as the primary data architecture authority, guiding engineering teams in implementation and best practices.


- Collaborate with business stakeholders, product owners, and engineering leaders to define data strategy, roadmaps, and architectural direction.


- Mentor data engineers on design patterns, code quality, optimization, and cloud-native data engineering.


Required Skills & Experience :


Core Architectural Expertise :


- Demonstrated experience designing Medallion Architecture (Bronze/Silver/Gold) from scratch in an enterprise environment.


- Strong background in distributed data architecture, lakehouse patterns, and event-driven systems.


Cloud & Data Lake :


- Expert-level knowledge of ADLS Gen2, folder structuring, access control, partitioning strategies, and storage lifecycle management.


Streaming & Messaging :


- Deep experience with Apache Kafka, including :


- Topic modeling, partitioning, replication, retention, compaction


- Producer/consumer design, consumer groups, lag monitoring


- Schema Registry usage and schema evolution


- Idempotent writes and exactly-once semantics


- DLQ and replay strategies


- Strong experience with Azure Event Hubs (Kafka protocol).


CDC & Middleware :


- Practical experience using Debezium and Kafka Connect for source connectors, sink connectors, CDC strategies, and operationalizing connectors.


Big Data Processing :


- Hands-on expertise with Apache Spark : DataFrames, Spark SQL, Structured Streaming, Delta-style incremental pipelines.


- Advanced Spark performance tuning and troubleshooting skills.


Programming & Tools :


- Strong skills in Python, Scala, or Java for data engineering.


- Familiarity with CI/CD, DevOps, containerization (Docker/K8s), Git, and Infrastructure-as-Code (Terraform/ARM/Bicep) is a plus.


Soft Skills :


- Strong communication, architectural documentation, and stakeholder management skills.


- Ability to lead design decisions, conduct architecture reviews, and guide engineering teams.


- Excellent analytical thinking and problem-solving capabilities.


Good to Have :


- Experience with Delta Lake, Iceberg, or Hudi lakehouse formats.


- Knowledge of MLflow, Feature Stores, or advanced analytics platforms.


- Exposure to Azure Databricks, Synapse, EMR, or other big data managed services.


- Experience with security frameworks, data cataloging tools, and lineage systems.
