Posted on: 06/02/2026
Role: Senior Data Architect - Open Source & Cloud Data Engineering
Location: Hyderabad / Chennai / Bangalore / Noida
Experience: 10-16 years
Type: Full-time
About the Role:
We are seeking a highly experienced Senior Data Architect with a proven track record of designing and implementing Medallion Architecture (Bronze, Silver, Gold) from the ground up using modern, open-source and cloud-based data technologies.
This individual will be responsible for architecting large-scale data ingestion, streaming, transformation, and storage solutions utilizing Azure Data Lake Storage Gen2, Apache Kafka/Event Hubs (Kafka protocol), Schema Registry, Debezium/Kafka Connect, Spark Structured Streaming, and industry-grade reliability patterns.
The ideal candidate combines deep hands-on expertise with strong architectural vision to build scalable, secure, and high-performance data ecosystems that serve analytics, AI/ML, and operational workloads.
Key Responsibilities:
1. Data Architecture & Platform Ownership:
- Architect and deliver enterprise-grade Medallion Architecture (Bronze → Silver → Gold) from scratch, ensuring modularity, scalability, data quality pipelines, and lineage across layers (a minimal sketch follows this list).
- Define standards for data ingestion, curation, cleansing, enrichment, consumption, and metadata management.
- Develop logical, physical, and canonical data models for structured, semi-structured, and streaming use cases.
- Establish robust patterns for ACID-compliant transformations and incremental processing across layers.
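For illustration, a minimal PySpark sketch of this Bronze → Silver → Gold flow, assuming Delta Lake is available on the cluster; the container, paths, column names, and aggregation are hypothetical placeholders, not a prescribed design:

```python
# Minimal Bronze -> Silver -> Gold sketch (PySpark + Delta Lake).
# All paths, dataset names, and columns below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()
LAKE = "abfss://lake@example.dfs.core.windows.net"  # hypothetical ADLS Gen2 container

# Bronze: land raw events as-is, stamped with ingestion metadata for lineage.
raw = (spark.read.json(f"{LAKE}/landing/orders/")
       .withColumn("_ingested_at", F.current_timestamp())
       .withColumn("_source_file", F.input_file_name()))
raw.write.format("delta").mode("append").save(f"{LAKE}/bronze/orders")

# Silver: deduplicate, enforce basic quality rules, and conform types.
silver = (spark.read.format("delta").load(f"{LAKE}/bronze/orders")
          .filter(F.col("order_id").isNotNull())
          .dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts")))
silver.write.format("delta").mode("overwrite").save(f"{LAKE}/silver/orders")

# Gold: business-level aggregate ready for BI and ML consumption.
gold = (silver.groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("daily_revenue")))
gold.write.format("delta").mode("overwrite").save(f"{LAKE}/gold/daily_revenue")
```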
2. Data Lake & Storage Architecture (ADLS Gen2):
- Design and manage complex data lake zones with hierarchical namespaces, folder strategies, and partitioning standards (see the partitioning sketch below).
- Implement data governance (RBAC/ACLs), encryption, lifecycle policies, and cost-optimized storage tiers.
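As a small sketch of the partitioning standards mentioned above, the write below lays data out in year/month/day folders so partition-aware engines can prune to only the folders a query touches; the container and dataset names are hypothetical:

```python
# Sketch: date-partitioned layout on ADLS Gen2 (hypothetical names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load(
    "abfss://lake@example.dfs.core.windows.net/silver/orders")

# Produces folders like .../gold/orders/year=2026/month=2/day=6/,
# which engines skip entirely unless the query filters into them.
(df.withColumn("year", F.year("order_ts"))
   .withColumn("month", F.month("order_ts"))
   .withColumn("day", F.dayofmonth("order_ts"))
   .write.format("delta")
   .partitionBy("year", "month", "day")
   .mode("append")
   .save("abfss://lake@example.dfs.core.windows.net/gold/orders"))
```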
3. Streaming Architecture & Real-time Data Pipelines:
- Architect and optimize Apache Kafka / Event Hubs (Kafka protocol) clusters, including topic strategy, retention, compaction, partitions, consumer/producer design, and offset management.
- Implement CDC pipelines using Debezium and Kafka Connect, enabling near-real-time ingestion from RDBMS and other source systems.
- Use Schema Registry (Avro/Protobuf/JSON Schema) to enforce schema evolution, compatibility, and governance.
- Define Dead Letter Queue (DLQ) patterns, retry handling, poison message strategies, and high reliability delivery.
- Implement exactly-once and idempotent processing patterns across producers, consumers, and downstream processors (a streaming sketch follows this list).
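A compact Structured Streaming sketch of the reliability patterns above: checkpointed writes give effectively-once delivery into Delta on replay, and unparseable records are split into a DLQ path rather than dropped. The broker, topic, paths, and payload schema are hypothetical, and the Spark-Kafka connector is assumed to be on the classpath:

```python
# Sketch: Kafka -> Bronze with checkpointing and a poison-message DLQ split.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()
schema = StructType([StructField("order_id", StringType()),
                     StructField("amount", DoubleType())])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
          .option("subscribe", "orders")
          .load()
          .withColumn("parsed", F.from_json(F.col("value").cast("string"), schema)))

valid = events.filter(F.col("parsed.order_id").isNotNull()).select("parsed.*")
poison = events.filter(F.col("parsed.order_id").isNull()).select("key", "value")

# The checkpoint tracks offsets, so restarts resume without duplicating writes.
valid.writeStream.format("delta") \
    .option("checkpointLocation", "/chk/orders").start("/bronze/orders")
# Poison messages are parked, not dropped, so they can be inspected and replayed.
poison.writeStream.format("delta") \
    .option("checkpointLocation", "/chk/orders_dlq").start("/bronze/orders_dlq")
spark.streams.awaitAnyTermination()
```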
4. Big Data Processing & Distributed Compute (Apache Spark):
- Lead design and optimization of Spark batch and streaming pipelines using Spark SQL, DataFrames, and Structured Streaming.
- Implement efficient transformations adhering to best practices like predicate pushdown, partition pruning, broadcast joins, caching, and minimizing shuffles (two of these are sketched after this list).
- Tune cluster configuration, autoscaling, executor/driver memory, and shuffle behavior for cost-effective performance.
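Two of these practices, sketched with hypothetical tables: filtering on a partition column so Spark prunes whole folders, and broadcasting a small dimension so the large fact table is never shuffled:

```python
# Sketch: partition pruning + broadcast join (hypothetical tables/columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
facts = spark.read.format("delta").load("/silver/orders")     # large, partitioned by order_date
dims = spark.read.format("delta").load("/silver/customers")   # small dimension

# Partition pruning: the filter on the partition column lets Spark skip
# entire folders instead of scanning the full dataset.
recent = facts.filter(F.col("order_date") >= "2026-01-01")

# Broadcast join: ship the small table to every executor; no shuffle of facts.
joined = recent.join(F.broadcast(dims), "customer_id")
joined.explain()  # confirm BroadcastHashJoin and partition filters in the plan
```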
5. Integration, APIs & Enterprise Patterns:
- Integrate lakehouse architecture with upstream source systems, downstream consumption systems, BI platforms, and ML pipelines.
- Establish pipelines for metadata capture, data lineage, quality checks, SCD handling, and anomaly detection (a simplified SCD Type 2 sketch follows this list).
- Drive API-based ingestion/extraction models, including REST, gRPC, and streaming APIs.
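As one illustration of SCD handling, a deliberately simplified SCD Type 2 sketch on a Delta dimension table: it expires changed current rows, then appends incoming rows as new versions (a production version would also filter out unchanged incoming rows). Paths and columns are hypothetical, and the delta-spark package is assumed:

```python
# Sketch: simplified SCD Type 2 on a Delta dimension (hypothetical schema).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
updates = spark.read.format("delta").load("/silver/customer_updates")
dim = DeltaTable.forPath(spark, "/gold/dim_customer")

# Step 1: expire the current row for customers whose tracked attribute changed.
(dim.alias("d")
 .merge(updates.alias("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
 .whenMatchedUpdate(condition="d.address <> u.address",
                    set={"is_current": "false",
                         "valid_to": "current_timestamp()"})
 .execute())

# Step 2: append incoming rows as the new current version.
(updates.withColumn("is_current", F.lit(True))
        .withColumn("valid_from", F.current_timestamp())
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .write.format("delta").mode("append").save("/gold/dim_customer"))
```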
6. Security, Governance & Observability:
- Define and implement robust data governance: data classification, cataloging, masking, and access policies.
- Implement observability across data pipelines: logging, metrics, tracing, alerts, SLA/SLO tracking, and incident response playbooks (see the monitoring sketch below).
- Ensure compliance with enterprise security policies and regulatory standards.
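A lightweight sketch of streaming observability: polling a query's progress and emitting structured metrics that a log-based alerting stack can pick up. The source, sink, polling interval, and alert condition are stand-ins:

```python
# Sketch: log streaming-query metrics for alerting (stand-in source/sink).
import json
import logging
import time

from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline-observability")

spark = SparkSession.builder.getOrCreate()
query = (spark.readStream.format("rate").load()      # stand-in source
         .writeStream.format("noop").start())        # stand-in sink

while query.isActive:
    p = query.lastProgress  # dict of the most recent micro-batch's metrics
    if p:
        log.info(json.dumps({"batchId": p["batchId"],
                             "inputRows": p["numInputRows"],
                             "triggerMs": p["durationMs"]["triggerExecution"]}))
        if p["numInputRows"] == 0:
            log.warning("no input rows this trigger; check the upstream source")
    time.sleep(30)
```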
7. Leadership, Collaboration & Strategy:
- Act as the primary data architecture authority, guiding engineering teams in implementation and best practices.
- Collaborate with business stakeholders, product owners, and engineering leaders to define data strategy, roadmaps, and architectural direction.
- Mentor data engineers on design patterns, code quality, optimization, and cloud-native data engineering.
Required Skills & Experience:
Core Architectural Expertise:
- Demonstrated experience designing Medallion Architecture (Bronze/Silver/Gold) from scratch in an enterprise environment.
- Strong background in distributed data architecture, lakehouse patterns, and event-driven systems.
Cloud & Data Lake:
- Expert-level knowledge of ADLS Gen2, folder structuring, access control, partitioning strategies, and storage lifecycle management.
Streaming & Messaging:
- Deep experience with Apache Kafka, including:
- Topic modeling, partitioning, replication, retention, compaction
- Producer/consumer design, consumer groups, lag monitoring
- Schema Registry usage and schema evolution
- Idempotent writes and exactly-once semantics (a producer sketch follows this section)
- DLQ and replay strategies
- Strong experience with Azure Event Hubs (Kafka protocol).
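A minimal idempotent-producer sketch using the confluent-kafka Python client; the broker, topic, and payload are hypothetical, and pointing the same client at Event Hubs' Kafka endpoint additionally requires SASL settings:

```python
# Sketch: idempotent Kafka producer with delivery callbacks (hypothetical names).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker:9092",  # hypothetical
    "enable.idempotence": True,          # broker de-duplicates retried sends
    "acks": "all",                       # wait for all in-sync replicas
})

def on_delivery(err, msg):
    # In production this would feed metrics and DLQ routing, not stdout.
    if err is not None:
        print(f"delivery failed, candidate for DLQ: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

producer.produce("orders", key=b"order-123", value=b'{"amount": 42.0}',
                 on_delivery=on_delivery)
producer.flush()
```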
CDC & Middleware:
- Practical experience using Debezium and Kafka Connect for source connectors, sink connectors, CDC strategies, and operationalizing connectors (a connector registration sketch follows).
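For a flavor of operationalizing connectors, a sketch that registers a Debezium Postgres source connector through the Kafka Connect REST API; hostnames, credentials, and the table list are hypothetical placeholders:

```python
# Sketch: register a Debezium Postgres CDC connector via Kafka Connect's REST API.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.example.internal",   # hypothetical
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "shop",
        "topic.prefix": "shop",                       # Debezium 2.x naming
        "table.include.list": "public.orders",
    },
}

resp = requests.post("http://connect.example.internal:8083/connectors",
                     json=connector, timeout=30)
resp.raise_for_status()
print(resp.json()["name"], "registered")
```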
Big Data Processing:
- Hands-on expertise with Apache Spark: DataFrames, Spark SQL, Structured Streaming, and Delta-style incremental pipelines.
- Advanced Spark performance tuning and troubleshooting skills.
Programming & Tools:
- Strong skills in Python, Scala, or Java for data engineering.
- Familiarity with CI/CD, DevOps, containerization (Docker/K8s), Git, and Infrastructure-as-Code (Terraform/ARM/Bicep) is a plus.
Soft Skills:
- Strong communication, architectural documentation, and stakeholder management skills.
- Ability to lead design decisions, conduct architecture reviews, and guide engineering teams.
- Excellent analytical thinking and problem-solving capabilities.
Good to Have:
- Experience with Delta Lake, Iceberg, or Hudi lakehouse formats.
- Knowledge of MLflow, Feature Stores, or advanced analytics platforms.
- Exposure to Azure Databricks, Synapse, EMR, or other big data managed services.
- Experience with security frameworks, data cataloging tools, and lineage systems.
Posted in: Data Engineering
Functional Area: Technical / Solution Architect
Job Code: 1610420