
Innova Solutions - Senior Data Architect - Open Source & Cloud Data Engineering

INNOVA SOLUTIONS PRIVATE LIMITED
10 - 16 Years
Multiple Locations

Posted on: 06/02/2026

Job Description

Role : Senior Data Architect - Open Source & Cloud Data Engineering


Location : Hyderabad/Chennai/Bangalore/Noida


Experience : 10-16 years


Type : Full-time


About the Role :


We are seeking a highly experienced Senior Data Architect with a proven track record of designing and implementing Medallion Architecture (Bronze/Silver/Gold) from the ground up using modern open-source and cloud-based data technologies.


This individual will be responsible for architecting large-scale data ingestion, streaming, transformation, and storage solutions utilizing Azure Data Lake Storage Gen2, Apache Kafka/Event Hubs (Kafka protocol), Schema Registry, Debezium/Kafka Connect, Spark Structured Streaming, and industry-grade reliability patterns.


The ideal candidate combines deep hands-on expertise with strong architectural vision to build scalable, secure, and high-performance data ecosystems that serve analytics, AI/ML, and operational workloads.


Key Responsibilities :


1. Data Architecture & Platform Ownership :


- Architect and deliver enterprise-grade Medallion Architecture (Bronze → Silver → Gold) from scratch, ensuring modularity, scalability, data quality pipelines, and lineage across layers.


- Define standards for data ingestion, curation, cleansing, enrichment, consumption, and metadata management.


- Develop logical, physical, and canonical data models for structured, semi-structured, and streaming use cases.


- Establish robust patterns for ACID-compliant transformations and incremental processing across layers.
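The Bronze → Silver → Gold flow described above can be sketched in miniature. This is an illustrative pure-Python model, not a production implementation (in practice these layers would be Spark/Delta tables on ADLS Gen2); the `order_id`, `customer`, and `amount` fields are hypothetical examples:

```python
from datetime import datetime, timezone

def to_bronze(raw_records, source):
    """Bronze: land records as-is, stamped with ingestion metadata."""
    ts = datetime.now(timezone.utc).isoformat()
    return [dict(r, _source=source, _ingested_at=ts) for r in raw_records]

def to_silver(bronze_records):
    """Silver: cleanse and conform - reject malformed rows, cast types,
    deduplicate on the business key."""
    seen, silver = set(), []
    for r in bronze_records:
        if r.get("order_id") is None or r.get("amount") is None:
            continue  # quality gate: drop incomplete rows
        if r["order_id"] in seen:
            continue  # deduplicate on business key
        seen.add(r["order_id"])
        silver.append({"order_id": r["order_id"],
                       "customer": r.get("customer", "unknown"),
                       "amount": float(r["amount"])})
    return silver

def to_gold(silver_records):
    """Gold: business-level aggregate (here, revenue per customer)."""
    totals = {}
    for r in silver_records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals
```

The key architectural point the sketch encodes is that each layer only reads from the one before it, so quality rules and lineage stay localized per layer.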


2. Data Lake & Storage Architecture (ADLS Gen2) :


- Design and manage complex data lake zones with hierarchical namespaces, folder strategies, and partitioning standards.


- Implement data governance (RBAC/ACLs), encryption, lifecycle policies, and cost-optimized storage tiers.
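A folder and partitioning standard of the kind described above might be codified as a path builder. This is a hedged sketch: the `abfss://` URI shape is standard for ADLS Gen2, but the storage account name `mydatalake`, container `lake`, and zone/source/dataset layout are illustrative assumptions:

```python
from datetime import date

ZONES = ("bronze", "silver", "gold")

def lake_path(zone: str, source: str, dataset: str, d: date) -> str:
    """Build a governed ADLS Gen2 path: zone/source/dataset with
    Hive-style date partitions so engines can prune by date.
    Account/container names below are placeholders."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"abfss://lake@mydatalake.dfs.core.windows.net/"
            f"{zone}/{source}/{dataset}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")
```

Centralizing path construction like this is one way to make RBAC/ACL scoping and lifecycle rules (e.g. tiering old Bronze partitions to cool storage) mechanical rather than ad hoc.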


3. Streaming Architecture & Real-time Data Pipelines :


- Architect and optimize Apache Kafka / Event Hubs (Kafka protocol) clusters, including topic strategy, retention, compaction, partitions, consumer/producer design, and offset management.


- Implement CDC pipelines using Debezium and Kafka Connect, enabling near-real-time ingestion from RDBMS and other source systems.


- Use Schema Registry (Avro/Protobuf/JSON Schema) to enforce schema evolution, compatibility, and governance.


- Define Dead Letter Queue (DLQ) patterns, retry handling, poison message strategies, and high reliability delivery.


- Implement exactly-once and idempotent processing patterns across producers, consumers, and downstream processors.
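The DLQ, retry, and idempotency patterns listed above fit together in a consumer loop. The sketch below is a simplified in-memory model under stated assumptions: messages are `(key, payload)` pairs, and the processed-key set stands in for what would be a durable store (e.g. a keyed Delta table or transactional offset commit) in a real Kafka consumer:

```python
def consume(messages, handler, max_retries=3):
    """Idempotent consume loop with bounded retries and dead-letter
    routing. Duplicate keys are skipped, so replaying the stream after
    a restart does not double-apply effects."""
    processed_keys = set()   # in production: a durable, transactional store
    dlq = []
    for key, payload in messages:
        if key in processed_keys:
            continue  # idempotency: this key was already applied
        for attempt in range(1, max_retries + 1):
            try:
                handler(payload)
                processed_keys.add(key)
                break
            except Exception as exc:
                if attempt == max_retries:
                    # poison message: park it with its error for replay
                    dlq.append({"key": key, "payload": payload,
                                "error": str(exc)})
    return processed_keys, dlq
```

The design choice worth noting: retries are bounded so a poison message cannot stall the partition, and the DLQ record keeps enough context (key, payload, error) for later replay once the defect is fixed.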


4. Big Data Processing & Distributed Compute (Apache Spark) :


- Lead design and optimization of Spark batch and streaming pipelines using Spark SQL, DataFrames, and Structured Streaming.


- Implement efficient transformations adhering to best practices like predicate pushdown, partition pruning, broadcast joins, caching, and minimizing shuffles.


- Tune cluster configuration, autoscaling, executor/driver memory, and shuffle behavior for cost-effective performance.
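One of the tuning decisions above, broadcast join versus shuffle join, follows a simple size rule that can be modeled directly. This is a deliberately simplified model of the planner's behavior, not Spark's actual code; the 10 MB figure is Spark's default `spark.sql.autoBroadcastJoinThreshold`:

```python
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # Spark default: 10 MB

def choose_join_strategy(left_bytes: int, right_bytes: int) -> str:
    """Simplified planner rule: if either side fits under the broadcast
    threshold, ship that side to every executor and avoid a shuffle;
    otherwise fall back to a shuffle-based (sort-merge) join."""
    if min(left_bytes, right_bytes) <= AUTO_BROADCAST_THRESHOLD:
        return "broadcast"
    return "shuffle"
```

Minimizing shuffles in this way is usually the single largest lever on Spark job cost, which is why the posting calls out broadcast joins alongside predicate pushdown and partition pruning.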


5. Integration, APIs & Enterprise Patterns :


- Integrate lakehouse architecture with upstream source systems, downstream consumption systems, BI platforms, and ML pipelines.


- Establish pipelines for metadata capture, data lineage, quality checks, SCD handling, and anomaly detection.


- Drive API-based ingestion/extraction models, including REST, gRPC, and streaming APIs.
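The SCD handling mentioned above most often means Slowly Changing Dimension Type 2: close the open version of a changed row and append a new current one. A minimal pure-Python sketch of that merge, with illustrative column names (`start_date`, `end_date`, `is_current` are a common convention, not mandated by the source):

```python
def apply_scd2(current_rows, incoming, key, tracked, load_date):
    """SCD Type 2 merge: for each incoming row, if tracked attributes
    changed, expire the open version and append a new current version;
    unchanged rows pass through untouched."""
    open_by_key = {r[key]: r for r in current_rows if r["is_current"]}
    out = [dict(r) for r in current_rows]
    for new in incoming:
        old = open_by_key.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # no attribute change: nothing to do
        if old is not None:
            for r in out:  # expire the previously current version
                if r[key] == new[key] and r["is_current"]:
                    r["is_current"] = False
                    r["end_date"] = load_date
        out.append({key: new[key], **{c: new[c] for c in tracked},
                    "start_date": load_date, "end_date": None,
                    "is_current": True})
    return out
```

On a lakehouse this would typically be expressed as a Delta/Iceberg `MERGE`, but the versioning logic is the same.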


6. Security, Governance & Observability :


- Define and implement robust data governance : data classification, cataloging, masking, and access policies.


- Implement observability across data pipelines : logging, metrics, tracing, alerts, SLA/SLO tracking, and incident response playbooks.


- Ensure compliance with enterprise security policies and regulatory standards.


7. Leadership, Collaboration & Strategy :


- Act as the primary data architecture authority, guiding engineering teams in implementation and best practices.


- Collaborate with business stakeholders, product owners, and engineering leaders to define data strategy, roadmaps, and architectural direction.


- Mentor data engineers on design patterns, code quality, optimization, and cloud-native data engineering.


Required Skills & Experience :


Core Architectural Expertise :


- Demonstrated experience designing Medallion Architecture (Bronze/Silver/Gold) from scratch in an enterprise environment.


- Strong background in distributed data architecture, lakehouse patterns, and event-driven systems.


Cloud & Data Lake :


- Expert-level knowledge of ADLS Gen2, folder structuring, access control, partitioning strategies, and storage lifecycle management.


Streaming & Messaging :


- Deep experience with Apache Kafka, including :


- Topic modeling, partitioning, replication, retention, compaction


- Producer/consumer design, consumer groups, lag monitoring


- Schema Registry usage and schema evolution


- Idempotent writes and exactly-once semantics


- DLQ and replay strategies


- Strong experience with Azure Event Hubs (Kafka protocol).


CDC & Middleware :


- Practical experience using Debezium and Kafka Connect for source connectors, sink connectors, CDC strategies, and operationalizing connectors.


Big Data Processing :


- Hands-on expertise with Apache Spark : DataFrames, Spark SQL, Structured Streaming, Delta-style incremental pipelines.


- Advanced Spark performance tuning and troubleshooting skills.


Programming & Tools :


- Strong skills in Python, Scala, or Java for data engineering.


- Familiarity with CI/CD, DevOps, containerization (Docker/K8s), Git, and Infrastructure-as-Code (Terraform/ARM/Bicep) is a plus.


Soft Skills :


- Strong communication, architectural documentation, and stakeholder management skills.


- Ability to lead design decisions, conduct architecture reviews, and guide engineering teams.


- Excellent analytical thinking and problem-solving capabilities.


Good to Have :


- Experience with Delta Lake, Iceberg, or Hudi lakehouse formats.


- Knowledge of MLflow, Feature Stores, or advanced analytics platforms.


- Exposure to Azure Databricks, Synapse, EMR, or other big data managed services.


- Experience with security frameworks, data cataloging tools, and lineage systems.
