hirist

Job Description

Role Overview:

As a Lead Data Engineer, you will define and drive the enterprise data engineering strategy for a next-generation unified analytics foundation spanning Digital, Stores, and Marketplace channels.

This role owns the end-to-end data architecture roadmap, including the complete migration from Snowflake to a Databricks/Spark Lakehouse ecosystem on AWS, while ensuring strong KPI alignment and enterprise-wide metric consistency.

You will operate as both a hands-on technical leader and a strategic architect, influencing platform design decisions, governance frameworks, and modernization programs at scale.

Key Responsibilities:

Architecture & Technical Leadership:

- Define target-state data architecture using Databricks, Apache Spark, and AWS-native services

- Lead Snowflake migration strategy to Databricks/Spark Lakehouse

- Design scalable, secure, and cost-efficient batch and streaming pipelines

- Establish architectural standards for modeling, storage, and performance optimization

Data Engineering & Platform Strategy:

- Develop ETL/ELT pipelines using Python, Spark, and Advanced SQL

- Build robust data pipelines using AWS S3, Lambda, EMR, and Databricks

- Enable real-time processing via Kafka, Kinesis, or Spark Streaming

- Implement containerized deployments using Docker and Kubernetes

Orchestration, CI/CD & Infrastructure:

- Lead orchestration standards using Apache Airflow

- Implement CI/CD pipelines using Git and Jenkins

- Manage Infrastructure as Code using Terraform or CloudFormation

Governance & Metrics:

- Establish enterprise-wide data lineage and cataloging frameworks

- Define KPI frameworks and ensure metric consistency across domains

- Partner with analytics and business teams to deliver trusted insights

Observability & Leadership:

- Implement monitoring and operational excellence standards

- Define SLAs/SLOs for mission-critical analytics workloads

- Mentor and guide engineering teams

Must-Have Technical Stack:

Core: Databricks, Apache Spark, Python, Advanced SQL

Cloud: AWS (S3, Lambda, EMR)

Orchestration & DevOps: Apache Airflow, Jenkins (CI/CD), Docker, Terraform

Streaming: Kafka, Kinesis, or Spark Streaming

Required Experience:

- 6-8+ years of experience in data engineering and distributed systems

- Strong AWS production experience

- Advanced Python and SQL expertise

- Proven experience modernizing legacy analytics platforms to Databricks/Spark Lakehouse

- Strong data governance and enterprise metric management exposure


Certifications (Mandatory):

- Databricks Certified Data Engineer - Professional

- AWS Solutions Architect Associate or Professional (Preferred)

Work Model:

- Hybrid: 3 days work from office, 2 days work from home

- Day shift with overlap with the US team

- Expected working window: 10:30/11:00 AM IST to 10:00/11:00 PM IST (with adequate breaks)

