hirist

Sprouts.ai - Senior Data Architect - ETL/PySpark

Sprouts.ai
Multiple Locations
10 - 16 Years

Posted on: 19/08/2025

Job Description

Job Title : Senior Data Architect

Location : Bangalore/Chandigarh

Job Type : Full-time

Experience : 10+ years

Job Summary :

We are looking for an experienced Data Architect to lead the design, development, and optimization of our modern data infrastructure. The ideal candidate will have deep expertise in big data platforms, data lakes, lakehouse architectures, and hands-on experience with modern tools such as Spark clusters, PySpark, Apache Iceberg, the Nessie catalog, and Apache Airflow.

This role will be pivotal in evolving our data platform, including database migrations, scalable data pipelines, and governance-ready architectures for both analytical and operational use cases.

Key Responsibilities :

- Design and implement scalable and reliable data architectures for real-time and batch processing systems

- Evaluate and recommend data tools, frameworks, and infrastructure aligned with company goals

- Develop and optimize complex ETL/ELT pipelines using PySpark and Apache Airflow

- Architect and manage data lakes using Spark on Apache Iceberg and the Nessie catalog for versioned and governed data workflows

- Perform data analysis, data profiling, data quality improvements, and data modeling

- Lead database migration efforts, including planning, execution, and optimization

- Define and enforce data engineering best practices, data governance standards, and schema evolution strategies

- Collaborate cross-functionally with data scientists, analysts, platform engineers, and business stakeholders

Required Skills & Qualifications :

- 10+ years of experience in data architecture, data engineering, data security, data governance, and big data platforms

- Deep understanding of the trade-offs between managed services and open-source data stack tools, including cost, scalability, operational overhead, flexibility, and vendor lock-in

- Strong hands-on experience with PySpark for writing data pipelines and distributed data processing

- Proven expertise with Apache Iceberg, Apache Hudi, and the Nessie catalog for modern table formats and versioned data catalogs

- Experience in scaling and managing Elasticsearch and PostgreSQL clusters

- Strong experience with Apache Airflow for workflow orchestration (or equivalent tools)

- Demonstrated success in database migration projects across multiple cloud providers

- Ability to perform deep data analysis and compare datasets between systems

- Experience handling hundreds of terabytes of data or more

- Proficiency in SQL, data modeling, and performance tuning

- Excellent communication and presentation skills, with the ability to lead technical conversations

Nice to Have :

- Experience in Sales, Marketing, and CRM domains, especially with Accounts and Contacts data

- Knowledge of AI and vector databases

- Exposure to streaming data frameworks (Kafka, Flink, etc.)

- Ability to support analytics and reporting initiatives

Why Join Us :

- Work on cutting-edge data architectures using modern open-source technologies

- Be part of a team transforming data operations and analytics at scale

- Opportunity to architect high-impact systems from the ground up

- Join a collaborative, innovation-driven culture
