
Staff Data Engineer - Python/SQL

Multiple Locations
9 - 13 Years

Posted on: 05/12/2025

Job Description

Responsibilities :


- PB-Scale Data Pipelines : Build highly performant, large-scale data infrastructure that scales to 100K+ jobs and petabytes of data per day.


- Cloud-Native Data Infrastructure : Design and implement a robust, scalable data infrastructure on AWS, utilising Kubernetes and Airflow for efficient resource management and deployment.


- Intelligent SQL Ecosystem : Design and develop a comprehensive SQL intelligence system spanning query optimisation, dynamic pipeline generation, and data lineage tracking. Leverage expertise in SQL query profiling, AST analysis, and parsing to build an engine that improves query performance, generates adaptive data pipelines, and produces granular column-level lineage (a minimal AST-parsing sketch follows below).
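As a rough illustration of the AST work this last bullet describes (a sketch under assumptions, not this team's actual engine), the snippet below uses the open-source sqlglot parser, named later in the Requirements, to pull the column and table references out of a hypothetical query; those references are the raw material for column-level lineage. sqlglot also ships a dedicated lineage module; the manual walk here just shows the mechanics.

    import sqlglot
    from sqlglot import exp

    # Hypothetical query; any SELECT works here.
    sql = """
    SELECT o.customer_id, SUM(o.amount) AS total_spend
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY o.customer_id
    """

    # Parse the SQL text into an abstract syntax tree.
    tree = sqlglot.parse_one(sql)

    # Every column reference with its table qualifier -- the starting
    # point for column-level lineage.
    for column in tree.find_all(exp.Column):
        print(column.table, column.name)

    # Every table the query reads from.
    for table in tree.find_all(exp.Table):
        print(table.name)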


Impact :


- Innovation at the Forefront : Push the boundaries of data engineering by combining traditional techniques with cutting-edge AI technologies.


- High Visibility : Directly affect the productivity and capabilities of global data teams; your contributions will be crucial to the daily operations of thousands of users across hundreds of countries.


- Open Source Contribution : As part of our commitment to the developer community, you will contribute to our open-source initiatives, gaining recognition in the tech community.


- Career Growth : This role is a launchpad into the rapidly advancing field of AI-powered data engineering, offering exposure to state-of-the-art technologies and generative AI applications.


Requirements :


- 8+ years of experience in data engineering, with a focus on building scalable data pipelines and systems.


- Strong proficiency in Python and SQL.


- Extensive experience with SQL query profiling, optimisation, and performance tuning, preferably with Snowflake (see the profiling sketch after this list).


- Deep understanding of SQL Abstract Syntax Trees (AST) and experience working with SQL parsers (e.g., sqlglot) for generating column-level lineage and dynamic ETLs.


- Experience in building data pipelines using Airflow or dbt (a minimal Airflow sketch follows this list).


- [Optional] Solid understanding of cloud platforms, particularly AWS.


- [Optional] Familiarity with Kubernetes (K8S) for containerised deployments.
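To make the Snowflake profiling requirement concrete, here is a minimal sketch, assuming the snowflake-connector-python package and access to Snowflake's built-in ACCOUNT_USAGE.QUERY_HISTORY view; the connection parameters are hypothetical placeholders.

    import snowflake.connector

    # Hypothetical credentials -- substitute your own.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="...",
    )

    # Surface the slowest queries of the last day as tuning candidates.
    SLOW_QUERIES = """
    SELECT query_id, query_text, total_elapsed_time, bytes_scanned
    FROM snowflake.account_usage.query_history
    WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())
    ORDER BY total_elapsed_time DESC
    LIMIT 10
    """

    for row in conn.cursor().execute(SLOW_QUERIES):
        print(row)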
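And for the Airflow requirement, a minimal DAG sketch, assuming Airflow 2.x; the dag_id and task bodies are placeholders, not this team's pipelines.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: pull raw data from a source system.
        print("extracting")

    def transform():
        # Placeholder: clean and reshape the extracted data.
        print("transforming")

    with DAG(
        dag_id="example_etl",               # hypothetical name
        start_date=datetime(2025, 1, 1),
        schedule="@daily",                  # "schedule_interval" on Airflow < 2.4
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task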

