Description:
Responsibilities:
- Building PB-Scale Data Pipelines: Build highly performant, large-scale data infrastructure that scales to 100K+ jobs and processes PB-scale data per day.
- Cloud-Native Data Infrastructure: Design and implement robust, scalable data infrastructure on AWS, utilizing Kubernetes and Airflow for efficient resource management and deployment.
- Intelligent SQL Ecosystem: Design and develop a comprehensive SQL intelligence system encompassing query optimization, dynamic pipeline generation, and data lineage tracking. Leverage your expertise in SQL query profiling, AST analysis, and parsing to build a sophisticated engine that improves query performance, generates adaptive data pipelines, and derives granular column-level lineage (see the sketch after this list).
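For context, here is a minimal sketch of the kind of AST analysis and column-level lineage work described above, using sqlglot. The sample query, the Snowflake dialect choice, and all table and column names are illustrative assumptions, not details of this role.

```python
import sqlglot
from sqlglot import exp
from sqlglot.lineage import lineage

sql = """
SELECT o.order_id, c.name AS customer_name
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
"""

# Parse the query into an AST, then walk it for referenced tables and columns.
ast = sqlglot.parse_one(sql, dialect="snowflake")
tables = sorted({t.name for t in ast.find_all(exp.Table)})
columns = sorted({c.sql() for c in ast.find_all(exp.Column)})
print(tables)   # ['customers', 'orders']
print(columns)  # ['c.id', 'c.name', 'o.customer_id', 'o.order_id']

# Trace one output column back to its source columns.
for node in lineage("customer_name", sql, dialect="snowflake").walk():
    print(node.name)
```

The same parse-and-walk pattern underpins query rewriting and dynamic pipeline generation: once the query is an AST, transformations and lineage extraction become tree operations rather than string manipulation.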
Requirements:
- 8+ years of experience in data engineering, with a focus on building scalable data pipelines and systems.
- Strong proficiency in Python and SQL.
- Extensive experience with SQL query profiling, optimization, and performance tuning, preferably with Snowflake.
- Deep understanding of SQL Abstract Syntax Trees (ASTs) and experience working with SQL parsers (e.g., sqlglot) to generate column-level lineage and dynamic ETLs.
- Experience building data pipelines with Airflow or dbt (see the minimal DAG sketch after this list).
- [Optional] Solid understanding of cloud platforms, particularly AWS.
- [Optional] Familiarity with Kubernetes (K8s) for containerized deployments.
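For orientation, a minimal Airflow DAG sketch of the pipeline style referenced above; the DAG id, schedule, and task body are placeholder assumptions, not project specifics.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load() -> None:
    # Stand-in for real extract/transform/load logic.
    print("extracting and loading a batch")


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)
```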