Job Description

Job Title : Data Architect

Position Summary :

We are seeking a high-impact Data Architect to own the end-to-end design, execution, and strategic evolution of our multi-cloud data ecosystem.

This is a leadership role requiring deep, polyglot technical expertise across data engineering, cloud architecture, and software development, combined with the strategic vision and people-management skills to lead a high-performing data engineering team.

You will be the primary technical authority for all data at rest and data in motion, responsible for designing scalable, resilient, and high-concurrency data models, storage solutions, and processing pipelines.

The ideal candidate is a hands-on-keyboard architect who can write production-level Python code, optimize complex SQL, deploy infrastructure via Terraform, and mentor junior engineers, all while defining the long-term data roadmap to support our business-critical analytics, data science, and ML initiatives.

Core Technical Responsibilities :

1. Data Architecture & Strategy :

- Design & Blueprinting : Architect and document the canonical enterprise data model, data flow diagrams (DFDs), and architectural blueprints for our data platform.

- Technology & Tool Selection : Lead the evaluation, PoC (Proof of Concept), and selection of all data platform technologies, balancing build-vs-buy decisions for ingestion, storage, processing, and governance.

- Multi-Cloud Strategy : Design and implement a cohesive, abstracted data architecture that federates data and workloads across AWS, Azure, and GCP.

Implement patterns for inter-cloud data movement, cost optimization, and security parity.

- Modern Paradigms : Champion and implement modern data architecture patterns, including Data Mesh, Data Fabric, and Lakehouse (e.g., Databricks/Delta Lake), moving beyond traditional monolithic warehousing.

2. Data Engineering & Pipeline Orchestration :

- ETL/ELT Frameworks : Engineer and optimize high-throughput, fault-tolerant data ingestion and transformation pipelines.

Must be an expert in both batch and near-real-time streaming (e.g., Kafka, Kinesis, Pub/Sub) architectures.

- Modern ELT Stack : Demonstrate mastery of the modern data stack, including data transformation (e.g., dbt), ingestion (e.g., Fivetran, Airbyte), and orchestration (e.g., Airflow, Dagster, Prefect).

- SQL & Database Design : Possess expert-level SQL skills, including query optimization, analytical functions, CTEs, and procedural SQL (a brief illustrative sketch follows this section).

Design and implement DDL for data warehouses (e.g., Snowflake, BigQuery, Redshift) and OLTP systems, ensuring normalization/denormalization is optimized for the use case.
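
For illustration only, the following is a minimal sketch of the kind of analytical SQL pattern referenced above (a CTE combined with a window function), expressed through PySpark. The table and column names (orders, customer_id, order_ts, amount) are hypothetical and are not part of this role's requirements.

# Minimal sketch : keep only the latest order per customer using a CTE and a window function.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("latest-order-per-customer").getOrCreate()

# Hypothetical in-memory data standing in for a warehouse table.
orders = spark.createDataFrame(
    [(1, "2024-01-05", 120.0), (1, "2024-02-01", 75.0), (2, "2024-01-20", 300.0)],
    ["customer_id", "order_ts", "amount"],
)
orders.createOrReplaceTempView("orders")

latest = spark.sql("""
    WITH ranked AS (
        SELECT customer_id, order_ts, amount,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
        FROM orders
    )
    SELECT customer_id, order_ts, amount
    FROM ranked
    WHERE rn = 1
""")
latest.show()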

3. Programming & Infrastructure :

- Python Expertise : Utilize Python as a first-class language for data engineering.

This includes writing custom ETL scripts, building data-centric microservices/APIs (e.g., using FastAPI), leveraging PySpark for distributed processing, and scripting for automation (a brief illustrative sketch follows this section).

- Infrastructure as Code (IaC) : Own the data platform's infrastructure definitions using Terraform or CloudFormation.

Implement and enforce CI/CD best practices (e.g., GitHub Actions, Jenkins) for all data pipeline and infrastructure code.

- Containerization : Leverage Docker and Kubernetes (EKS, GKE, AKS) for deploying and scaling data services and applications.
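
For illustration only, here is a minimal sketch of the kind of data-centric microservice endpoint mentioned above, using FastAPI. The dataset, metric, and route are hypothetical and are not specific to this role.

# Minimal sketch : a small read-only metrics endpoint backed by an in-memory stand-in for a warehouse lookup.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="metrics-service")

# Hypothetical pre-computed metric, standing in for a warehouse or feature-store query.
DAILY_ACTIVE_USERS = {"2024-01-01": 1523, "2024-01-02": 1611}

@app.get("/metrics/dau/{date}")
def get_dau(date: str) -> dict:
    # Return the daily-active-user count for a given date, or 404 if absent.
    if date not in DAILY_ACTIVE_USERS:
        raise HTTPException(status_code=404, detail="No data for that date")
    return {"date": date, "dau": DAILY_ACTIVE_USERS[date]}

# Run locally with : uvicorn app:app --reload (assuming this file is saved as app.py)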

4. Leadership & People Management :

- Team Leadership : Lead and mentor a team of data engineers, data modelers, and BI developers.

Manage team velocity, sprint planning (Agile/Scrum), and performance reviews.

- Code Quality & Best Practices : Enforce software engineering best practices within the data team, including rigorous code reviews, version control (Git), unit/integration testing, and comprehensive documentation.

- Stakeholder Management : Act as the primary technical liaison to cross-functional leaders (Product, Engineering, Data Science).

Translate complex business requirements into technical specifications and data models.

Required Qualifications & Technical Stack :

- Experience : 10+ years in data engineering/architecture, including 3+ years in a formal leadership or people management role.

- Python : Demonstrable, expert-level proficiency in Python for data manipulation (Pandas, Polars), distributed computing (PySpark, Dask), and API development.

- SQL : Mastery of advanced SQL, DDL, DML, and query performance tuning on one or more major analytical databases (Snowflake, BigQuery, Redshift, Databricks SQL).

- Cloud : 5+ years of hands-on experience designing and building data solutions on at least two of the major cloud providers (AWS, GCP, Azure).

Must understand the native services (e.g., S3/ADLS/GCS, Redshift/BigQuery/Synapse, Glue/Data Factory, Kinesis/Event Hubs).

- ETL/ELT Tools : Deep experience with modern data stack tooling.

Must have hands-on experience with :

- Orchestration : Airflow, Dagster, or Prefect.

- Transformation : dbt (highly preferred).

- Data Modeling : Expert in dimensional modeling (Kimball) and 3NF, with proven experience designing data models for large-scale data warehouses and data marts.

- Leadership : Proven ability to build, manage, and motivate a technical team.

Must be able to articulate a strategic technical vision and execute it.

Preferred Qualifications :

- Certifications : Professional-level cloud architect certifications (e.g., AWS Certified Solutions Architect Professional, Google Cloud Professional Data Engineer).

- Streaming : Hands-on experience with Apache Kafka, Spark Structured Streaming, or Flink.

- Data Governance : Experience implementing data governance and cataloging tools (e.g., Collibra, Alation, Amundsen).

- MLOps : Familiarity with MLOps pipelines and infrastructure to support data science model training and deployment.

