hirist

SolarSquare Energy - Data Engineer - Python

Posted on: 08/10/2025

Job Description

Description :

As a Data Engineer, you will own the end-to-end lifecycle of our data infrastructure.

You will :

- Design and implement robust, scalable data pipelines.

- Architect modern data solutions using a best-in-class technology stack.

Your work will transform raw, messy data into clean, reliable, and actionable data products that power decision-making across the business. You'll collaborate cross-functionally with product managers, data analysts, data scientists, and software engineers to understand data needs and deliver high-performance data solutions.

Your impact will be measured by how effectively data is delivered, modeled, and leveraged to drive business outcomes.

Key Responsibilities :

- Architect & Build : Design, implement, and manage a cloud-based data platform using a modern ELT (Extract, Load, Transform) approach.

- Data Ingestion : Develop and maintain robust data ingestion pipelines from a variety of sources, including operational databases (MongoDB, RDS), real-time IoT streams, and third-party APIs using services like AWS Kinesis/Lambda or Azure Event Hubs/Functions.

- Data Lake Management : Build and manage a scalable and cost-effective data lake on AWS S3 or Azure Data Lake Storage (ADLS Gen2), using open table formats like Apache Iceberg or Delta Lake.

- Data Transformation : Develop, test, and maintain complex data transformation models using dbt.

- Champion a software engineering mindset by applying principles of version control (Git), CI/CD, and automated testing to all data logic.

- Orchestration : Implement and manage data pipeline orchestration using modern tools like Dagster, Apache Airflow, or Azure Data Factory.

- Data Quality & Governance : Establish and enforce data quality standards. Implement automated testing and monitoring to ensure the reliability and integrity of all data assets.

- Performance & Cost Optimization : Continuously monitor and optimize the performance and cost of the data platform, ensuring our serverless query engines and storage layers are used efficiently.

- Collaboration : Work closely with data analysts and business stakeholders to understand their needs, model data effectively, and deliver datasets that power our BI tools (Metabase, Power BI).

Required Skills & Experience (Must-Haves) :

- 3+ years of professional experience in a data engineering role.

- Expert-level proficiency in SQL and the ability to write complex, highly performant queries.

- Strong proficiency in Python, including Python-based data cleaning packages and tools; experience in Python is a must.

- Hands-on experience building data solutions on a major cloud provider (AWS or Azure), utilizing core services like AWS S3/Glue/Athena or Azure ADLS/Data Factory/Synapse.

- Proven experience building and maintaining data pipelines in Python.

- Experience with NoSQL databases like MongoDB, including an understanding of MongoDB's data modeling, aggregation framework, and query patterns.

- Deep understanding of data warehousing concepts, including dimensional modeling, star/snowflake schemas, and data modeling best practices.

- Hands-on experience with modern data transformation tools, specifically dbt.

- Familiarity with data orchestration tools like Apache Airflow, Dagster, or Prefect.

- Proficiency with Git and experience working with CI/CD pipelines for data projects.

Preferred Skills & Experience (Nice-to-Haves) :

- Experience with real-time data streaming technologies, specifically AWS Kinesis or Azure Event Hubs.

- Experience with data cataloging and governance tools (e.g., OpenMetadata, DataHub, Microsoft Purview).

- Knowledge of infrastructure-as-code tools like Terraform or CloudFormation.

- Experience with containerization technologies (Docker, Kubernetes).

