Posted on: 24/03/2026
Description :
About the Role :
The Data Engineer will design, build, and operate scalable, reliable data pipelines and data products on a GCP-based Lakehouse architecture. This role focuses on enabling analytics, AI/ML, and data products by leveraging Google Cloud Platform services while adhering to open data standards to support long-term portability and interoperability.
You will work closely with platform engineers, architects, data product owners, and governance teams to deliver trusted, well-governed datasets using modern batch and streaming patterns on GCP.
Key Responsibilities :
1. Data Pipeline Development (Batch & Streaming) :
- Design and implement batch and streaming data pipelines on GCP using services such as Cloud Storage, BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud Composer (a streaming sketch follows this list).
- Build ingestion and transformation pipelines that support raw, curated, and consumption-ready datasets aligned to Lakehouse patterns.
- Optimize pipelines for performance, reliability, and cost efficiency on GCP.
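For concreteness, a minimal sketch of the kind of streaming pipeline this role builds: an Apache Beam job (runnable on Dataflow) that reads JSON events from Pub/Sub and appends them to BigQuery. The project, subscription, table, and schema below are illustrative assumptions, not details from this posting.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # streaming=True selects streaming execution mode on the runner.
        options = PipelineOptions(streaming=True)
        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadEvents" >> beam.io.ReadFromPubSub(
                    subscription="projects/my-project/subscriptions/events-sub")  # hypothetical
                | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
                | "WriteToBQ" >> beam.io.WriteToBigQuery(
                    "my-project:curated.events",  # hypothetical dataset.table
                    schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
            )

    if __name__ == "__main__":
        run()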
2. Lakehouse & Open Data Standards :
- Implement data storage and processing using Lakehouse principles, including separation of storage and compute.
- Work with open table formats (e.g., Apache Iceberg or equivalent) and open file formats (e.g., Parquet) to enable interoperability across engines (see the sketch after this list).
- Support schema evolution, time-travel, and transactional consistency where required by analytics and AI use cases.
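As one hedged illustration of these open-format responsibilities, the PySpark sketch below (e.g., on Dataproc, with an Iceberg Spark runtime JAR matching your Spark version on the classpath) writes a curated Apache Iceberg table from Parquet input and inspects its snapshots for time travel. The catalog name, bucket, and table identifiers are assumptions.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-sketch")
        # Register a Hadoop-style Iceberg catalog backed by a GCS warehouse.
        .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lakehouse.type", "hadoop")
        .config("spark.sql.catalog.lakehouse.warehouse", "gs://my-bucket/warehouse")  # hypothetical
        .getOrCreate()
    )

    # Read raw Parquet (open file format) and publish as an Iceberg table,
    # which provides schema evolution and transactional commits.
    df = spark.read.parquet("gs://my-bucket/raw/orders/")  # hypothetical path
    df.writeTo("lakehouse.curated.orders").createOrReplace()

    # Iceberg's snapshots metadata table; any snapshot_id listed here can be
    # queried with: SELECT ... FROM lakehouse.curated.orders VERSION AS OF <id>
    spark.sql(
        "SELECT snapshot_id, committed_at FROM lakehouse.curated.orders.snapshots"
    ).show()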
3. Data Modeling & Data Products :
- Design analytical data models optimized for BI, reporting, and advanced analytics on GCP (an example follows this list).
- Partner with data product owners to deliver reusable, well-documented data products.
- Ensure datasets are discoverable, understandable, and trusted by downstream consumers.
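A brief sketch of this modeling work, assuming hypothetical project, dataset, and column names: creating a consumption-ready BigQuery fact table, partitioned by event date and clustered for common BI filters, via the google-cloud-bigquery client.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    # Partitioning and clustering keep BI queries fast and cost-efficient.
    ddl = """
    CREATE TABLE IF NOT EXISTS `my-project.consumption.fct_orders`
    PARTITION BY DATE(order_ts)
    CLUSTER BY customer_id, region
    AS
    SELECT order_id, customer_id, region, order_ts, amount
    FROM `my-project.curated.orders`
    """

    client.query(ddl).result()  # block until the DDL job completes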
4. Data Quality, Governance & Lineage :
- Implement automated data quality checks and validation rules as part of data pipelines (a sketch follows this list).
- Capture and publish metadata and lineage in alignment with enterprise standards and platform capabilities.
- Follow defined security and access-control patterns for sensitive and regulated data.
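One possible shape for such a check, shown purely as a sketch with a hypothetical table and key column: a post-load gate that fails the run if null or duplicate keys appear.

    from google.cloud import bigquery

    def check_quality(client: bigquery.Client, table: str) -> None:
        # Count null and duplicate primary keys in a single scan.
        sql = f"""
        SELECT
          COUNTIF(order_id IS NULL) AS null_keys,
          COUNT(*) - COUNT(DISTINCT order_id) AS duplicate_keys
        FROM `{table}`
        """
        row = list(client.query(sql).result())[0]
        if row.null_keys or row.duplicate_keys:
            raise ValueError(
                f"Quality check failed for {table}: "
                f"{row.null_keys} null keys, {row.duplicate_keys} duplicates")

    check_quality(bigquery.Client(), "my-project.curated.orders")  # hypothetical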
5. DataOps & Operational Excellence :
- Contribute to CI/CD pipelines for data workloads, including automated testing and deployment (a test sketch follows this list).
- Monitor and troubleshoot data pipelines to ensure operational stability.
- Participate in production support and incident resolution for data pipelines.
- Apply FinOps-aware practices to manage and optimize GCP data processing costs.
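As an example of the automated testing this implies, a unit test of a Beam transform using Beam's testing utilities, runnable under pytest in CI before deployment; parse_event is a hypothetical stand-in for real pipeline logic.

    import json
    import apache_beam as beam
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to

    def parse_event(msg: bytes) -> dict:
        return json.loads(msg.decode("utf-8"))

    def test_parse_event():
        events = [b'{"event_id": "1"}', b'{"event_id": "2"}']
        with TestPipeline() as p:
            parsed = p | beam.Create(events) | beam.Map(parse_event)
            assert_that(parsed, equal_to([{"event_id": "1"}, {"event_id": "2"}]))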
6. Collaboration & Continuous Improvement :
- Collaborate with platform engineers and architects to align pipeline implementations with platform standards.
- Contribute to shared frameworks, templates, and best practices for data engineering on GCP.
- Mentor junior engineers and support knowledge sharing within the data engineering community.
Required Qualifications :
- 5+ years of experience in data engineering or related roles.
- Strong hands-on experience with Google Cloud Platform, including services such as :
1. Cloud Storage
2. BigQuery
3. Dataflow / Dataproc
4. Pub/Sub
5. Cloud Composer
- Proficiency in Python and SQL for data processing and transformation.
- Experience building batch and streaming data pipelines in production environments.
- Solid understanding of Lakehouse architecture concepts and modern analytics data modeling.
Preferred Qualifications :
- Experience with open table formats (e.g., Apache Iceberg) and multi-engine query patterns.
- Familiarity with Dataplex, data catalogs, or metadata management tools on GCP.
- Exposure to AI/ML pipelines or integration with Vertex AI.
- Experience with infrastructure-as-code (e.g., Terraform) and DevOps/DataOps practices.
- Prior experience in cloud-agnostic or multi-cloud data platforms.
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1623086