hirist

Data Engineer - Google Cloud Platform

infinitrix Consulting
Mumbai
4 - 6 Years

Posted on: 18/07/2025

Job Description

Job Summary:


We are seeking a highly skilled and motivated Data Engineer (GCP) to join our growing data and cloud engineering team.

You will play a critical role in building and managing large-scale data infrastructure and processing pipelines on the Google Cloud Platform.

Your work will directly contribute to business intelligence, analytics, and machine learning initiatives by ensuring that reliable, high-quality data is available and accessible across the organization.


Key Responsibilities:


- Design, build, and maintain scalable, fault-tolerant data pipelines on Google Cloud Platform using tools like DataProc, DataFlow, and Cloud Composer.

- Implement distributed data processing using Apache Spark (PySpark) on DataProc and stream/batch processing using Apache Beam in DataFlow.

- Perform ETL (Extract, Transform, Load) operations to prepare data for downstream analytics and reporting.

- Work closely with data scientists, analysts, and business users to define data requirements and deliver actionable insights through BigQuery.

- Build and maintain data models, data marts, and automated pipelines for structured and unstructured data.

- Monitor and troubleshoot performance issues in real time; proactively tune pipelines and workflows for cost and speed efficiency.

- Implement data governance standards around data quality, privacy, security, and compliance (GDPR, HIPAA, etc.).

- Write and maintain technical documentation for data flows, architecture, ETL jobs, and data transformations.

- Participate in code reviews and contribute to a culture of continuous improvement and high code quality.

- Collaborate in agile development environments using tools such as Git, JIRA, and CI/CD frameworks.
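The extract-transform-load pattern behind several of the responsibilities above can be sketched in plain Python. This is a minimal, hypothetical illustration of the pipeline shape; a production version would run as PySpark on DataProc or Apache Beam on DataFlow, with BigQuery as the sink rather than an in-memory list.

```python
# Minimal ETL sketch: extract -> transform -> load.
# All names (extract, transform, load, the sample fields) are illustrative,
# not an actual GCP API.

def extract(rows):
    """Extract: parse raw CSV-like strings into records."""
    return [dict(zip(("user_id", "amount"), r.split(","))) for r in rows]

def transform(records):
    """Transform: cast types and drop malformed records."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({"user_id": rec["user_id"],
                            "amount": float(rec["amount"])})
        except (KeyError, ValueError):
            continue  # skip bad rows instead of failing the whole batch
    return cleaned

def load(records, sink):
    """Load: append cleaned records to a sink (stand-in for a warehouse table)."""
    sink.extend(records)
    return len(records)

raw = ["u1,10.5", "u2,not-a-number", "u3,7.25"]
table = []
loaded = load(transform(extract(raw)), table)
```

Keeping each stage a small, pure function is what makes a pipeline "modular and reusable": the same transform can be wrapped in a Spark `map` or a Beam `ParDo` without rewriting its logic.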


Must-Have Skills & Qualifications:


- Strong hands-on experience with Google Cloud Platform (GCP) services, especially:


1. DataProc (Apache Spark/Hadoop)

2. DataFlow (Apache Beam)

3. BigQuery (SQL-based analytics and warehousing)

4. Cloud Storage, Pub/Sub, IAM, and Monitoring tools

- Proficient in Python and PySpark, with experience building modular, reusable ETL pipelines.

- Experience with real-time and batch data processing.

- Deep understanding of distributed computing, data partitioning, and pipeline orchestration.

- Solid foundation in SQL and performance optimization in BigQuery.

- Strong understanding of data security, data privacy, and industry best practices in cloud environments.
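The "data partitioning" expectation above refers to the idea underlying how Spark shuffles rows across executors and how BigQuery spreads work across slots. A minimal stdlib-only sketch of hash partitioning, with a hypothetical key and partition count chosen purely for illustration:

```python
# Hash-based partitioning sketch: records with the same key always land in
# the same partition, so per-key operations (joins, aggregations) can run
# on one worker without a further shuffle.

def partition(records, key, num_partitions):
    """Assign each record to a partition by hashing its key field."""
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        idx = hash(rec[key]) % num_partitions
        parts[idx].append(rec)
    return parts

records = [{"user_id": f"u{i % 10}", "amount": i} for i in range(100)]
parts = partition(records, "user_id", 4)
```

The key-to-partition guarantee is also why skewed keys cause hot partitions: if one `user_id` dominates the data, its partition does most of the work, which is a common target when tuning pipelines for speed.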

