Posted on: 18/07/2025
Job Summary:
You will play a critical role in building and managing large-scale data infrastructure and processing pipelines on the Google Cloud Platform.
Your work will directly contribute to business intelligence, analytics, and machine learning initiatives by ensuring that reliable, high-quality data is available and accessible across the organization.
Key Responsibilities:
- Implement distributed data processing using Apache Spark (PySpark) on Dataproc, and batch and streaming pipelines using Apache Beam on Dataflow.
- Perform ETL (Extract, Transform, Load) operations to prepare data for downstream analytics and reporting.
- Work closely with data scientists, analysts, and business users to define data requirements and deliver actionable insights through BigQuery.
- Build and maintain data models, data marts, and automated pipelines for structured and unstructured data.
- Monitor and troubleshoot performance issues in real time; proactively tune pipelines and workflows for cost and speed efficiency.
- Implement data governance standards for data quality, privacy, security, and compliance (GDPR, HIPAA, etc.).
- Write and maintain technical documentation for data flows, architecture, ETL jobs, and data transformations.
- Participate in code reviews and contribute to a culture of continuous improvement and high code quality.
- Collaborate in agile development environments using tools such as Git, JIRA, and CI/CD frameworks.
Must-Have Skills & Qualifications:
- Hands-on experience with the following Google Cloud Platform services:
  1. Dataproc (Apache Spark/Hadoop)
  2. Dataflow (Apache Beam)
  3. BigQuery (SQL-based analytics and warehousing)
  4. Cloud Storage, Pub/Sub, IAM, and Monitoring tools
- Proficient in Python and PySpark, with experience building modular, reusable ETL pipelines.
- Experience with real-time and batch data processing.
- Deep understanding of distributed computing, data partitioning, and pipeline orchestration.
- Solid foundation in SQL and performance optimization in BigQuery.
- Strong understanding of data security, data privacy, and industry best practices in cloud environments.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1514595