Proficiency in building highly scalable ETL and streaming-based data pipelines using Google Cloud Platform (GCP) services and products like Biquark, Cloud Dataflow
Proficiency in large scale data platforms and data processing systems such as Google Big Query, Amazon Redshift, Azure Data Lake
Excellent Python, PySpark and SQL development and debugging skills, exposure to other Big Data frameworks like Hadoop Hive would be added advantage
Experience building systems to retrieve and aggregate data from event-driven messaging frameworks (e.g. RabbitMQ and Pub/Sub)