Kochava - Senior Data Engineer - Python/SQL

KOCHAVA INDIA PRIVATE LIMITED
4 - 7 Years
Bangalore

Posted on: 13/03/2026

Job Description

About Collective Data Solutions:


Collective Data Solutions is Kochava's multi-cloud, cloud-native data platform, processing more than 50 billion events daily across global partner ecosystems.


The platform spans Google Cloud Platform (GCP) and AWS and leverages technologies such as BigQuery, Pub/Sub, Dataflow, Kubernetes (GKE), Cloud Composer (Airflow), Google Cloud Storage, and AWS S3.


It supports streaming ingestion, distributed processing, batch orchestration, data stewardship, and client delivery systems powering analytics and large-scale data workloads.


Platform Scale & Technology Stack:


Our data platform is built to operate at massive scale with the following architecture:


- Multi-cloud ingestion across AWS S3 and Google Cloud

- Event-driven microservices running on private GKE clusters

- High-throughput streaming pipelines using Pub/Sub Lite and Standard Pub/Sub

- Distributed data enrichment, validation, and transformation services

- BigQuery-centered data warehousing and analytics

- Batch orchestration using Cloud Composer (Airflow)

- Data lifecycle management across ingestion, staging, warehousing, retention, and delivery

- Fault tolerance using retry topics, dead-letter queues (DLQs), replay mechanisms, and idempotent processing (see the sketch after this list)

- Platform observability and SLA monitoring

- Secure workload identity and zero-trust platform access
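

To make the fault-tolerance item above concrete, here is a minimal, dependency-free Python sketch of the dedup/retry/dead-letter pattern. The in-memory set and list are stand-ins for the durable dedup store and Pub/Sub dead-letter topic the posting describes; class name, event shape, and attempt count are illustrative only.

```python
class IdempotentConsumer:
    """Illustrative consumer: skips duplicate deliveries, retries transient
    failures, and parks poison messages on a dead-letter queue (DLQ)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.processed_ids = set()  # stand-in for a durable dedup store
        self.dead_letters = []      # stand-in for a DLQ topic

    def handle(self, event):
        if event["id"] in self.processed_ids:
            return "duplicate"  # idempotent: redelivery is a no-op
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.process(event)
                self.processed_ids.add(event["id"])
                return "ok"
            except Exception:
                if attempt == self.max_attempts:
                    self.dead_letters.append(event)  # park for replay/inspection
                    return "dead-lettered"

    def process(self, event):
        if event.get("payload") is None:
            raise ValueError("malformed event")
        # enrichment/transformation would happen here

consumer = IdempotentConsumer()
print(consumer.handle({"id": "e1", "payload": {}}))   # ok
print(consumer.handle({"id": "e1", "payload": {}}))   # duplicate
print(consumer.handle({"id": "e2", "payload": None})) # dead-lettered
```

In production the same shape appears as Pub/Sub subscriptions wired to retry and dead-letter topics; the idempotency key is what makes redelivery and replay safe.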


The Role:


We are seeking a hands-on Senior Data Engineer with experience building and operating large-scale streaming data systems.


This role focuses on developing and operating distributed data infrastructure that ingests, enriches, transforms, warehouses, and delivers tens of billions of events daily.


The position sits at the intersection of data engineering, distributed systems, analytics infrastructure, and platform reliability.


What Makes This Role Unique:


- Work on a large-scale data platform processing 50B+ events daily across global partner ecosystems.

- Build and operate high-throughput streaming data pipelines using BigQuery, Pub/Sub, Dataflow, and Airflow on GCP and AWS.


Key Responsibilities:


- Design and operate high-throughput data pipelines processing 50B+ events per day

- Develop event-driven services for ingestion, enrichment, mapping, and delivery

- Contribute to the broader data platform architecture spanning streaming, staging, warehousing, and lifecycle management

- Perform deep data analysis using advanced SQL on very large datasets

- Optimize BigQuery workloads, including partitioning, clustering, and query performance (a minimal sketch follows this list)

- Build analytics-ready and feature-ready datasets supporting reporting, experimentation, and AI/ML workflows

- Improve data quality, reliability, observability, and SLA tracking

- Collaborate with analytics, product, and data science teams
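

As a sketch of the partitioning and clustering work called out above, here is how a day-partitioned, clustered events table could be created with the google-cloud-bigquery Python client. The project, dataset, and column names are hypothetical, not Kochava's actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes ambient GCP credentials

# Hypothetical events table: project, dataset, and columns are illustrative.
schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("partner_id", "STRING"),
    bigquery.SchemaField("event_type", "STRING"),
]

table = bigquery.Table("my-project.analytics.events", schema=schema)
# Day-partition on the event timestamp so date-filtered queries scan one
# partition instead of the whole table.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
# Cluster within each partition by the columns most often filtered/grouped on.
table.clustering_fields = ["partner_id", "event_type"]
client.create_table(table)
```

Queries that filter on the partition column prune whole partitions, and filters or GROUP BYs on the clustering columns cut scanned data further; both reduce bytes billed, which is the main BigQuery cost lever.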


Core Skills & Technologies:


- Python (strong) for pipeline development and data processing

- Advanced SQL (must-have) for analytical queries and large-scale dataset analysis

- Golang (preferred) for high-performance services

- Experience with BigQuery, Snowflake, or Redshift

- Streaming technologies such as Pub/Sub, Kafka, Apache Beam, Dataflow, or Spark

- Workflow orchestration using Airflow / Cloud Composer (see the DAG sketch after this list)

- Infrastructure tools including Kubernetes (GKE), Docker, and Terraform

- Experience with Google Cloud Platform (preferred) and AWS
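

For the orchestration item, a minimal Cloud Composer / Airflow DAG might look like the sketch below; the DAG id, schedule, and task bodies are placeholders, not the team's actual pipelines.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_events():
    print("pull raw events from object storage")  # stubbed out

def load_to_warehouse():
    print("load staged batch into the warehouse")  # stubbed out

with DAG(
    dag_id="daily_event_batch",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_events",
                             python_callable=extract_events)
    load = PythonOperator(task_id="load_to_warehouse",
                          python_callable=load_to_warehouse)
    extract >> load  # run the load only after extraction succeeds
```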



AI/ML Exposure (Preferred):


- Experience building feature engineering pipelines or analytical datasets

- Experience supporting experimentation and machine learning workflows

- Familiarity with Python libraries such as pandas, NumPy, scikit-learn, XGBoost, or LightGBM

- Exposure to BI tools such as Looker, Tableau, or Power BI

- Experience with cloud ML platforms such as Vertex AI or SageMaker

- Familiarity with GenAI or LLM-assisted analytics


Experience at Scale:


- Experience operating data systems processing hundreds of millions to billions of records daily is strongly preferred

- Candidates are encouraged to highlight the scale of systems they have worked with (records/day, TB/day, events/sec)


Engineering Challenges:


Engineers in this role will solve problems such as:


- Designing high-throughput streaming pipelines

- Handling retries, duplicates, late data, and dead-letter workflows

- Building idempotent and fault-tolerant systems

- Optimizing BigQuery workloads over very large datasets

- Balancing cost, performance, and reliability in production pipelines (see the dry-run sketch below)
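

On the cost/performance trade-off, one standard guardrail is BigQuery's dry-run mode, which reports how many bytes a query would scan without executing it. A minimal sketch follows; the table name is hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes ambient GCP credentials

sql = """
    SELECT partner_id, COUNT(*) AS events
    FROM `my-project.analytics.events`  -- hypothetical table
    WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    GROUP BY partner_id
"""

# dry_run=True validates the query and returns the bytes it would scan,
# without running it and without incurring query cost.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```

Wiring a check like this into review or CI catches accidental full-table scans before they run at production scale.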



Ideal Background:


Experience in the following environments is highly relevant:


- AdTech or MarTech platforms

- Real-time analytics or telemetry systems

- Payment or transaction processing platforms

- Large-scale ecommerce or marketplace data infrastructure

- Multi-cloud data processing environments


We are looking for engineers who enjoy solving large-scale data infrastructure challenges and operating production systems at scale.

