hirist

Job Description


Job Title : Senior Data Engineer (Druid & Real-Time Systems)

Experience : 7+ Years

Type : Full-Time

Role Overview :

We are seeking a seasoned Senior Data Engineer to lead the architecture, deployment, and optimization of our high-performance analytics platform. The core of this role involves managing massive-scale Apache Druid clusters to deliver sub-second OLAP queries. You will be the bridge between raw data streams and actionable insights, building robust pipelines that integrate modern data lakehouses with real-time streaming technologies.

Key Responsibilities :

- Druid Architecture & Optimization : Design, deploy, and manage highly available Apache Druid clusters. Perform deep-dive performance tuning (compaction, indexing, caching) to ensure sub-second query latency on petabyte-scale datasets.

- Pipeline Engineering : Architect and maintain end-to-end data ingestion pipelines. This includes real-time streaming via Kafka and Spark Structured Streaming, as well as complex batch processing using Airflow.

- Cloud Infrastructure : Manage Druid deployments on AWS (EKS). Handle scaling, resource allocation, and cost optimization across EC2, S3, and Kubernetes environments.

- Ecosystem Integration : Build seamless data bridges between Druid and our broader data ecosystem, including Snowflake, Databricks (Delta Lake), and various Data Lakes.

- Observability & Security : Implement rigorous monitoring, alerting, and logging for distributed systems. Ensure data governance and security through IAM roles, encryption, and VPC configurations.

- Technical Leadership : Act as a subject matter expert on OLAP and distributed systems, guiding the team on best practices for columnar storage and high-concurrency query patterns.
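The Kafka-to-Druid streaming ingestion described above is typically configured through a Druid Kafka supervisor spec submitted to the Overlord. The sketch below shows the general shape of such a spec as a Python dict; the datasource name, topic, dimensions, and broker address are hypothetical placeholders, not details from this posting:

```python
import json

# Minimal sketch of a Druid Kafka ingestion supervisor spec. All names
# (datasource, topic, dimensions, bootstrap server) are hypothetical.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "events",                      # hypothetical datasource
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["country", "device"]},
            "granularitySpec": {
                "segmentGranularity": "HOUR",            # one segment per hour
                "queryGranularity": "MINUTE",            # truncate timestamps to minutes
                "rollup": True,                          # pre-aggregate at ingest time
            },
        },
        "ioConfig": {
            "topic": "events",                           # hypothetical Kafka topic
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            "taskCount": 2,                              # parallel ingestion tasks
            "useEarliestOffset": False,
        },
        "tuningConfig": {"type": "kafka", "maxRowsPerSegment": 5_000_000},
    },
}

# POSTing this JSON to the Overlord's /druid/indexer/v1/supervisor endpoint
# on a live cluster would start exactly-once streaming ingestion from Kafka.
spec_json = json.dumps(supervisor_spec)
```

Ingest-time rollup and segment granularity choices like these are a large part of the compaction and indexing tuning mentioned above, since they directly determine segment count and query-time scan cost.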

Technical Requirements :

Must-Have Skills :

- Core Engine : Expert-level hands-on experience with Apache Druid (Deep storage, MiddleManagers, Brokers, and Historicals).

- Streaming & Compute : Mastery of Kafka and Spark Structured Streaming for low-latency data processing.

- Cloud & Orchestration : Strong proficiency in AWS (EC2, S3, IAM) and Kubernetes (EKS) for containerized deployments.

- Storage & OLAP : Deep understanding of distributed systems, columnar storage formats, and OLAP cubes.

- Programming : Fluent in Python, Scala, or Java.

- Modern Data Stack : Hands-on experience with Snowflake and Databricks (PySpark, Delta Lake).
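To make the Druid query path concrete: sub-second OLAP queries are served by Brokers through Druid's SQL HTTP API (`POST /druid/v2/sql/`). A minimal stdlib-only sketch follows; the Broker URL and the `events` datasource are hypothetical placeholders:

```python
import json
import urllib.request

# Default Broker port is 8082; this URL is a hypothetical local example.
BROKER_URL = "http://localhost:8082/druid/v2/sql/"

def build_sql_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build an HTTP request for Druid's SQL API without sending it."""
    payload = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: a typical rollup query over a hypothetical "events" datasource.
req = build_sql_request(
    "SELECT __time, SUM(clicks) AS clicks "
    "FROM events GROUP BY __time ORDER BY __time DESC LIMIT 10"
)
# urllib.request.urlopen(req) would execute the query against a live Broker.
```

The Broker fans this query out to Historicals and MiddleManagers, merging results over the columnar segments held in deep storage, which is why segment layout and caching dominate query latency.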

Preferred Qualifications :

- Experience integrating BI tools (e.g., Looker, Superset, or Tableau) with Druid, including optimizing dashboards for Druid-backed queries.

- Contributions to open-source projects (Druid, Spark, or Kafka).

- Knowledge of Infrastructure as Code (Terraform or CloudFormation).

