hirist

Data Engineer - ETL/Python

MAGIC BLOOMS MEDIA PRIVATE LIMITED
Gurgaon/Gurugram
6 - 8 Years

Posted on: 20/11/2025

Job Description

About the Role

We are looking for a highly skilled Lead Data Engineer who can take full ownership of designing, building, and scaling modern data pipelines across batch and real-time systems.


You will play a central role in architecting our data platform, ensuring high reliability, quality, and performance across a diverse set of data sources and downstream consumers.


This is a hands-on leadership role suited for someone who thrives in fast-paced environments and enjoys solving complex data engineering challenges end-to-end.

Key Responsibilities :

1. Data Pipeline Architecture & Development

- Design, build, and maintain end-to-end batch and real-time data pipelines using modern data engineering tools and best practices.

- Implement and optimize ingestion frameworks supporting CDC pipelines, streaming data (Kafka/Redpanda), and scheduled batch jobs.

- Build robust, scalable ETL/ELT processes that integrate data from Postgres, MySQL, MongoDB, ClickHouse, Timescale, and external APIs.
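A pipeline of this shape can be sketched as a minimal extract-transform-load loop. This is an illustrative sketch only: the table, columns, and transformations are hypothetical, and sqlite3 stands in for a source database such as Postgres or MySQL so the example stays self-contained; a production pipeline would use a proper connector and land the result as Parquet.

```python
# Minimal batch ETL sketch. Table and column names are hypothetical;
# sqlite3 stands in for the real source database.
import sqlite3

def extract(conn):
    # Incremental-style extract: pull only rows matching a filter
    # (a real pipeline would filter on a watermark column).
    return conn.execute("SELECT id, amount FROM orders WHERE amount > 0").fetchall()

def transform(rows):
    # Normalise amounts to cents and deduplicate on id (last write wins).
    seen = {}
    for oid, amount in rows:
        seen[oid] = int(round(amount * 100))
    return sorted(seen.items())

def load(rows):
    # In production this step would write partitioned Parquet to S3/GCS.
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.99), (2, 0.0), (1, 12.50), (3, 5.00)])
result = load(transform(extract(conn)))
print(result)  # [(1, 1250), (3, 500)]
```

The same extract/transform/load split generalises to CDC and streaming sources; only the extract stage changes.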

2. Data Platform & Storage

- Architect reliable data lake and warehouse solutions using S3/GCS, Parquet, partitioning, and metadata-driven systems.

- Implement schema evolution, deduplication, incremental ingestion, and automated backfills.

- Design cost-effective data storage and query strategies to support large-scale event volumes (millions of events per day).
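The partitioning mentioned above often follows Hive-style key=value prefixes, which let query engines prune whole date ranges without scanning objects. The sketch below is illustrative; the bucket and table names are hypothetical.

```python
# Hive-style partition-path sketch for a data lake on S3/GCS.
# Bucket and table names are hypothetical.
from datetime import datetime, timezone

def partition_path(bucket: str, table: str, event_ts: int) -> str:
    """Map an epoch-seconds event timestamp to a date-partitioned prefix."""
    dt = datetime.fromtimestamp(event_ts, tz=timezone.utc)
    return (f"s3://{bucket}/{table}/"
            f"year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/")

path = partition_path("analytics-lake", "events", 1_700_000_000)
print(path)  # s3://analytics-lake/events/year=2023/month=11/day=14/
```

Writers group events into these prefixes; readers filter on the partition columns so only matching prefixes are listed and scanned.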

3. Orchestration, Monitoring & Quality

- Develop workflow orchestration using Airflow, Dagster, or Prefect.

- Build monitoring and alerting for pipeline health using logs, metrics, and dashboards.

- Implement data quality checks, validation frameworks, SLAs, and reconciliation processes to ensure trust in the data.
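One common reconciliation check compares source and target row counts against an SLA threshold. The sketch below is a hedged illustration (the tolerance and counts are made up); an orchestrator such as Airflow, Dagster, or Prefect would run it as a task and fail the run on breach.

```python
# Reconciliation-style data quality check. Tolerance and counts are
# illustrative, not a prescribed SLA.
def reconcile(source_count: int, target_count: int, tolerance: float = 0.001) -> bool:
    """Return True if target row count is within `tolerance` of the source."""
    if source_count == 0:
        return target_count == 0
    drift = abs(source_count - target_count) / source_count
    return drift <= tolerance

ok = reconcile(1_000_000, 999_500)       # 0.05% drift: within a 0.1% SLA
breach = reconcile(1_000_000, 990_000)   # 1% drift: should trigger an alert
print(ok, breach)  # True False
```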

4. Performance & Optimization

- Write high-performance SQL (window functions, complex aggregations, indexing, query tuning).

- Optimize pipelines for cost, speed, scalability, and reliability across distributed systems.

- Continuously improve data models and internal tooling to support analytics and downstream applications.
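A representative window-function query is the "latest row per key" pattern, widely used for deduplication and snapshot tables. The sketch below runs the SQL through sqlite3 so it is self-contained (SQLite supports window functions from version 3.25); the table and columns are hypothetical.

```python
# Window-function sketch: keep only the most recent event per user.
# Table and columns are hypothetical; sqlite3 stands in for the warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER, value TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, 10, "a"), (1, 20, "b"), (2, 5, "c"),
])
rows = conn.execute("""
    SELECT user_id, value FROM (
        SELECT user_id, value,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
        FROM events
    ) WHERE rn = 1
    ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 'b'), (2, 'c')]
```

On large tables, performance then hinges on indexing the partition and order columns and pruning partitions before the window is applied.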

5. Collaboration & Leadership

- Work closely with product, analytics, and engineering teams to gather requirements and translate them into well-designed data solutions.

- Take ownership of projects from concept to deployment with minimal supervision.

- Mentor team members and contribute to building best practices, coding standards, and documentation.

Required Skills & Experience:

- 6+ years in Data Engineering, with at least 2 years in a senior/lead role.

- Strong expertise in building production-grade data pipelines end-to-end.

- Proven experience with both batch and real-time systems.

- Hands-on experience with Kafka/Redpanda, CDC solutions, and streaming frameworks.

- Deep understanding of data lakes, warehouses, and modern storage formats (Parquet).

- Strong SQL expertise, including complex window functions and performance optimization.

- Experience working with multiple databases: Postgres, MySQL, MongoDB, ClickHouse, Timescale.

- Proficiency with orchestration tools (Airflow/Dagster/Prefect).

- Capable of delivering scalable data architectures in fast-paced, high-growth environments.

- Experience with monitoring, alerting, and ensuring data reliability at scale.

Nice-to-Have:

- Logistics/transportation domain experience.

- Knowledge of geospatial data processing.

- Familiarity with dbt, Lakehouse technologies (Iceberg/Delta/Hudi), and Kubernetes.

- Experience with cost optimization on cloud platforms.
