Job Description

Description :

This role is central to building the core data and intelligence infrastructure that powers AI-driven engineering insights for organizations worldwide. You will architect and scale systems that ingest, process, and operationalize data from Git, Jira, CI/CD systems, and other developer tools, forming the backbone of our engineering analytics platform.

Key Responsibilities :

- Architect and scale multi-source data ingestion pipelines

- Build robust ingestion flows from Git, Jira, CI/CD tools, and external developer systems using APIs, webhooks, and incremental sync mechanisms (see the incremental-sync sketch after this list).

- Strengthen and modularize Java-based ETL pipelines

- Refactor, optimize, and extend existing pipelines for greater reusability, maintainability, and long-term scalability.

- Implement high-throughput data processing architectures

- Design parallel, batch, and event-driven data flows using technologies such as Kafka, SQS, and streaming frameworks (see the Kafka consumer sketch below).

- Optimize large-scale Postgres environments

- Drive schema design, indexing strategies, partitioning, and query tuning to support large datasets (100GB+) across multi-tenant workloads (see the partitioning sketch below).

- Establish strong data orchestration and observability practices

- Lead the adoption of Airflow, Temporal, OpenTelemetry, or similar platforms for workflow orchestration, lineage tracking, and system observability (see the Temporal sketch below).

- Collaborate cross-functionally with backend, product, and AI teams

- Ensure data is modeled, enriched, and exposed in formats that enable downstream insights, dashboards, and machine-learning pipelines.

- Ensure efficient, scalable cloud operations on AWS

- Build and maintain cost-effective, resilient infrastructure using S3, ECS, Lambda, RDS, and CloudWatch to support demanding data workloads (see the CloudWatch metrics sketch below).

- Develop self-healing, fully monitored data pipelines

- Implement fail-safe mechanisms, automated recovery, and monitoring systems that minimize operational overhead and ensure high reliability (see the retry/backoff sketch below).
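
The sketches below illustrate a few of the responsibilities above. They are minimal, hedged examples built on stated assumptions, not our production code. First, incremental sync from an external developer tool: a Java sketch that pulls only Jira issues updated since the last checkpoint via Jira's standard /rest/api/2/search endpoint. The bearer token, cursor persistence, and JSON parsing are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Incremental-sync sketch: fetch only issues updated since the last successful
// sync, then advance the cursor. The endpoint and JQL follow Jira conventions;
// the cursor store and downstream processing are illustrative placeholders.
public class JiraIncrementalSync {
    private static final DateTimeFormatter JQL_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm").withZone(ZoneOffset.UTC);

    private final HttpClient http = HttpClient.newHttpClient();
    private Instant lastSyncedAt; // persisted checkpoint in a real pipeline

    public JiraIncrementalSync(Instant initialCursor) {
        this.lastSyncedAt = initialCursor;
    }

    public void syncOnce(String baseUrl, String bearerToken) throws Exception {
        // Capture the next cursor BEFORE the call so records updated during the
        // fetch are re-read next time (at-least-once, never silently dropped).
        Instant nextCursor = Instant.now();

        String jql = "updated >= \"" + JQL_TIME.format(lastSyncedAt) + "\" ORDER BY updated ASC";
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                        baseUrl + "/rest/api/2/search?jql=" +
                        URLEncoder.encode(jql, StandardCharsets.UTF_8)))
                .header("Authorization", "Bearer " + bearerToken)
                .GET()
                .build();

        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Jira sync failed: HTTP " + response.statusCode());
        }

        processPage(response.body()); // hand raw JSON to the ETL layer

        lastSyncedAt = nextCursor; // only advance after the page is durably processed
    }

    private void processPage(String json) {
        System.out.println("fetched " + json.length() + " bytes of issue data");
    }
}
```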
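
For the event-driven flows, a minimal Kafka consumer sketch: auto-commit is disabled and offsets are committed only after each batch is processed, so a crash replays events rather than losing them (at-least-once delivery). The topic and group names are illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Event-driven ingestion sketch: consume raw webhook events and checkpoint
// offsets manually after processing. Names are placeholders.
public class WebhookEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ingestion-etl");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commit
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("git-webhook-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    transformAndLoad(record.key(), record.value());
                }
                consumer.commitSync(); // checkpoint only after the batch is loaded
            }
        }
    }

    private static void transformAndLoad(String key, String payload) {
        System.out.printf("event %s: %d bytes%n", key, payload.length());
    }
}
```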
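
For the Postgres work, a sketch of declarative range partitioning (Postgres 10+) driven from JDBC. The events table, its columns, and the monthly granularity are assumptions; the point is that per-month partitions keep indexes small and let old data be detached cheaply instead of deleted row by row.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Partition-maintenance sketch for a large multi-tenant events table.
// Requires the PostgreSQL JDBC driver on the classpath; in a real pipeline
// a scheduler would create upcoming partitions ahead of time.
public class PartitionMaintenance {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/analytics", "etl", "secret");
             Statement stmt = conn.createStatement()) {

            // Parent table: partitioned by event time.
            stmt.execute("""
                CREATE TABLE IF NOT EXISTS events (
                    tenant_id  bigint      NOT NULL,
                    event_time timestamptz NOT NULL,
                    payload    jsonb
                ) PARTITION BY RANGE (event_time)""");

            // One partition per month; old months can be detached or dropped.
            stmt.execute("""
                CREATE TABLE IF NOT EXISTS events_2025_01
                PARTITION OF events
                FOR VALUES FROM ('2025-01-01') TO ('2025-02-01')""");

            // Composite index matching the dominant multi-tenant access path.
            stmt.execute("CREATE INDEX IF NOT EXISTS events_2025_01_tenant_time_idx " +
                         "ON events_2025_01 (tenant_id, event_time)");
        }
    }
}
```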
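
For orchestration, a sketch using Temporal's Java SDK: each ETL stage runs as an activity with an engine-managed timeout and retry policy, so the workflow engine (rather than cron glue) owns recovery. The interface and method names are hypothetical.

```java
import java.time.Duration;
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

// Orchestration sketch: a two-stage sync workflow. Temporal durably records
// each activity result, so a crashed worker resumes at the failed step.
public class SyncWorkflowSketch {

    @ActivityInterface
    public interface SyncActivities {
        String extract(String source);
        void load(String payload);
    }

    @WorkflowInterface
    public interface SyncWorkflow {
        @WorkflowMethod
        void run(String source);
    }

    public static class SyncWorkflowImpl implements SyncWorkflow {
        private final SyncActivities activities = Workflow.newActivityStub(
                SyncActivities.class,
                ActivityOptions.newBuilder()
                        .setStartToCloseTimeout(Duration.ofMinutes(10))
                        .setRetryOptions(RetryOptions.newBuilder()
                                .setMaximumAttempts(5) // engine-managed retries
                                .build())
                        .build());

        @Override
        public void run(String source) {
            String payload = activities.extract(source); // durably recorded step
            activities.load(payload);                    // resumes here after a crash
        }
    }
}
```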
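
For AWS observability, a sketch that publishes a custom ingestion-lag metric through the AWS SDK v2 CloudWatch client, so an alarm can page on pipeline lag. The namespace and metric name are made up for illustration.

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricDataRequest;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

// Metrics sketch: emit one custom datum per sync run. Credentials and region
// come from the default provider chain; names are illustrative.
public class PipelineMetrics {
    private final CloudWatchClient cloudWatch = CloudWatchClient.create();

    public void reportLagSeconds(double lagSeconds) {
        MetricDatum datum = MetricDatum.builder()
                .metricName("IngestionLagSeconds")
                .unit(StandardUnit.SECONDS)
                .value(lagSeconds)
                .build();
        cloudWatch.putMetricData(PutMetricDataRequest.builder()
                .namespace("EngineeringAnalytics/Pipelines") // custom namespace
                .metricData(datum)
                .build());
    }
}
```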
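
Finally, for self-healing pipelines, a dependency-free retry helper with exponential backoff, one building block of the fail-safe behavior described above; production code would add jitter and wire the final failure into alerting.

```java
import java.util.concurrent.Callable;

// Fail-safe sketch: retry a pipeline step with exponential backoff and
// surface the final failure so the orchestrator can alert and reschedule.
public final class Retry {
    private Retry() {}

    public static <T> T withBackoff(Callable<T> step, int maxAttempts) throws Exception {
        long delayMs = 1_000;
        for (int attempt = 1; ; attempt++) {
            try {
                return step.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // exhausted: let monitoring pick this up
                }
                System.err.printf("attempt %d failed (%s), retrying in %d ms%n",
                        attempt, e.getMessage(), delayMs);
                Thread.sleep(delayMs);
                delayMs *= 2; // exponential backoff; add jitter in production
            }
        }
    }
}
```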

What You Bring :

- 6-10 years of experience in backend or data engineering

- Preferably in data-intensive or analytics-driven product environments.

- Strong expertise in Java and AWS

- Hands-on experience with S3, ECS, RDS, Lambda, CloudWatch, and distributed systems.

- Extensive experience integrating external APIs

- Proven ability to fetch, sync, and transform data from systems like GitHub, Jira, Jenkins, and Bitbucket.

- Deep understanding of data modeling principles

- Including incremental updates, schema evolution, and data lifecycle management.

- Advanced Postgres performance tuning skills

- Experience with indexing, partitioning, and optimizing queries on large, high-volume datasets.

- Experience building and scaling data pipelines

- Exposure to analytics systems handling 100M+ records or multi-tenant architectures.

Bonus Capabilities :

- Knowledge of dbt, Kafka, ClickHouse, Temporal, or developer-analytics ecosystems.

