
Job Description

Job Title : Data Engineering Lead - Analytics & Observability Platforms

About iMerit :

iMerit is a leading AI data solutions company that transforms unstructured data into structured intelligence for advanced machine learning and analytics.

Our customers span autonomous mobility, medical AI, agriculture, and more - we deliver high-quality data services that power next-generation AI systems.

About the Role :

We are looking for a seasoned Engineering Lead to architect, scale, and continuously evolve our analytics and observability platform - a system deeply integrated with annotation tools and ML pipelines.

This platform powers real-time visibility, operational insights, and automation across large-scale data operations.

In this role, you will not only lead and mentor a team but also set the technical vision for high-throughput streaming systems and modern data lake/warehouse architectures.

You will bring proven expertise in high-velocity, high-volume data engineering, driving innovation in how we process, curate, and surface data to support mission-critical AI workflows.

Key Responsibilities :

Lead & Inspire : Build and mentor a high-performing data engineering team, fostering innovation, accountability, and technical excellence.

Architect at Scale :

- Design and implement high-volume batch and real-time data pipelines across structured and unstructured sources.

- Build and maintain real-time data lakes with streaming ingestion, ensuring data quality, lineage, and availability.

- Curate, transform, and optimize datasets into high-performance data warehouses (e.g., Redshift, Snowflake) for downstream analytics.

- Deep Streaming Expertise : Drive adoption and optimization of Kafka for messaging, event streaming, and system integration, ensuring high throughput and low latency.

- Advanced Processing : Leverage PySpark for distributed data processing and complex transformations, delivering scalable ETL/ELT pipelines.

- Orchestration & Automation : Utilize AWS Glue and related cloud services to orchestrate data workflows, automate schema management, and scale pipelines seamlessly.

- Continuous Improvement : Oversee platform upgrades, schema evolution, and performance tuning, ensuring the platform meets growing data and user demands.

- Observability & Insights : Implement metrics, dashboards, and alerting for key KPIs (annotation throughput, quality, latency), ensuring operational excellence.

- Cross-Functional Collaboration : Work closely with product, platform, and customer teams to define event models, data contracts, and integration strategies.

- Innovation and R&D : Research emerging technologies in data streaming, lakehouse architectures, and observability, bringing forward new approaches and prototypes.

Minimum Qualifications :

- 10+ years of experience in data engineering or backend engineering, with at least 2-3 years in a leadership or team-lead role.

- Proven track record in building and operating data pipelines at scale - including both batch ETL/ELT and real-time streaming.

- Expert-level experience with Kafka for high-throughput data ingestion, streaming transformations, and integrations.

- Strong hands-on experience with PySpark for distributed data processing and advanced transformations.

- In-depth knowledge of AWS Glue (or similar) for orchestrating workflows, managing metadata, and automating ETL pipelines.

- Demonstrated success in upgrading and maintaining real-time data lakes, curating and transforming datasets into performant data warehouses.

- Familiarity with lakehouse and warehouse patterns (e.g., Delta Lake, Redshift, Snowflake) and schema versioning.

- Experience with cloud-native data services (S3, Kinesis, Lambda, RDS) and infrastructure-as-code deployments.

Preferred Qualifications :

- Experience with Databricks and Snowflake solutions, including developing on lakehouse architectures and optimizing warehouse performance.

- Exposure to annotation platforms, ML workflows, or model validation pipelines.

- Experience with observability tools (Prometheus, Grafana, OpenTelemetry).

- Knowledge of data governance, RBAC, and compliance in large-scale analytics environments.

- Comfort working in Agile, distributed teams with Git, JIRA, and Slack.

Why Join Us?

At iMerit, you will lead a team at the cutting edge of AI data infrastructure - building and evolving platforms that are explainable, auditable, and scalable.

You will play a key role in upgrading and maintaining our streaming data lake and transforming it into analytics-ready warehouses, directly shaping how AI systems are built and trusted at scale.

