hirist

iMerit - Data Engineer - Python/SQL/ETL

IMERIT TECHNOLOGY SERVICES PRIVATE LIMITED
5 - 8 Years
Bangalore

Posted on: 03/04/2026

Job Description

About the Role:

We are seeking a skilled Data Engineer to help scale and enhance our internal data observability and analytics platform. This platform integrates with data annotation tools and ML pipelines to provide visibility, insights, and automation across large-scale data operations.

You will design and optimize robust data pipelines, build integrations with internal platforms (e.g., AngoHub, 3DPCT) and customer platforms, and support real-time metrics, dashboards, and workflows critical to customer delivery and operational excellence.

Key Responsibilities:

- Design and build scalable batch and real-time data pipelines across structured and unstructured sources.

- Integrate analytics and observability services with upstream annotation tools and downstream ML validation systems to enable full-cycle traceability.

- Collaborate with product, platform, and analytics teams to define event models, metrics, and data contracts.

- Develop ETL/ELT workflows using tools like AWS Glue, PySpark, or Airflow; ensure data quality, lineage, and reconciliation.

- Implement observability pipelines and alerts for mission-critical metrics (e.g., annotation throughput, quality KPIs, latency).

- Build data models and queries to power dashboards and insights via tools like Athena, QuickSight, or Redash.

- Contribute to infrastructure-as-code and CI/CD practices for deployment across cloud environments (preferably AWS).

- Document architecture, data flow, and support runbooks; continuously improve platform performance and resilience.

- Integrate with customer data platforms and pipelines, including bespoke data frameworks.

Minimum Qualifications:

- 5 - 8 years of experience in data engineering or backend development in data-intensive environments.

- Proficient in Python and SQL; familiarity with PySpark or other distributed processing frameworks.

- Strong experience with cloud-native data tools and services (S3, Lambda, Glue, Kinesis, Firehose, RDS).

- Familiarity with frameworks like Apache Hadoop, Apache Spark, and related tools for handling large datasets.

- Experience with data lake and warehouse patterns (e.g., Delta Lake, Redshift, Snowflake).

- Solid understanding of data modeling, schema design, and versioned datasets.

- Understanding of data governance policies and security measures, and experience implementing them.

- Proven experience in building resilient, production-grade pipelines and troubleshooting live systems.

- Working knowledge of messaging and streaming frameworks such as Kafka and Firehose.

- Working knowledge of API frameworks and of robust, performant API design.

- Good working knowledge of database fundamentals, relational databases, and SQL.

Why Join Us?

- iMerit was founded in 2012 and has been a leader in AI data solutions for over a decade, helping teams build and deploy machine learning and computer vision applications at scale.

- The company's global headquarters is in San Jose, California, and it operates across multiple regions including the US, India, Europe, and Bhutan.

- In addition to data services, iMerit is expanding its footprint in AI product innovation through its Ango Hub platform, developed with the acquisition of Ango.ai, reinforcing its position as a product-led AI technology company.
