Data Engineer - Validation & Quality

CYBER HR Consulting
Anywhere in India/Multiple Locations
3 - 7 Years

Posted on: 18/11/2025

Job Description

About the Role:

We are looking for a Data Engineer - Validation & Quality to ensure that every dataset inside Perceive Now is verifiable, accurate, and audit-traceable. You will architect quantitative validation frameworks, build contradiction and anomaly detection systems, and integrate automated evidence-scoring mechanisms into our 25-layer data reasoning pipeline.

Responsibilities:

- Design and implement validation frameworks using Python (Pandas, NumPy, Polars) for data quality enforcement, schema validation, and field-level consistency checks (see the illustrative sketch after this list).

- Build contradiction-detection and reconciliation pipelines leveraging rule-based systems, cosine similarity, and statistical control models.

- Develop automated confidence scoring models for each record or Evidence Bundle, integrating factors like source reliability, freshness, and duplication metrics.

- Orchestrate validation jobs through Temporal / Airflow / Prefect, ensuring deterministic execution and full observability.

- Automate checksum verification, schema drift detection, and data sampling across hundreds of data sources.

- Create and maintain lineage graphs and quality dashboards in PostgreSQL, OpenSearch, and Grafana for continuous visibility.

- Collaborate with Kernel and Governance pods to embed validation metadata and scoring outputs directly into evidence objects.

- Ensure compliance with enterprise-grade data governance and security frameworks (SOC 2, GDPR, ISO 27001).
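
To make the validation and confidence-scoring responsibilities above concrete, here is a minimal, illustrative Pandas/NumPy sketch. The column names, score weights, and 90-day freshness window are hypothetical assumptions, not Perceive Now's actual design:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per ingested record, with metadata columns.
records = pd.DataFrame({
    "record_id": ["a1", "a2", "a3"],
    "source_reliability": [0.9, 0.6, 0.8],  # 0..1, assigned per source (hypothetical)
    "ingested_at": pd.to_datetime(["2025-11-01", "2025-10-01", "2025-11-15"]),
    "revenue_usd": [1_200_000, -5_000, 980_000],
})

# Field-level consistency check: revenue must be non-negative.
records["valid_revenue"] = records["revenue_usd"] >= 0

# Freshness factor: linear decay over a 90-day window (illustrative choice).
age_days = (pd.Timestamp("2025-11-18") - records["ingested_at"]).dt.days
freshness = np.clip(1 - age_days / 90, 0, 1)

# Duplication penalty: halve the score for records whose key field repeats.
dup_penalty = np.where(
    records.duplicated(subset=["revenue_usd"], keep=False), 0.5, 1.0
)

# Simple weighted confidence score per record (weights are illustrative).
records["confidence"] = (
    0.5 * records["source_reliability"]
    + 0.3 * freshness
    + 0.2 * records["valid_revenue"].astype(float)
) * dup_penalty

print(records[["record_id", "valid_revenue", "confidence"]])
```

In practice, checks like these would run inside an orchestrated job and the resulting scores would be attached to each record or Evidence Bundle as validation metadata.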

Required Qualifications:

- 5+ years of experience in data engineering, MLOps validation, or data quality automation.

- Strong expertise in Python (Pandas, NumPy, Polars), SQL, and ETL optimization.

- Proficiency in PostgreSQL query optimization, window functions, and materialized views for performance tuning.

- Experience designing data lineage and reconciliation frameworks using audit tables or time-versioned stores.

- Hands-on experience with Airflow / Prefect / Temporal for scheduled and event-driven pipelines (a minimal example follows this list).

- Working knowledge of OpenTelemetry, Prometheus, and Grafana for pipeline observability.
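
As a sketch of the orchestration requirement above, a scheduled validation job might be wired up as follows, assuming Airflow 2.x; the DAG id, schedule, and task body are hypothetical placeholders, and equivalent flows could be built in Prefect or Temporal:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_validation_suite():
    # Placeholder for the actual work: schema checks, checksum
    # verification, drift detection, confidence scoring, etc.
    print("validation suite executed")


with DAG(
    dag_id="daily_data_validation",      # hypothetical name
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",          # deterministic, scheduled execution
    catchup=False,
) as dag:
    validate = PythonOperator(
        task_id="run_validation_suite",
        python_callable=run_validation_suite,
    )
```

Each task run can emit metrics and traces (for example via OpenTelemetry or Prometheus exporters) so that observability dashboards in Grafana stay current.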

Preferred Skills:

- Familiarity with Data Quality (DQ) frameworks like Great Expectations / Soda Core.

- Experience integrating checksum, PII masking, and encryption verification layers (a checksum verification sketch appears after this list).

- Understanding of semantic versioning, schema registry systems, and data governance catalogs (e.g., OpenMetadata, Amundsen).
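
The checksum verification mentioned in the responsibilities and preferred skills can be as simple as recomputing a digest and comparing it to the value recorded at ingestion time. Below is a minimal sketch using Python's standard hashlib; the file path and expected digest are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large extracts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: compare against the digest stored when the file was ingested.
expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
actual = sha256_of_file(Path("exports/companies_2025-11-18.parquet"))
if actual != expected:
    raise ValueError(f"Checksum mismatch: expected {expected}, got {actual}")
```

The same pattern extends to verifying that PII masking or encryption steps actually ran, by comparing digests or markers before and after each transformation.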

Key Performance Metrics:

- Validation Accuracy ≥ 99%

- Schema Drift Detection Time < 10 min

- False Positive Rate in Contradiction Detection < 2%

- 100% coverage of data lineage and confidence scoring


This role may allow working from home.