Posted on: 18/11/2025
Description:
About the Role :
We are looking for a Data Engineer, Validation & Quality, to ensure that every dataset inside Perceive Now is verifiable, accurate, and audit-traceable. You will architect quantitative validation frameworks, build contradiction and anomaly detection systems, and integrate automated evidence scoring mechanisms into our 25-layer data reasoning pipeline.
Responsibilities :
- Design and implement validation frameworks using Python (Pandas, NumPy, Polars) for data quality enforcement, schema validation, and field-level consistency checks.
- Build contradiction-detection and reconciliation pipelines leveraging rule-based systems, cosine similarity, and statistical control models.
- Develop automated confidence scoring models for each record or Evidence Bundle, integrating factors like source reliability, freshness, and duplication metrics.
- Orchestrate validation jobs through Temporal / Airflow / Prefect, ensuring deterministic execution and full observability.
- Automate checksum verification, schema drift detection, and data sampling across hundreds of data sources.
- Create and maintain lineage graphs and quality dashboards in PostgreSQL, OpenSearch, and Grafana for continuous visibility.
- Collaborate with Kernel and Governance pods to embed validation metadata and scoring outputs directly into evidence objects.
- Ensure compliance with enterprise-grade data governance and security frameworks (SOC 2, GDPR, ISO 27001).
Required Qualifications :
- 5+ years of experience in data engineering, MLOps validation, or data quality automation.
- Strong expertise in Python (Pandas, NumPy, Polars), SQL, and ETL optimization.
- Proficiency in PostgreSQL query optimization, window functions, and materialized views for performance tuning.
- Experience designing data lineage and reconciliation frameworks using audit tables or time-versioned stores.
- Hands-on experience with Airflow / Prefect / Temporal for scheduled and event-driven pipelines.
- Working knowledge of OpenTelemetry, Prometheus, and Grafana for pipeline observability.
Preferred Skills :
- Familiarity with Data Quality (DQ) frameworks like Great Expectations / Soda Core.
- Experience integrating checksum, PII masking, and encryption verification layers.
- Understanding of semantic versioning, schema registry systems, and data governance catalogs (e.g., OpenMetadata, Amundsen).
Key Performance Metrics :
- Validation Accuracy ≥ 99 %
- Schema Drift Detection Time < 10 min
- False Positive Rate in Contradiction Detection < 2 %
- 100 % Coverage of data lineage and confidence scoring
Posted in: Data Engineering
Functional Area: ML / DL Engineering
Job Code: 1576534