Posted on: 22/01/2026
Description :
Key Responsibilities :
- Audit & Validation : Conduct rigorous quality checks on scraped outputs from Streamlit applications to ensure high-fidelity extraction from source documents.
- Data Remediation : Utilize purpose-built data pipelines to manually or programmatically overwrite inaccurate data points discovered during auditing.
- Pipeline Monitoring : Collaborate with data engineering teams to identify systemic scraping errors and refine the logic within the ingestion layer.
- Governance Integration : Transition successful document auditing workflows into our broader enterprise data governance practices.
- Reporting : Maintain detailed logs of data discrepancies, "ground truth" comparisons, and error trends to inform future scraping strategies.
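The audit-remediation-reporting loop above can be sketched as a simple comparison between scraped records and a hand-verified "ground truth" sample. This is a minimal illustration only: the document IDs, field names, and values are hypothetical, and a real pipeline would query Snowflake rather than in-memory dictionaries.

```python
# Hypothetical audit sketch: compare scraped records against a manually
# verified "ground truth" sample and collect discrepancies for reporting.
scraped = {
    "DOC-001": {"invoice_total": "1,250.00", "vendor": "Acme Corp"},
    "DOC-002": {"invoice_total": "987.50", "vendor": "Globex"},
}
ground_truth = {
    "DOC-001": {"invoice_total": "1,250.00", "vendor": "Acme Corp"},
    "DOC-002": {"invoice_total": "978.50", "vendor": "Globex"},
}

def audit(scraped, truth):
    """Return (doc_id, field, scraped_value, true_value) for every mismatch."""
    discrepancies = []
    for doc_id, true_fields in truth.items():
        for field, true_value in true_fields.items():
            scraped_value = scraped.get(doc_id, {}).get(field)
            if scraped_value != true_value:
                discrepancies.append((doc_id, field, scraped_value, true_value))
    return discrepancies

errors = audit(scraped, ground_truth)
# DOC-002's invoice_total was mis-scraped: 987.50 vs. the audited 978.50
```

In practice the mismatch log would be written back to a discrepancy table so that error trends can be tracked over time, as the Reporting responsibility describes.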
Required Skills & Qualifications :
- Extreme Attention to Detail : You must have a passion for "hunting" for small discrepancies in large datasets.
- Snowflake Proficiency : Hands-on experience querying and managing data within Snowflake is required.
- Strong SQL Skills : Ability to write complex queries to validate data across multiple tables and identify outliers.
- Analytical Mindset : Experience auditing unstructured data (PDFs, images, or web scrapes) and comparing it against structured outputs.
- Communication : Ability to clearly document data issues and explain technical discrepancies to both engineers and stakeholders.
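As a rough illustration of the cross-table validation queries the SQL requirement refers to, the sketch below joins a scraped table against an audited "ground truth" table and flags disagreements. Table and column names are hypothetical; the posting's environment is Snowflake, but the same SQL pattern applies, so SQLite is used here only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: scraped values vs. a manually audited sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scraped_invoices (doc_id TEXT, total REAL);
CREATE TABLE audited_invoices (doc_id TEXT, total REAL);
INSERT INTO scraped_invoices VALUES ('DOC-001', 1250.00), ('DOC-002', 987.50);
INSERT INTO audited_invoices VALUES ('DOC-001', 1250.00), ('DOC-002', 978.50);
""")

# Join the two tables and keep only rows where the scraped value
# disagrees with the audited "ground truth".
mismatches = conn.execute("""
    SELECT s.doc_id, s.total AS scraped_total, a.total AS audited_total
    FROM scraped_invoices AS s
    JOIN audited_invoices AS a ON a.doc_id = s.doc_id
    WHERE s.total <> a.total
""").fetchall()
# mismatches now holds the one mis-scraped row for DOC-002
```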
Preferred Qualifications :
- Python Experience : Familiarity with Python for data manipulation (Pandas) or basic automation is a significant plus.
- Streamlit Familiarity : Understanding how Streamlit apps function to better troubleshoot how data is being captured.
- Governance Background : Prior experience working within a formal Data Governance framework or using data cataloging tools.
Posted in : Data Engineering
Functional Area : Data Engineering
Job Code : 1604613