
Martin Pagel

The Paper "Towards Unsupervised Data Quality Validation on Dynamic Data" Was Accepted for Presentation at ETMLP 2020

Towards Unsupervised Data Quality Validation on Dynamic Data. Sergey Redyuk, Volker Markl, Sebastian Schelter. To be presented at the 46th International Workshop on Explainability for Trustworthy ML Pipelines (ETMLP), 30 March 2020, Copenhagen, Denmark. Co-located with EDBT 2020.

Validating the quality of data is crucial for establishing the trustworthiness of data pipelines. State-of-the-art solutions for data validation and error detection require explicit domain expertise (e.g., in the form of rules or patterns) or manually labeled examples. In real-world applications, domain knowledge is often incomplete, and data changes over time, which limits the applicability of existing solutions. We propose an unsupervised approach for detecting data quality degradation early and automatically. We will present the approach, its key assumptions, and preliminary results on public data to demonstrate how data quality can be monitored without manually curated rules and constraints.

You can find a preprint version of the paper here

To learn more about ETMLP 2020, please visit: