Discovering Denial Constraints in Dynamic Datasets

Authors: Eduardo H. M. Pena; Fabio Porto; Felix Naumann
Published: 16-05-2024
Abstract:
Denial constraints (DCs) are data dependencies with high expressive power, offering great flexibility for modeling data quality rules. Specifying DCs manually is problematic, as the required domain expertise is expensive and scarce. Moreover, database updates can invalidate DCs thought to hold and simultaneously uncover new DCs. This fact leads to burdensome scenarios where experts must often revisit DC specifications. Several algorithms have been devised to discover DCs from data, among which only one considers DC discovery on data updates. However, that solution underperforms in many scenarios due to long runtime and excessive memory use. Also, it targets database inserts only, so no previous solution covers deletions. This paper proposes an efficient and flexible algorithm that covers the earlier limitations regarding performance and scope. The algorithm maintains small-footprint intermediate structures during database updates and a method that exploits the changes in this intermediate to update the DCs incrementally. The results of our extensive experimental evaluation show that our algorithm is orders of magnitude faster than the existing one, with much better scalability in the size of the data updates.

DEXL Members

Participant Institutions

Paper Link

View Publication