Journal articles

A linear programming-based framework for handling missing data in multi-granular data warehouses

Abstract : Data Warehouse (DW) and OLAP systems are first-class citizens among Business Intelligence tools. They are widely used in academia and industry across many application fields. Despite the maturity of DW and OLAP systems, the advent of Big Data has made more and more data sources available, and warehousing this data can lead to serious quality issues. In this work, we focus on missing numerical and categorical data in the presence of aggregated facts. Motivated by the lack of a formal approach for imputing this kind of data that takes into account all types of aggregation functions (distributive, algebraic and holistic), we propose a new methodology based on linear programming. Our methodology makes it possible to deal with relaxed constraints over classical SQL aggregation functions. The proposed approach is tested on two well-known datasets, and the experiments show its effectiveness.
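To make the idea of imputation under aggregate constraints concrete, here is a minimal sketch of how one such problem could be written as a linear program. It is an illustration under stated assumptions, not the formulation used in the article: the symbols x_i (missing detailed values), \hat{x}_i (prior estimates, for example taken from observed sibling facts), d_i (absolute deviations), S (a known SUM aggregate) and \varepsilon (a tolerance modelling a relaxed constraint) are all introduced here for illustration.

```latex
% Illustrative sketch only; symbols are assumptions, not the paper's notation.
\begin{aligned}
\min_{x,\,d}\quad & \sum_{i=1}^{n} d_i \\
\text{s.t.}\quad & S - \varepsilon \le \sum_{i=1}^{n} x_i \le S + \varepsilon, \\
& -d_i \le x_i - \hat{x}_i \le d_i, && i = 1,\dots,n, \\
& x_i \ge 0,\; d_i \ge 0, && i = 1,\dots,n.
\end{aligned}
```

With \varepsilon = 0 the SUM constraint is enforced exactly. For instance, with n = 3, S = 100 and \hat{x}_i = 30 for every i, the optimal objective value is 10, attained for example by x = (40, 30, 30); the optimum is not unique, so a real method would need an additional criterion to choose among such solutions.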
Keywords : OLAP Analysis
Document type : Journal articles

https://hal.inrae.fr/hal-03203605
Contributor : Kareen Louembe
Submitted on : Tuesday, April 20, 2021 - 9:26:26 PM
Last modification on : Wednesday, April 21, 2021 - 3:37:04 AM

Citation

Sandro Bimonte, Libo Ren, Nestor Koueya. A linear programming-based framework for handling missing data in multi-granular data warehouses. Data and Knowledge Engineering, Elsevier, 2020, 128, pp.101832. ⟨10.1016/j.datak.2020.101832⟩. ⟨hal-03203605⟩
