Conference paper, Year: 2021

Toward Generic Abstractions for Data of Any Model

Abstract

Digital data sharing leads to unprecedented opportunities to develop data-driven systems supporting economic activities, social and political life, and science. Many open-access datasets are RDF graphs, but others are CSV files, Neo4J property graphs, JSON or XML documents, etc. Potential users need to understand a dataset in order to decide whether it is useful for their goal. While some datasets come with a schema and/or documentation, this is not always the case. Data summarization and schema inference tools have been proposed, each specializing in the XML, JSON, or RDF data model. In this work, we present a dataset abstraction approach which (i) applies to relational, CSV, XML, JSON, RDF, or Property Graph data; (ii) computes an abstraction meant for humans (as opposed to a schema meant for a parser); and (iii) integrates Information Extraction data profiling, to also classify dataset content among a set of categories of interest to the user. Our abstractions are conceptually close to an Entity-Relationship diagram, if one allows nested and possibly heterogeneous structure within entities.
Main file: Submission-41-Barret-Manolescu-Upadhyay.pdf (1.41 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03344041, version 1 (14-09-2021)
hal-03344041, version 2 (14-09-2021)

Identifiers

  • HAL Id: hal-03344041, version 2

Cite

Nelly Barret, Ioana Manolescu, Prajna Upadhyay. Toward Generic Abstractions for Data of Any Model. BDA 2021 - Informal publication only, Oct 2021, Paris, France. ⟨hal-03344041v2⟩