Integrating complex pangenome graphs

Graph databases are increasingly used to handle complex data pipelines, in which interconnected data is exploited for visualization and analytics. We propose a novel method, PanGraph-DB, for performing complex inter-pangenomic analysis within a graph database. As a case study, we focus on antibiotic resistance in sequenced genomes. Over the past decade, the volumes of genomic data stored in public databases have grown exponentially, to the point of hindering comparative genomics algorithms. We show that, due to the nature of genomic data, graph databases enable accurate data and metadata analysis, visualization, and comparison across diverse genomes in the pangenomic approach. Families of graph-encoded pangenomes can then be integrated under a common mediated graph schema. This graph data integration makes it possible to visualize and compare several pangenomes, and to analyze antimicrobial resistance (AMR) gene niches through a combination of graph queries, whose performance and scalability we study.
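To make the notion of an AMR gene niche query more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and uses no graph database: an in-memory adjacency map stands in for the graph store, and a bounded breadth-first search stands in for a graph query. The genome names, gene-family labels (`fam1`, `amr_bla`, ...), and the helper functions `build_pangenome_graph` and `gene_niche` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: each genome is an ordered list of gene-family labels.
GENOMES = {
    "genome_A": ["fam1", "fam2", "amr_bla", "fam3", "fam4"],
    "genome_B": ["fam1", "fam2", "amr_bla", "fam5"],
    "genome_C": ["fam1", "fam3", "fam4"],
}


def build_pangenome_graph(genomes):
    """Build an undirected graph: nodes are gene families, and an edge
    links two families that are adjacent in at least one genome."""
    adjacency = defaultdict(set)
    for order in genomes.values():
        for left, right in zip(order, order[1:]):
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency


def gene_niche(adjacency, start, radius):
    """Return the gene families within `radius` edges of `start`
    (breadth-first search), excluding `start` itself."""
    seen = {start}
    frontier = {start}
    for _ in range(radius):
        # Expand one hop and keep only families not visited yet.
        frontier = {n for f in frontier for n in adjacency[f]} - seen
        seen |= frontier
    return seen - {start}


graph = build_pangenome_graph(GENOMES)
niche = gene_niche(graph, "amr_bla", radius=1)
print(sorted(niche))
```

In a graph database, the same neighborhood lookup would typically be expressed as a bounded-length path pattern in the query language rather than as an explicit loop; the sketch only illustrates the shape of the question being asked.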

Keywords

graph databases, graph queries, pangenomics

Domains

Bioinformatics, Systems Biology [q-bio.QM]; Databases [cs.DB]

Contributor: Guillaume GAUTREAU

https://hal.inrae.fr/hal-04660364

Submitted on: Tuesday, July 23, 2024, 17:22:56

Last modified on: Tuesday, October 8, 2024, 18:40:53

Dates and versions

hal-04660364, version 1 (23-07-2024)

Identifiers

HAL Id: hal-04660364, version 1
DOI : 10.1109/ICDEW61823.2024.00052

Cite

Jérôme Arnoux, Angela Bonifati, A. Calteau, Stefania Dumbrava, Guillaume Gautreau. Integrating complex pangenome graphs. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), May 2024, Utrecht, Netherlands. pp.350-354, ⟨10.1109/ICDEW61823.2024.00052⟩. ⟨hal-04660364⟩

Collections

CEA CNRS UNIV-EVRY TELECOM-SUDPARIS CEA-UPSAY GENOMIQUE-METABOLIQUE UNIV-PARIS-SACLAY JACOB CEA-DRF IP_PARIS GENOSCOPE INRAE GS-BIOSPHERA GS-LIFE-SCIENCES-HEALTH INSTITUT-MINES-TELECOM
