Journal article, Data in Brief, Year: 2024

MilkOligoThesaurus, a dataset of mammalian milk oligosaccharide synonyms

Sophie Aubin

Abstract

There is growing interest in milk oligosaccharides (MOs) because of their numerous benefits for newborn health and for health later in life. A large number of MO structures have been identified in mammalian milk. Although they have mostly been described in human milk, a rich, if less broad, diversity of oligosaccharides has also been reported for a wide range of mammalian species. MO structures are particularly difficult to report because they result from the combination of five monosaccharides linked by various glycosidic bonds, forming structurally diverse and complex matrices of linear and branched oligosaccharides. Exploring the literature and extracting relevant information on MO diversity within or across species appears promising for elucidating the structure-function relationships of MOs. Currently, given the complexity of these molecules, the main obstacle to this kind of literature exploration is the heterogeneity in the way authors refer to them. Herein, we provide a thesaurus (MilkOligoThesaurus) that includes the names and synonyms of MOs collected from key selected articles on mammalian milk analyses. MilkOligoThesaurus gathers the names of the MOs together with a complete description of their monosaccharide composition and structures. When available, each unique MO molecule is linked to its ID in the NCBI PubChem and ChEBI databases. MilkOligoThesaurus is provided in a tabular format. It gathers 245 unique oligosaccharide structures described by 22 features (columns), including the name of the molecule, its abbreviation, the chemical database IDs if available, the monosaccharide composition, chemical information (molecular formula, monoisotopic mass), synonyms, its formula in condensed form and in abbreviated condensed form, the abbreviated systematic name, the systematic name, the isomer group, and the scientific article sources. MilkOligoThesaurus is also provided in the SKOS (Simple Knowledge Organization System) format.
This thesaurus is a valuable resource that gathers MO naming variations not found elsewhere. It is intended for (i) Text and Data Mining, to enable automatic annotation and rapid extraction of milk oligosaccharide data from scientific papers; and (ii) biology researchers aiming to search for or decipher the structure of milk oligosaccharides based on any of their names, abbreviations, or monosaccharide compositions and linkages.
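As a rough illustration of the text-mining use case described above, the following Python sketch builds a synonym index from a small tab-separated table and uses it to annotate a sentence. Everything specific here is an assumption made for the example: the column names (`name`, `abbreviation`, `synonyms`), the `|` separator, and the three sample rows are illustrative and are not taken from the MilkOligoThesaurus files, whose actual 22-column schema should be checked before adapting the code.

```python
import csv
import io
import re

# Illustrative rows only. The column names and the "|" separator are
# assumptions for this sketch, not the dataset's actual schema.
SAMPLE_TSV = (
    "name\tabbreviation\tsynonyms\n"
    "2'-Fucosyllactose\t2'-FL\t2'FL\n"
    "3'-Sialyllactose\t3'-SL\t3'SL\n"
    "Lacto-N-tetraose\tLNT\t\n"
)


def build_synonym_index(tsv_text):
    """Map each lower-cased name, abbreviation, and synonym to its preferred name."""
    index = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        labels = [row["name"], row["abbreviation"]] + row["synonyms"].split("|")
        for label in labels:
            label = label.strip()
            if label:  # skip empty synonym cells
                index[label.lower()] = row["name"]
    return index


def annotate(text, index):
    """Return (matched_text, preferred_name) pairs in order of appearance."""
    # Longest labels first, so a long label wins over a shorter one it contains.
    labels = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w'-])(" + "|".join(re.escape(label) for label in labels) + r")(?![\w'-])",
        re.IGNORECASE,
    )
    return [(m.group(1), index[m.group(1).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    idx = build_synonym_index(SAMPLE_TSV)
    sentence = "Bovine milk contained 3'-SL and traces of 2'FL, but no LNT."
    for found, preferred in annotate(sentence, idx):
        print(f"{found} -> {preferred}")
```

The dictionary lookup normalizes every naming variant to a single preferred name, which is the role a thesaurus plays in automatic annotation; a production pipeline would likely need more careful tokenization and handling of typographic variants such as different prime characters.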
Main file
2024_Rumeau DIB_ MilkOligoTheaurus.pdf (542.26 KB)
Origin: Files produced by the author(s)
License: CC BY-NC (Attribution, NonCommercial)

Dates and versions

hal-04552648, version 1 (19-04-2024)

Cite

Mathilde Rumeau, François Fenaille, Agnès Girard, Valentin Loux, Mouhamadou Ba, et al. MilkOligoThesaurus, a dataset of mammalian milk oligosaccharide synonyms. Data in Brief, 2024, 54, pp.110404. ⟨10.1016/j.dib.2024.110404⟩. ⟨hal-04552648⟩