Omnicrobe, an open-access database of microbial habitats, phenotypes and uses extracted from text
Abstract
The rapid growth in descriptions of microbes, their habitats, phenotypes and uses across databases, reports and papers presents a twofold challenge for accessing this information: integrating heterogeneous data requires both a standardized representation and the normalization of textual descriptions by semantic analysis. Recent information extraction technologies from the text mining domain offer a powerful way to detect textual information and structure it according to ontology-based representations.
The Omnicrobe application (https://omnicrobe.migale.inrae.fr) uses an information extraction workflow to populate its database. The workflow is designed to (1) extract microorganism taxa, their habitats, their phenotypes and their uses and (2) categorize the extracted information with taxa from the NCBI (National Center for Biotechnology Information) taxonomy and concepts from the OntoBiotope ontology. The Omnicrobe database contains around 1 million descriptions of microbe properties, created by analyzing and combining six information sources: biological resource catalogues (e.g. INRAE CIRM, DSMZ through BacDive), a sequence database (GenBank) and scientific literature (PubMed abstracts).
Omnicrobe supports both simple and complex ontology-based queries for studies in various domains of microbiology. It also exposes an API (Application Programming Interface) that allows users to automatically integrate microbe biodiversity knowledge into external information systems. The use of Omnicrobe to quickly target useful strains in a food innovation application illustrates how it can provide easy-to-use support for resolving scientific questions related to the habitats, phenotypes and uses of microbes.
Origin | Files produced by the author(s) |
---|