ITEXT-BIO: Intelligent Term EXTraction for BIOmedical Analysis - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, Health Information Science and Systems, Year: 2021

ITEXT-BIO: Intelligent Term EXTraction for BIOmedical Analysis


Here, we introduce ITEXT-BIO, an intelligent process for extracting biomedical domain terminology from textual documents and then analysing it. The proposed methodology consists of two complementary approaches: free and driven term extraction. The first is based on term extraction with statistical measures, while the second uses morphosyntactic variation rules to extract term variants from the corpus. The combination of two term extraction and analysis strategies is the keystone of ITEXT-BIO. These strategies enable term extraction and analysis either within a single corpus (intra-corpus) or across several corpora (inter-corpus). We assessed the two approaches, the corpus or corpora to be analysed, and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used: (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses of these terms with respect to the study domain.
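To make the intra-corpus versus inter-corpus distinction concrete, the following is a minimal sketch and not the ITEXT-BIO implementation. The abstract does not name the statistical measures used, so the sketch assumes plain term frequency for the intra-corpus case and a TF-IDF-style weight, computed over corpora rather than documents, for the inter-corpus case. The function names (`candidate_terms`, `intra_scores`, `inter_scores`) and the toy corpora are hypothetical, and word n-grams stand in for real term candidates.

```python
import math
import re
from collections import Counter


def candidate_terms(text, max_n=2):
    """Yield lowercase word n-grams (n = 1..max_n) as naive term candidates."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def intra_scores(corpus):
    """Intra-corpus (assumed measure): raw candidate frequency in one corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(candidate_terms(doc))
    return counts


def inter_scores(corpora):
    """Inter-corpus (assumed measure): frequency in a corpus, weighted by
    log(number of corpora / number of corpora containing the term)."""
    per_corpus = {name: intra_scores(docs) for name, docs in corpora.items()}
    n_corpora = len(per_corpus)
    corpus_freq = Counter()
    for counts in per_corpus.values():
        corpus_freq.update(counts.keys())
    return {
        name: {
            term: tf * math.log(n_corpora / corpus_freq[term])
            for term, tf in counts.items()
        }
        for name, counts in per_corpus.items()
    }


# Toy corpora, invented for illustration only.
corpora = {
    "virology": ["viral infection spreads", "viral infection in swine"],
    "nutrition": ["dietary fibre intake", "infection risk and dietary fibre"],
}
scores = inter_scores(corpora)
```

Under these assumptions, a candidate that occurs in every corpus (here `infection`) receives an inter-corpus weight of zero, while a candidate confined to one corpus (here `viral infection`) keeps a positive weight, which is one simple way to read "discriminant" terms.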
Main file
Kafando2021_Article_ITEXT-BIOIntelligentTermEXTrac.pdf (2.28 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03283040 , version 1 (09-07-2021)
hal-03283040 , version 2 (25-08-2021)





Rodrique Kafando, Rémy Decoupes, Sarah Valentin, Lucile Sautot, Maguelonne Teisseire, et al. ITEXT-BIO: Intelligent Term EXTraction for BIOmedical Analysis. Health Information Science and Systems, 2021, 9 (1), pp. 29. ⟨10.1007/s13755-021-00156-6⟩. ⟨hal-03283040v2⟩


