Ontology based information retrieval
Using ontologies to support search and navigation in a document collection
Abstract
Domain ontologies provide a knowledge model in which the main concepts of a domain are organized through hierarchical relationships. In conceptual Information Retrieval Systems (IRSs), where ontologies are used both to index documents and to formulate queries, they help overcome some of the ambiguities of classical IRSs based on natural language processing. One contribution of this study is the use of ontologies within IRSs, in particular to assess the relevance of documents with respect to a given query. For this matching process, we propose a simple and intuitive aggregation approach that combines a user-dependent preference model with semantic similarity measures attached to a domain ontology. This matching strategy makes it possible to justify the relevance of the results to the user. To complete this explanation, semantic maps are built to help the user grasp the results at a glance. Documents are displayed as icons that detail their elementary scores, and they are arranged so that their graphical distance on the map reflects their relevance to the query, which is represented as a probe. Because Information Retrieval is an iterative process, users must be involved in the loop that controls the relevance of the results so that they can better specify their information needs. Inspired by established strategies in vector models, we propose to formalize ontology-based relevance feedback in the context of conceptual IRSs. This strategy consists in searching for a conceptual query that optimizes a tradeoff, modeled through an objective function, between closeness to relevant documents and remoteness from irrelevant documents. Starting from a set of concepts of interest, a heuristic is proposed that efficiently builds a near-optimal query. This heuristic relies on two simple properties of semantic similarities that are proved to ensure semantic neighborhood connectivity. Hence, only an excerpt of the ontology DAG structure is explored during query reformulation. These approaches have been implemented in OBIRS, our ontology-based IRS, and validated in two ways: automatic assessment on standard test collections, and case studies involving experts from the biomedical domain.
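The matching and relevance feedback ideas summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from OBIRS: the toy ontology, the Jaccard overlap of ancestor sets used as a stand-in semantic similarity, the plain mean used in place of the user-dependent preference model, and the simple difference used as the objective function. The function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy ontology: each concept maps to its direct parents (a DAG).
PARENTS = {
    "entity": [],
    "disease": ["entity"],
    "cancer": ["disease"],
    "infection": ["disease"],
    "breast_cancer": ["cancer"],
    "lung_cancer": ["cancer"],
    "pneumonia": ["infection"],
}

def ancestors(concept):
    """Return the concept together with all of its ancestors in the DAG."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def similarity(a, b):
    """Assumed stand-in similarity: Jaccard overlap of ancestor sets, in [0, 1]."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def elementary_scores(query, doc_concepts):
    """For each query concept, the best similarity with any concept indexing the document."""
    return {q: max(similarity(q, d) for d in doc_concepts) for q in query}

def relevance(query, doc_concepts):
    """Aggregate elementary scores; a plain mean stands in for the preference model."""
    return mean(list(elementary_scores(query, doc_concepts).values()))

def feedback_objective(query, relevant, irrelevant):
    """Assumed tradeoff: closeness to relevant documents minus closeness to irrelevant ones."""
    closeness = mean([relevance(query, doc) for doc in relevant])
    remoteness_penalty = mean([relevance(query, doc) for doc in irrelevant])
    return closeness - remoteness_penalty

def best_single_concept_query(candidates, relevant, irrelevant):
    """Exhaustive pick among one-concept queries; the heuristic described above is more refined."""
    return max(candidates, key=lambda c: feedback_objective([c], relevant, irrelevant))
```

For example, with documents indexed by `["breast_cancer"]` and `["lung_cancer"]` marked relevant and a document indexed by `["pneumonia"]` marked irrelevant, `best_single_concept_query(["cancer", "disease", "infection"], ...)` selects `"cancer"`, because it is close to both relevant documents and farther from the irrelevant one. The elementary scores returned by `elementary_scores` are the kind of per-concept detail that could be shown to the user to justify a ranking.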