Ontologies and information extraction
Abstract
An ontology is a description of conceptual knowledge organized in a computer-based representation, while information extraction (IE) is a method for analyzing texts that express facts in natural language and extracting relevant pieces of information from them. IE and ontologies are involved in two main, related tasks:
• Ontology is used for information extraction: IE needs ontologies as part of the understanding process for extracting the relevant information;
• Information extraction is used for populating and enhancing the ontology: texts are useful sources of knowledge for designing and enriching ontologies.
These two tasks are combined in a cyclic process: ontologies are used to interpret the text at the right level for IE to be efficient, and IE extracts new knowledge from the text to be integrated into the ontology.
We will argue that even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. We will show that the amount of knowledge involved depends on the nature and depth of the interpretation required to extract the information. Extracting information from texts calls for lexical knowledge, grammars describing the specific syntax of the texts to be analyzed, and semantic and ontological knowledge.
In this chapter, we will not take part in the debate about the boundary between the lexicon and the ontology as a conceptual model. We will instead focus on the role that ontologies, viewed as semantic knowledge bases, can play in IE. The ontologies that can be used for and enriched by IE relate conceptual knowledge to its linguistic realizations (e.g. a concept must be associated with the terms that express it, possibly in various languages).
Interpreting the factual information in a text also calls for knowledge of the domain's referential entities, which we consider part of the ontology (Sect. 2.2.1).
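As an illustration only, the link between a concept and the terms that express it in several languages can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the representation used in the chapter: the names `Concept`, `Ontology`, `add_term`, and `annotate`, and the `Company` example with its English and French terms, are all hypothetical. The last step mimics the cyclic process in a very reduced form: a term found in text is added to the ontology and then becomes available for later extraction.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Concept:
    """A concept and its linguistic realizations (hypothetical structure)."""
    name: str
    # Maps a language code (e.g. "en", "fr") to the terms expressing the concept.
    terms: Dict[str, Set[str]] = field(default_factory=dict)


class Ontology:
    """A toy ontology: concepts indexed by name, no relations or hierarchy."""

    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add_term(self, concept: str, lang: str, term: str) -> None:
        """Attach a term to a concept, creating the concept if needed."""
        entry = self.concepts.setdefault(concept, Concept(concept))
        entry.terms.setdefault(lang, set()).add(term.lower())

    def annotate(self, text: str, lang: str) -> List[Tuple[str, str]]:
        """Return sorted (term, concept) pairs for known terms found in text."""
        lowered = text.lower()
        hits = []
        for concept in self.concepts.values():
            for term in concept.terms.get(lang, set()):
                # Whole-word match so "firm" does not match inside "affirm".
                if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                    hits.append((term, concept.name))
        return sorted(hits)


onto = Ontology()
for lang, term in [("en", "company"), ("en", "firm"),
                   ("fr", "société"), ("fr", "entreprise")]:
    onto.add_term("Company", lang, term)

# Ontology used for IE: terms in the text are interpreted as concepts.
english_hits = onto.annotate("The firm acquired a smaller company.", "en")
french_hits = onto.annotate("L'entreprise a racheté une société.", "fr")

# IE used to enrich the ontology: a newly identified term is integrated,
# after which it can be recognized in further texts.
before = onto.annotate("A corporation was founded.", "en")
onto.add_term("Company", "en", "corporation")
after = onto.annotate("A corporation was founded.", "en")
```

A realistic system would replace the whole-word matching with the lexical, syntactic, and semantic analysis the chapter describes; the sketch only shows why the ontology must carry term-level information for the extraction step to interpret text at all.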