Knowledge-Based Representation for Transductive Multilingual Document Classification

Salvatore Romeo; Dino Ienco; Andrea Tagarelli

doi:10.1007/978-3-319-16354-3_11

Conference paper. Year: 2015

French title: Représentation à base de connaissance pour une méthode de classification transductive de document multilangue

Salvatore Romeo

Role: Author

Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica [Calabria]

Dino Ienco

Role: Author
PersonId: 6226
IdHAL: dino-ienco
ORCID: 0000-0002-8736-3132
IdRef: 172688183

ADVanced Analytics for data SciencE

Territoires, Environnement, Télédétection et Information Spatiale

Andrea Tagarelli

Role: Author
PersonId: 973596

Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica [Calabria]

Abstract

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which compounds the known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation process. We then use a state-of-the-art transductive learner to classify the documents. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model with respect to document representations commonly used in multilingual and cross-lingual analysis, and the robustness of the transductive setting for multilingual document classification.
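
The idea of a common conceptual space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `TOY_LEXICON` is a hand-written stand-in for a multilingual knowledge base (its concept IDs are invented and are not BabelNet synset IDs), and `transduce` uses a simple nearest-labeled-neighbour rule, not the transductive learner used in the paper. It only shows how documents in different languages can be compared without translation once their words are mapped to shared concepts.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for a multilingual knowledge base.
# The concept IDs are invented for illustration; they are NOT BabelNet IDs.
TOY_LEXICON = {
    ("en", "bank"): "c:financial_institution",
    ("it", "banca"): "c:financial_institution",
    ("en", "loan"): "c:loan",
    ("it", "prestito"): "c:loan",
    ("it", "partita"): "c:sports_match",
    ("fr", "match"): "c:sports_match",
    ("it", "gol"): "c:goal",
    ("fr", "but"): "c:goal",
}


def bag_of_concepts(text, lang):
    """Map each known word of a document to its language-independent concept."""
    tokens = text.lower().split()
    return Counter(
        TOY_LEXICON[(lang, t)] for t in tokens if (lang, t) in TOY_LEXICON
    )


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def transduce(vectors, labels):
    """Label each unlabeled document (None) with the label of its most
    similar labeled document. A simplified stand-in, not the paper's learner."""
    labeled = [i for i, y in enumerate(labels) if y is not None]
    out = list(labels)
    for i, y in enumerate(labels):
        if y is None:
            best = max(labeled, key=lambda j: cosine(vectors[i], vectors[j]))
            out[i] = labels[best]
    return out


# Labeled and unlabeled documents live in the same collection.
docs = [
    ("the bank approved the loan", "en"),
    ("partita decisa da un gol", "it"),
    ("la banca concede un prestito", "it"),
    ("le match et le but", "fr"),
]
labels = ["finance", "sport", None, None]

vectors = [bag_of_concepts(text, lang) for text, lang in docs]
predicted = transduce(vectors, labels)
print(predicted)
```

In this toy setting the Italian and French unlabeled documents share no surface words with the labeled English document, yet they become comparable because their words resolve to the same concept identifiers.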

Keywords

Knowledge base; Multilingual classification; Transductive learning

Domains

Information Retrieval [cs.IR]; Machine Learning [cs.LG]; Databases [cs.DB]

Main file

paper_169.pdf (377.65 KB)

Origin: Files produced by the author(s)

Contributor: Dino Ienco

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01239095

Submitted on: Monday, December 7, 2015, 14:00:48

Last modified on: Tuesday, March 12, 2024, 10:44:09

Long-term archiving on: Saturday, April 29, 2017, 09:44:37

Dates and versions

lirmm-01239095, version 1 (07-12-2015)

Identifiers

HAL Id: lirmm-01239095, version 1
DOI: 10.1007/978-3-319-16354-3_11
IRSTEA: PUB00045750

Cite

Salvatore Romeo, Dino Ienco, Andrea Tagarelli. Knowledge-Based Representation for Transductive Multilingual Document Classification. 37th European Conference on Information Retrieval (ECIR), Mar 2015, Vienna, Austria. pp.92-103, ⟨10.1007/978-3-319-16354-3_11⟩. ⟨lirmm-01239095⟩

Collections

CIRAD AGROPARISTECH CNRS IRSTEA ADVANSE LIRMM TETIS MIPS UNIV-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

391 views

471 downloads
