XML Document Classification using SVM

Abstract : This paper describes a representation of XML documents for classification. Document classification depends on document representation: the more relevant the representation, the more relevant the classification. We propose a representation model that exploits both the structure and the content of a document. Our approach is based on the vector space model: a document is represented by a vector of weighted features, where each feature is a (tag, term) pair. We extend tf*idf to compute a feature's weight according to the term's structural level in the document. An SVM is used as the learning algorithm. Experiments on the Reuters collection show that our proposal improves classification performance compared with the standard classification model based on term vectors.
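The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the version below rests on an assumption: each occurrence of a term contributes 1/depth to its (tag, term) feature, where depth is the nesting level of the enclosing element (root = 1), and the result is multiplied by a plain log(N/df) idf. The function names `tag_term_weights` and `tfidf_vectors` and the two toy documents are hypothetical and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_term_weights(xml_text):
    """Map each (tag, term) feature to a depth-weighted term frequency.

    ASSUMPTION: an occurrence at nesting depth d counts as 1/d, so terms
    near the root weigh more. The paper's actual formula may differ.
    """
    root = ET.fromstring(xml_text)
    weights = Counter()

    def add_terms(tag, text, depth):
        # Naive tokenisation: lowercase and split on whitespace.
        for term in (text or "").lower().split():
            weights[(tag, term)] += 1.0 / depth

    def walk(elem, depth):
        add_terms(elem.tag, elem.text, depth)
        for child in elem:
            walk(child, depth + 1)
            # Text following a child element belongs to the parent.
            add_terms(elem.tag, child.tail, depth)

    walk(root, 1)
    return weights


def tfidf_vectors(docs):
    """Return one sparse {(tag, term): weight} dict per document."""
    per_doc = [tag_term_weights(d) for d in docs]
    df = Counter()
    for weights in per_doc:
        df.update(weights.keys())  # each feature counted once per document
    n = len(docs)
    return [
        {feat: w * math.log(n / df[feat]) for feat, w in weights.items()}
        for weights in per_doc
    ]


# Hypothetical toy documents, not drawn from the Reuters collection.
docs = [
    "<doc><title>wheat prices</title><body><p>wheat exports rise</p></body></doc>",
    "<doc><title>oil prices</title><body><p>oil exports fall</p></body></doc>",
]
vectors = tfidf_vectors(docs)
```

In this sketch the same term yields distinct features depending on its tag, so "wheat" in a title and "wheat" in a paragraph are weighted separately. The resulting sparse vectors could then be passed to any SVM implementation for training.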
Document type : Conference papers

Contributor : Import Ws Irstea
Submitted on : Thursday, April 14, 2011 - 11:00:46 AM
Last modification on : Monday, June 27, 2022 - 11:32:50 AM
Long-term archiving on : Friday, July 15, 2011 - 2:40:06 AM




  • HAL Id : hal-00585914, version 1
  • IRSTEA : PUB00029029


Samaneh Chagheri, Catherine Roussey, Sylvie Calabretto, Cyril Dumoulin. XML Document Classification using SVM. SFC'2010 (Société Francophone de Classification), Jun 2010, Saint Denis de la Réunion, France. pp.71-74. ⟨hal-00585914⟩


