XML is becoming increasingly popular as a language for representing many types of electronic documents. The consequence of the strict structural document description via XML is that a relatively new task in mining documents based on structural and/or content information has emerged. In this paper we investigate (1) the suitability of new unsupervised machine learning methods for the clustering task of XML documents, and (2) the importance of contextual information for the same task. These tasks are part of an international competition on XML clustering and categorization (INEX 2006). It will be shown that the proposed approaches provide a suitable tool for the clustering of structured data as they yield the best results in the international INEX 2006 competition on clustering of XML data.
Kc, M., Hagenbuchner, M., Tsoi, A.C., Scarselli, F., Sperduti, A., Gori, M. (2006). XML document mining using contextual self-organizing maps for structures. In Comparative Evaluation of XML Information Retrieval Systems, (pp.510-524). Berlin Heidelberg : Springer [10.1007/978-3-540-73888-6_47].
XML document mining using contextual self-organizing maps for structures
SCARSELLI, F.;GORI, M.
2006-01-01
Abstract
XML is becoming increasingly popular as a language for representing many types of electronic documents. The consequence of the strict structural document description via XML is that a relatively new task in mining documents based on structural and/or content information has emerged. In this paper we investigate (1) the suitability of new unsupervised machine learning methods for the clustering task of XML documents, and (2) the importance of contextual information for the same task. These tasks are part of an international competition on XML clustering and categorization (INEX 2006). It will be shown that the proposed approaches provide a suitable tool for the clustering of structured data as they yield the best results in the international INEX 2006 competition on clustering of XML data.File | Dimensione | Formato | |
---|---|---|---|
INEX2006 SOM.pdf
non disponibili
Tipologia:
PDF editoriale
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
9.53 MB
Formato
Adobe PDF
|
9.53 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11365/42976
Attenzione
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo