Effective solutions for Web search engines can take advantage of algorithms that automatically organize documents into homogeneous clusters. Unfortunately, document clustering is not an easy task, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned by the feedback of an expert. The feedback is used to choose an appropriate basis for the representation of documents, and the clustering is performed in the projected space. The algorithms are evaluated on a dataset containing papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
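
The general idea in the abstract (pick a representation basis from expert feedback, then cluster in the projected space) can be sketched as follows. This is only an illustration under stated assumptions: this record does not describe the paper's two algorithms, so the basis construction used here (one normalized centroid per group of expert-selected example documents) and the plain k-means step are stand-ins, and every function name and the toy term-frequency data are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def feedback_basis(docs, expert_groups):
    """ASSUMPTION: one basis vector per expert-labelled group of example
    documents, taken as the normalized centroid of that group."""
    dim = len(docs[0])
    basis = []
    for group in expert_groups:
        centroid = [sum(docs[i][t] for i in group) / len(group) for t in range(dim)]
        basis.append(normalize(centroid))
    return basis

def project(doc, basis):
    """Coordinates of a document in the feedback-derived basis (dot products)."""
    return [sum(a * b for a, b in zip(doc, b_vec)) for b_vec in basis]

def kmeans(points, centers, n_iter=20):
    """Plain Lloyd-style k-means with caller-supplied initial centers."""
    centers = [list(c) for c in centers]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(len(centers)):
            members = [pt for pt, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical toy data: six documents as term-frequency vectors over four terms.
docs = [
    [2, 1, 0, 1], [1, 2, 0, 0], [3, 1, 1, 0],
    [0, 1, 2, 2], [0, 0, 1, 3], [1, 0, 2, 1],
]
# Hypothetical expert feedback: document 0 exemplifies one topic, document 3 another.
expert_groups = [[0], [3]]

basis = feedback_basis(docs, expert_groups)
projected = [project(d, basis) for d in docs]
# Seed k-means with the projections of the expert's example documents.
labels = kmeans(projected, [projected[0], projected[3]])
print(labels)
```

Seeding k-means with the expert's examples is a further simplifying choice made here so the toy run is deterministic; it is not claimed to match the evaluation setup of the paper.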

Maggini, M., Rigutini, L., Turchi, M. (2004). Pseudo-Supervised Clustering for Text Documents. In Proceedings of the IEEE/ACM/WI International Conference on Web Intelligence 2004 (pp. 363-369) [10.1109/WI.2004.10138].

Pseudo-Supervised Clustering for Text Documents

MAGGINI, MARCO; RIGUTINI, LEONARDO; TURCHI, MARCO
2004-01-01

Year: 2004
ISBN: 0769521002

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/38700