Rigutini, L., Di Iorio, E., Ernandes, M., Maggini, M. (2006). Semantic labelling of data using the web. In Proceedings of the International Workshop on Technologies and Applications on Knowledge Computing on the Web at the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006) (pp. 638-641). doi:10.1109/WI-IATW.2006.118.
Semantic labelling of data using the web
Rigutini, Leonardo; Di Iorio, Ernesto; Ernandes, Marco; Maggini, Marco
2006-01-01
Abstract
This paper proposes a system for automatically categorizing terms or lexical entities into a predefined set of semantic domains. We present an approach that exploits the knowledge available on the Web to create a model of each term or entity, called an entity context lexicon (ECL). Each such profile is simply a list of terms (similar to the bag-of-words representation used in text categorization), composed primarily of the words that often appear in the same contexts as the entity. These profiles model the contexts in which the entity usually appears and can subsequently be processed by an automatic classifier. Moreover, we propose and validate a profile-based categorization model developed for this particular task, which uses the ECLs of the training entities to build a profile for each class, called a class context lexicon (CCL). Finally, we propose a technique for dealing with multi-label classification based on a decision module that exploits a neural network. We show the effectiveness of the proposed approach on a term categorization task using a standard benchmark composed of a set of domain-specific lexicons (WordNetDomains).
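The ECL/CCL pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how web contexts are retrieved, how terms are weighted, or which similarity measure is used, so everything below is an assumption: the caller supplies text snippets (no web querying is done), an ECL is a term-frequency bag of the most frequent co-occurring words, a CCL is the normalized sum of a class's training ECLs, and entities are scored against each CCL by cosine similarity. The paper's neural-network decision module for multi-label output is replaced here by a simple relative-threshold rule, a plainly different technique used only for illustration. All function names (`build_ecl`, `build_ccl`, `categorize`) and parameters (`top_k`, `ratio`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's preprocessing is not described in the abstract.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "on", "with"}


def build_ecl(snippets, entity, top_k=50):
    """Entity context lexicon (sketch): the words that most often co-occur with
    `entity` in the given snippets (assumed to come from web search results)."""
    entity_terms = set(entity.lower().split())
    counts = Counter()
    for snippet in snippets:
        for token in snippet.lower().split():
            word = token.strip(".,;:!?()\"'")
            if word and word not in STOP_WORDS and word not in entity_terms:
                counts[word] += 1
    return dict(counts.most_common(top_k))


def normalize(profile):
    """Scale a term->weight profile to unit L2 norm (empty profile if all zero)."""
    norm = math.sqrt(sum(w * w for w in profile.values()))
    return {t: w / norm for t, w in profile.items()} if norm else {}


def build_ccl(training_ecls):
    """Class context lexicon (sketch): normalized sum of the class's normalized ECLs."""
    totals = defaultdict(float)
    for ecl in training_ecls:
        for term, weight in normalize(ecl).items():
            totals[term] += weight
    return normalize(totals)


def cosine(p, q):
    """Cosine similarity of two profiles that are already L2-normalized."""
    return sum(w * q.get(t, 0.0) for t, w in p.items())


def categorize(ecl, ccls, ratio=0.8):
    """Score the entity's ECL against every CCL and keep each domain whose score is
    at least `ratio` times the best one (a stand-in for the neural decision module)."""
    query = normalize(ecl)
    scores = {label: cosine(query, ccl) for label, ccl in ccls.items()}
    best = max(scores.values())
    if best <= 0:
        return [], scores
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [label for label, s in ranked if s >= ratio * best], scores


# Toy usage with hand-written snippets in place of real web results.
ccls = {
    "sport": build_ccl([build_ecl(["The striker scored a goal in the match"], "football")]),
    "music": build_ccl([build_ecl(["The band played a song at the concert"], "guitar")]),
}
labels, scores = categorize(build_ecl(["The goalkeeper saved the match"], "goalkeeper"), ccls)
print(labels)  # → ['sport']
```

Summing normalized ECLs keeps a single training entity with many snippets from dominating its class profile; a real system would likely use many more snippets per entity and a weighting scheme such as TF-IDF.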
Full text: WI06.pdf (post-print, Adobe PDF, 534.88 kB); not publicly available (private/restricted access).
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/37015
Warning: the data displayed have not been validated by the university.