Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant instances include automatic cancer detection/classification, machine-learning pathologic prediction, and automatic predictive toxicology. Molecules can be represented naturally as graphs: each node represents an atom, while each edge represents an atom-atom bond. Labels (in the form of real-valued vectors) are associated with nodes and edges to express physical and chemical properties of the corresponding atoms and bonds, respectively. Such structured data are expected to carry more information than a traditional (flat) feature vector, and this information may strengthen the classification capabilities of a machine learner. This paper investigates the application of a novel Bayesian/connectionist classifier to this graphical pattern recognition task. The approach is much simpler than state-of-the-art machine learning paradigms for graphical/relational learning. It relies on describing the graph in terms of a binary relation. The posterior probability of a class given the relation is estimated as a function of probabilistic quantities modeled with a neural network trained over individual vertex pairs in the graph. The popular and challenging Mutagenesis dataset is used for the experimental evaluation. Despite its simplicity, the technique yields the highest recognition accuracies to date on the complete (friendly + unfriendly) dataset, outperforming more complex machines (relational and graph neural nets, kernels for graphs, inductive logic programming techniques, etc.). Finally, some preliminary chemical/biological implications are hypothesized in light of the results obtained.
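The idea of describing a molecular graph as a binary relation over vertex pairs can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `Molecule`, `toy_pair_model`, and `combine_pair_posteriors` are hypothetical, and so are the feature layout and the combination rule. The rule assumes vertex pairs are conditionally independent given the class, which the abstract does not state. The logistic `toy_pair_model` merely stands in for the neural network trained over individual vertex pairs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """A molecule as a labeled graph: atoms are nodes, bonds are edges."""
    atom_labels: dict                          # node id -> list of floats
    bonds: dict = field(default_factory=dict)  # (i, j) -> list of floats

    def relation(self):
        """View the graph as a binary relation: one entry per bonded pair.

        Each entry carries the concatenated labels of both atoms and the bond
        (an assumed feature layout, chosen only for illustration).
        """
        return [
            (i, j, self.atom_labels[i] + self.atom_labels[j] + bond)
            for (i, j), bond in sorted(self.bonds.items())
        ]


def toy_pair_model(features):
    """Hypothetical stand-in for a trained network: returns [P(c0|pair), P(c1|pair)]."""
    z = 0.1 * sum(features)
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]


def combine_pair_posteriors(pair_posteriors, priors, eps=1e-12):
    """Combine per-pair class posteriors into a posterior for the whole relation.

    Assumes pairs are conditionally independent given the class, so that
    P(c | R) is proportional to P(c) * prod_k [P(c | pair_k) / P(c)].
    Computed in log space for numerical stability.
    """
    log_scores = []
    for c, prior in enumerate(priors):
        score = math.log(prior)
        for posterior in pair_posteriors:
            score += math.log(max(posterior[c], eps)) - math.log(prior)
        log_scores.append(score)
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy three-atom fragment: labels are atomic numbers, bond labels are bond orders.
molecule = Molecule(
    atom_labels={0: [6.0], 1: [8.0], 2: [1.0]},
    bonds={(0, 1): [2.0], (0, 2): [1.0]},
)
pairs = molecule.relation()
pair_posteriors = [toy_pair_model(features) for _, _, features in pairs]
class_posterior = combine_pair_posteriors(pair_posteriors, priors=[0.5, 0.5])
```

In this sketch the classifier only ever sees individual vertex pairs, so no recursive or message-passing machinery is needed. Under the stated assumptions, that is the sense in which such an approach is simpler than graph neural networks or graph kernels.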
Trentin, E., E., D.I. (2008). Classification of Molecular Structures Made Easy. In Proceedings of the INNS-IEEE International Joint Conference on Neural Networks (IJCNN08). (pp.3241-3246) [10.1109/IJCNN.2008.4634258].
Classification of Molecular Structures Made Easy
TRENTIN, EDMONDO;
2008-01-01
https://hdl.handle.net/11365/22317