Training LS-SVM Classifier in semi-supervised mode

The learning process typically assumes some a priori knowledge of the problem at hand in the form of exemplar data associated with labels. These data, called the training set, are used to design a classifier whose performance is then measured on a separate dataset, called the testing set. This is supervised learning, in which the performance of the classifier on the test set is viewed as an estimate of the true performance of the system (i.e., its performance on the whole input space). The accuracy of this estimate assumes that the training set is representative of the whole space and that the data labels represent the ground truth; both conditions are necessary for a robust estimate of the true performance of the system.

However, not only can the labeling process be extremely expensive and cumbersome, but human error during labeling can lead to unpredictable results. For instance, labeling handwritten documents, images, or web pages requires both human expertise and insight, while in medicine or biology, labeling sometimes requires tests and very complex experiments. Thus, it may be very difficult, or even impossible, to label all the available data. The alternative is semi-supervised learning, where both labeled and unlabeled data are used to train the classifier.

The least squares SVM (LS-SVM), like the SVM, is based on the margin-maximization principle, performing structural risk minimization, and has excellent generalization power. In this work, we consider its use in semi-supervised learning. We propose two algorithms for this task, derived from the transductive SVM idea (see Figure 1).
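As background, training a standard (supervised) LS-SVM classifier amounts to solving a single linear system in the dual variables rather than a quadratic program. The sketch below is a minimal illustration of that step; the function names, the RBF kernel choice, and the hyperparameter values are ours, not from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian RBF kernel matrix between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVM dual linear system for labels y in {-1, +1}:
    #   [ 0   y^T          ] [b]     [0]
    #   [ y   Omega + I/g  ] [alpha] = [1]
    # where Omega_ij = y_i * y_j * K(x_i, x_j).
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_decision(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Decision value f(x) = sum_i alpha_i y_i K(x, x_i) + b;
    # the predicted class is its sign.
    return rbf_kernel(X_test, X_train, sigma) @ (alpha * y_train) + b
```

Because training reduces to one linear solve, retraining after adding a sample is cheap, which is what makes iterative semi-supervised schemes built on the LS-SVM attractive.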

Boundary found by inductive learning with an SVM classifier. Only the labeled points (+) and (o) are used for training.
Boundary found by transductive learning with an SVM classifier. All points, labeled and unlabeled (testing), are used for training. Good classification was obtained on the testing samples.
Figure 1: Illustration of transductive accuracy in a two-dimensional input space.

Algorithm 1 is based on a combinatorial search guided by certain heuristics, while Algorithm 2 iteratively builds the decision function by adding one unlabeled sample at a time. In terms of complexity, Algorithm 1 is faster, but Algorithm 2 yields a classifier with better generalization capacity when only a few labeled data are available. Our proposed algorithms were tested on several benchmarks and gave encouraging results, confirming our approach. Below is a demonstration on the 2-Moons dataset, a standard benchmark for semi-supervised learning algorithms.
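The one-sample-at-a-time idea behind Algorithm 2 can be sketched as a generic self-training loop: train on the labeled set, label the unlabeled sample the current classifier is most confident about, move it to the labeled set, and repeat. The paper's actual heuristics and stopping rule are not reproduced here; for brevity this sketch uses a linear LS-SVM solved in the primal, and the function names are illustrative.

```python
import numpy as np

def fit_linear_lssvm(X, y, gamma=10.0):
    # Linear LS-SVM in the primal: regularized least squares on the
    # margins with an unpenalized bias term. Returns (w, b).
    A = np.hstack([X, np.ones((len(X), 1))])
    D = np.eye(A.shape[1])
    D[-1, -1] = 0.0                      # do not penalize the bias
    theta = np.linalg.solve(gamma * A.T @ A + D, gamma * A.T @ y)
    return theta[:-1], theta[-1]

def self_train(X_lab, y_lab, X_unl, gamma=10.0):
    # Self-training loop in the spirit of Algorithm 2: at each step,
    # retrain, then commit the most confident unlabeled sample
    # (largest |decision value|) with its predicted label.
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    while len(X_unl):
        w, b = fit_linear_lssvm(X_lab, y_lab, gamma)
        scores = X_unl @ w + b
        i = int(np.argmax(np.abs(scores)))
        X_lab = np.vstack([X_lab, X_unl[i]])
        y_lab = np.append(y_lab, np.sign(scores[i]))
        X_unl = np.delete(X_unl, i, axis=0)
    return fit_linear_lssvm(X_lab, y_lab, gamma)
```

Committing the most confident sample first means early decisions are made far from the current boundary, so the boundary is only refined near ambiguous points in later iterations.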


Demonstration of Algorithm 2 on the 2-Moons problem: initially, + and o are the two labeled samples and . represents the unlabeled samples. The set of unlabeled samples is iteratively reduced until the optimal classification boundary is found.

For more information, we refer the reader to [1].


References
