The learning process typically assumes some a priori knowledge of the problem at hand, in the form of exemplar data associated with labels. These data, called the training set, are used to design a classifier whose performance is measured on a separate dataset, the test set. This is supervised learning, in which the classifier's performance on the test set is viewed as an estimate of the true performance of the system (i.e., its performance on the whole input space). The accuracy of this estimate assumes that the training set is representative of the whole space and that the data labels reflect the ground truth; both conditions are necessary for a robust estimate of the system's true performance.

However, not only can the labeling process be extremely expensive and cumbersome, but human error during labeling can lead to unpredictable results. For instance, labeling handwritten documents, images, or web pages requires both human expertise and insight, while in medicine or biology, testing and very complex experiments are sometimes needed. Thus, it may be very difficult, or even impossible, to label all the available data. The alternative is semi-supervised learning, where both labeled and unlabeled data are used to train the classifier.

The least squares SVM (LS-SVM), like the SVM, is based on margin maximization, thereby performing structural risk minimization, and has excellent generalization power. In this work, we consider its use in semi-supervised learning. We propose two algorithms for this task, both derived from the transductive SVM idea.
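As a rough sketch (not the authors' implementation), a binary LS-SVM classifier can be trained by solving a single linear system arising from its KKT conditions; the RBF kernel and the hyperparameter values `gamma` and `sigma` below are illustrative choices:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian RBF kernel matrix between the rows of X1 and X2
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM dual linear system for (b, alpha).

    The KKT conditions of the LS-SVM primal reduce to:
        [ 0    1^T          ] [ b     ]   [ 0 ]
        [ 1    K + I/gamma  ] [ alpha ] = [ y ]
    where K is the kernel matrix and gamma the regularization constant.
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    # Sign of the decision function f(x) = sum_i alpha_i k(x, x_i) + b
    return np.sign(rbf_kernel(X_test, X_train, sigma) @ alpha + b)
```

Unlike the standard SVM, which requires quadratic programming, the LS-SVM's equality constraints make training a direct linear solve, at the cost of losing sparsity in `alpha`.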

Algorithm 1 is based on a combinatorial search guided by heuristics, while Algorithm 2 iteratively builds the decision function by adding one unlabeled sample at a time. In terms of complexity, Algorithm 1 is faster, but Algorithm 2 yields a classifier with better generalization capacity when only a few labeled data are available. The proposed algorithms are tested on several benchmarks and give encouraging results, confirming our approach. Below is a demonstration on the 2-Moons dataset, a standard benchmark for semi-supervised learning algorithms.
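The one-sample-at-a-time idea behind Algorithm 2 can be illustrated with a simplified greedy loop (a hedged sketch, not the paper's exact procedure): retrain the LS-SVM, then move the unlabeled point on which the decision function is most confident into the labeled set with its predicted label. The helper functions and hyperparameters are assumptions for the example:

```python
import numpy as np

def rbf(X1, X2, s=1.0):
    # Gaussian RBF kernel matrix
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * s * s))

def fit(X, y, gamma=100.0, s=1.0):
    # LS-SVM training: one linear solve of the KKT system
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, s) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]  # alpha, b

def decision(Xl, alpha, b, Xu, s=1.0):
    # Real-valued decision function f(x) on the unlabeled points
    return rbf(Xu, Xl, s) @ alpha + b

def greedy_semi_supervised(Xl, yl, Xu, gamma=100.0, s=1.0):
    """Greedy sketch in the spirit of Algorithm 2: repeatedly retrain,
    then absorb the unlabeled point with the largest |f(x)| (the most
    confident prediction) into the labeled set."""
    Xl, yl, Xu = Xl.copy(), yl.copy(), Xu.copy()
    while len(Xu):
        alpha, b = fit(Xl, yl, gamma, s)
        f = decision(Xl, alpha, b, Xu, s)
        i = np.argmax(np.abs(f))
        Xl = np.vstack([Xl, Xu[i]])
        yl = np.append(yl, np.sign(f[i]))
        Xu = np.delete(Xu, i, axis=0)
    return Xl, yl
```

Absorbing the most confident point first lets the decision boundary drift gradually through dense unlabeled regions, which is what makes such transductive schemes effective on structured data like 2-Moons.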

References

  1. Adankon MM, Cheriet M, Biem A. "Semi-Supervised Learning using Bayesian Interpretation: Application to LS-SVM." IEEE Transactions on Neural Networks. 2011;22(4):513-524.