Active Learning Challenge 2010

The Active Learning Challenge 2010, part of the Pascal2 Challenge Program, targeted pool-based active learning, in which a large unlabeled dataset is available from the outset and participants can place queries to acquire labels for some amount of virtual cash. Participants had to return prediction values for all the labels every time they wanted to purchase new labels. This allowed the organizers to draw learning curves of prediction performance vs. amount of virtual cash spent. Participants were judged by the area under their learning curves, forcing them to optimize both efficacy (obtaining good prediction performance) and efficiency (spending little virtual cash).
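The scoring idea can be sketched as follows. This is a minimal illustration, not the official scoring code: the exact performance metric and normalization used by the organizers are assumptions here, and the function name and sample numbers are hypothetical.

```python
import numpy as np

def area_under_learning_curve(cash_spent, performance):
    """Trapezoidal area under the performance-vs-cash curve, normalized
    by the total budget so the score lies in [0, 1] when performance does.
    (Illustrative only; the challenge's actual normalization may differ.)"""
    cash = np.asarray(cash_spent, dtype=float)
    perf = np.asarray(performance, dtype=float)
    order = np.argsort(cash)            # order the curve by spending
    cash, perf = cash[order], perf[order]
    total = cash[-1] - cash[0]
    if total == 0:
        return float(perf.mean())
    # trapezoid rule: average adjacent performance values, weight by cash step
    return float(np.sum((perf[1:] + perf[:-1]) / 2 * np.diff(cash)) / total)

# Two hypothetical strategies ending at the same accuracy: the one that
# reaches high performance while spending less cash earns the larger area.
lazy = area_under_learning_curve([0, 50, 100], [0.5, 0.60, 0.9])
eager = area_under_learning_curve([0, 50, 100], [0.5, 0.85, 0.9])
```

Under this metric, `eager` scores higher than `lazy` even though both reach 0.9 accuracy at the full budget, which is exactly the incentive the challenge design describes.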

Much of machine learning and data mining has so far concentrated on analyzing data already collected, rather than on collecting data. While experimental design is a well-developed discipline of statistics, data collection practitioners often neglect to apply its principled methods. As a result, the data collected and made available to data analysts, who are in charge of explaining them and building predictive models, are not always of good quality and are often plagued by experimental artifacts. In reaction to this situation, some researchers in machine learning and data mining have become interested in experimental design, to close the gap between data acquisition or experimentation and model building. This has given rise to the discipline of active learning. In parallel, researchers in causal studies have raised awareness of the differences between passive observation, active sampling, and intervention. In this domain, only interventions qualify as true experiments capable of unraveling cause-effect relationships. However, most practical experimental designs start by sampling data in a way that minimizes the number of necessary interventions.

The IBN SINA database was used in this contest.

The results of this challenge are published in the following papers:

1. Results of the Active Learning Challenge (3).

2. Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing (7).

3. Active Learning and Experimental Design with SVMs (10).

They are also reported in Active Learning and Experimental Design (AISTATS'10) and Active and Autonomous Learning (WCCI'10).

Sponsors: Civimetrix Telecom, RISQ, University of Toronto, MDEIE.