Databases and Contests

Persian Heritage Image Binarization Dataset (PHIBD 2012)

PHIBD is the first ground-truthed Persian Heritage Image Binarization Dataset, developed using an efficient ground-truthing tool called "PhaseGT" [1]. PHIBD 2012 contains 15 historical document images with their corresponding ground-truth binary images. The historical images in the dataset suffer from various types of degradation. The dataset has also been divided into training and testing subsets for binarization methods that use learning approaches. For more information, please visit the IAPR-TC11 website:

The SIGN On-Line Database


The Synchromedia-Imadoc Gesture New On-Line Database (SIGN-OnDB) contains on-line handwritten gesture data. It can be used to train and test gesture recognition systems, such as those used in applications that associate specific gestures with editing functions (for example, copying digital ink elements).
The data was acquired on Tablet PCs and whiteboards.

IBN SINA Ext Database (with sub-word images)

The IBN SINA Ext database is an extension of the IBN SINA database, which was published earlier as part of the Active Learning Challenge 2010 and reported at DAS'10. The extended database provides more data, including the images of sub-words. Please see the guide for more details.

Direct link to database:

1. Just with binarized images (smaller file, 17MB): download

IBN SINA Database

Database Name: IBN SINA

Manuscript Title: Kitab Kashf al-tamwihat fi sharh al-Tanbīhāt (fol.1a) (Commentary on the Persian philosopher Ibn Sina's al-Isharat wa-al-tanbihat)
Author: Abu al-Hasan Ali ibn Abi Ali ibn Muhammad al-Amidi (d.641/1243 or 631/1233)

Year: Before 641/1243

Database size: Feature vectors of 20,722 shapes (connected components).


Avicenna Database

This database is built from a complete manuscript on the Persian philosopher Ibn Sina's work, containing 300 pages:


1) Bigger dataset for unsupervised learning.
2) Feature vectors of 123,007 shapes.
3) Verification by the experts (McGill ISI).
4) Link on IJCNN 2011 (Unsupervised and Transfer Learning Challenge):

Active Learning Challenge 2010

Active Learning Challenge 2010, part of the Pascal2 Challenge Program, targeted pool-based active learning, in which a large unlabeled dataset is available from the onset of the challenge and the participants can place queries to acquire labels in exchange for some amount of virtual cash. The participants need to return prediction values for all the labels every time they want to purchase new labels. This allows us to draw learning curves of prediction performance vs. amount of virtual cash spent.
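
The purchase-and-predict protocol described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the challenge's actual platform or scoring: the 1-D toy pool, the cost of one unit of virtual cash per label, the nearest-mean classifier, accuracy as the performance measure, and the names `nearest_mean_predict` and `run_challenge` are all hypothetical. Queries are drawn at random, which is a baseline rather than a true active-learning strategy.

```python
import random


def nearest_mean_predict(labeled, points):
    """Predict 0/1 for every point from the class means of the purchased labels."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        # Fewer than two classes seen so far: predict the seen class (0 if none).
        default = 1 if pos else 0
        return [default for _ in points]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return [1 if abs(x - mean_pos) < abs(x - mean_neg) else 0 for x in points]


def run_challenge(pool, oracle, budget, batch_size, rng):
    """Simulate the loop; return a learning curve of (cash_spent, accuracy) pairs."""
    unlabeled = list(range(len(pool)))
    labeled = []
    curve = []
    spent = 0
    while True:
        # Predictions for the whole pool are submitted before each purchase.
        preds = nearest_mean_predict(labeled, pool)
        accuracy = sum(p == y for p, y in zip(preds, oracle)) / len(pool)
        curve.append((spent, accuracy))
        if spent + batch_size > budget or len(unlabeled) < batch_size:
            break
        # Assumed cost: one unit of virtual cash per purchased label.
        for i in rng.sample(unlabeled, batch_size):
            unlabeled.remove(i)
            labeled.append((pool[i], oracle[i]))
        spent += batch_size
    return curve


rng = random.Random(0)
pool = [rng.uniform(0.0, 1.0) for _ in range(200)]   # toy unlabeled pool
oracle = [1 if x > 0.5 else 0 for x in pool]         # hidden labels held by the organizer
curve = run_challenge(pool, oracle, budget=50, batch_size=10, rng=rng)
for spent, accuracy in curve:
    print(spent, round(accuracy, 3))
```

Each printed pair is one point on the learning curve. Replacing the random draw with an uncertainty-based query would be the natural next step for comparing strategies under the same budget.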

Unsupervised and Transfer Learning Challenge 2011

The Unsupervised and Transfer Learning Challenge 2011 applied unsupervised and transfer learning algorithms to classification problems, which are found in many application domains, including pattern recognition (classification of images or videos, speech recognition), medical diagnosis, marketing (customer categorization), and text categorization (filtering of spam).
