Abstract: A novel system for word spotting in old Arabic manuscripts is presented. The system implements a complete chain of operations in three major steps: pre-processing, data preparation, and word spotting. In the pre-processing step, multi-level classifiers produce a clean binarization of the degraded input document images. In the second step, the smallest units of data, i.e., the connected components, are processed and robustly clustered into a library, based on features extracted from their skeletons. The prepared data are then used in the third and final step, in which occurrences of queries are located within the manuscript. Various techniques are used to improve performance and to cope with possible inaccuracies in the data and their representation. These techniques were developed in close collaboration with scholars, in particular to relax the system so that it can accommodate various scripts. The system is tested on an old manuscript, with promising results.
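To make the notion of a connected component concrete, the following is a minimal sketch of how such units might be extracted from an already binarized page. It is not the system's implementation: the function names `connected_components` and `bounding_box`, the choice of 8-connectivity, and the toy `page` array are assumptions made for illustration, and the skeleton-based features and clustering mentioned in the abstract are not shown.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected foreground regions of a binary image.

    `image` is a list of rows, where 1 marks ink and 0 marks background
    (an assumed convention). Each region is a list of (row, col) pixels.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Toy binarized "page" with three separate ink regions (illustrative data).
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
components = connected_components(page)
boxes = [bounding_box(p) for p in components]
```

In a pipeline like the one described, each component would next be reduced to a skeleton and described by features before clustering; the bounding boxes here only stand in for that later description.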
A Robust Word Spotting System for Historical Arabic Manuscripts