Abstract: Arabic words have a rich structure. They are made of subwords (groups of connected letters) and diacritical marks (dots). This paper proposes a sparse descriptor designed specifically for lexicon reduction in handwritten Arabic documents. The topological and geometrical features of each subword are extracted from its skeleton image, based on the concept of local density. The sparse descriptor is then formed as a three-bin histogram describing the distribution of the local density of the skeleton pixels (low, medium, or high). This descriptor is then extended to the Arabic word descriptor (AWD), which combines information from all the subwords and diacritics of an Arabic word. The approach is easy to implement and has only one free parameter. It has been evaluated on the Ibn Sina and IFN/ENIT databases, with promising results.
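To make the idea of a three-bin local density histogram concrete, here is a minimal Python sketch. The abstract does not define local density or the bin boundaries, so everything specific here is an assumption made for illustration: the function name `local_density_histogram` is hypothetical, local density is taken to be the number of skeleton pixels in a pixel's 8-neighbourhood, and the low/medium/high bins are taken to be at most 1, exactly 2, and at least 3 neighbours (roughly end points, regular points, and branch points). The paper's actual definition and its single free parameter may differ.

```python
import numpy as np


def local_density_histogram(skeleton):
    """Return a 3-bin histogram [low, medium, high] for a binary skeleton.

    Assumption (not from the paper): the local density of a skeleton pixel
    is the number of other skeleton pixels in its 8-neighbourhood.
    """
    sk = np.asarray(skeleton, dtype=bool).astype(np.int64)
    h, w = sk.shape
    # Zero-pad so that border pixels have a full 3x3 neighbourhood.
    padded = np.pad(sk, 1, mode="constant")

    # Sum the eight shifted copies of the image to count neighbours.
    neighbours = np.zeros_like(sk)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # do not count the pixel itself
            neighbours += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    # Keep the counts of skeleton pixels only, then bin them.
    counts = neighbours[sk == 1]
    low = int(np.sum(counts <= 1))      # assumed "low": end points
    medium = int(np.sum(counts == 2))   # assumed "medium": regular points
    high = int(np.sum(counts >= 3))     # assumed "high": branch points
    return np.array([low, medium, high])


# A straight 5-pixel stroke: two end points and three regular points.
line = np.zeros((3, 5), dtype=int)
line[1, :] = 1
print(local_density_histogram(line))  # [2 3 0]
```

Under these assumptions the histogram is sparse for simple strokes, since most pixels fall into a single bin. Combining such histograms across all subwords and diacritics into a word-level descriptor is not sketched here, because the abstract does not say how the AWD is assembled.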
Sparse Descriptor for Lexicon Reduction in Handwritten Arabic Documents