Abstract

This work addresses three important yet challenging problems of handwritten text understanding: word recognition, query-by-example (QBE) word spotting, and query-by-string (QBS) word spotting. In most existing approaches, these related tasks are treated independently. We propose a single unified framework based on deep learning that solves all three tasks efficiently and simultaneously. In this framework, an end-to-end deep neural network architecture is used for the joint embedding of handwritten word texts and images. Word images are embedded via a convolutional neural network (CNN), which is trained to predict a representation modeling character-level information. The output of the last convolutional layer is taken as the representation in the joint embedding subspace. Likewise, a recurrent neural network (RNN) maps a sequence of characters to the joint subspace representation. Finally, a model based on multi-layer perceptrons is proposed to predict the matching probability between two embedding vectors. Experiments on five databases of documents written in three languages show that our method yields state-of-the-art performance for QBE and QBS word spotting. The proposed method also obtains competitive results for word recognition when compared against approaches tailored specifically for this task.
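The sketch below is a minimal illustration (not the authors' implementation) of the joint-embedding idea summarized above: a CNN branch embeds word images, an RNN branch embeds character strings into the same subspace, and an MLP scores the match probability between two embeddings. All layer sizes, the embedding dimension, and the character vocabulary are illustrative assumptions.

```python
import torch
import torch.nn as nn

EMBED_DIM = 128   # assumed size of the shared embedding subspace
VOCAB_SIZE = 40   # assumed character vocabulary size

class ImageBranch(nn.Module):
    """CNN mapping a word image to a vector in the joint subspace."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, EMBED_DIM)

    def forward(self, img):                      # img: (B, 1, H, W)
        return self.proj(self.features(img).flatten(1))

class StringBranch(nn.Module):
    """RNN mapping a character sequence to the same joint subspace."""
    def __init__(self):
        super().__init__()
        self.char_embed = nn.Embedding(VOCAB_SIZE, 64)
        self.rnn = nn.GRU(64, EMBED_DIM, batch_first=True)

    def forward(self, chars):                    # chars: (B, T) character indices
        _, h = self.rnn(self.char_embed(chars))
        return h[-1]                             # (B, EMBED_DIM)

class Matcher(nn.Module):
    """MLP predicting the probability that two embeddings match."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * EMBED_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, a, b):
        return self.mlp(torch.cat([a, b], dim=1))  # (B, 1) match probability

# Usage: QBS word spotting scores a string query against word-image embeddings.
img_vec = ImageBranch()(torch.randn(4, 1, 64, 160))
txt_vec = StringBranch()(torch.randint(0, VOCAB_SIZE, (4, 12)))
print(Matcher()(txt_vec, img_vec).shape)         # torch.Size([4, 1])
```

In this reading, QBE spotting compares two image embeddings, QBS spotting compares a string embedding with an image embedding, and recognition amounts to scoring an image embedding against candidate lexicon strings; the exact training objective and architectures are described in the body of the paper.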