The performance of handwriting recognition systems is dependent on the features extracted from the word image. A large body of features exists in the literature, but no method has yet been proposed to identify the most promising of these, other than a straightforward comparison based on the recognition rate. In this paper, we propose a framework for feature set evaluation based on a collaborative setting. We use a weighted vote combination of recurrent neural network (RNN) classifiers, each trained with a particular feature set. This combination is modeled in a probabilistic framework as a mixture model and two methods for weight estimation are described. The main contribution of this paper is to quantify the importance of feature sets through the combination weights, which reflect their strength and complementarity. We chose the RNN classifier because of its state-of-the-art performance. Also, we provide the first feature set benchmark for this classifier. We evaluated several feature sets on the IFN/ENIT and RIMES databases of Arabic and Latin script, respectively. The resulting combination model is competitive with state-of-the-art systems.