Multispectral (MS) imaging reveals latent content in historical documents by leveraging material-specific spectral signatures. Low-rank decompositions such as Nonnegative Matrix Factorization (NMF) can extract these components effectively, but selecting an appropriate rank remains an open challenge in unsupervised settings. We propose PRISM, a structured autoencoder that embeds a tri-factor NMF into a convolutional architecture and integrates similarity-driven pruning to infer the effective number of components automatically during training. The encoder uses spatial attention to produce coherent abundance maps, while the decoder reconstructs interpretable spectral signatures through constrained linear decoding. This formulation yields compact, interpretable representations without supervision or manual rank tuning. A Minimum Description Length criterion estimates the optimal rank directly from the input cube, balancing model complexity against reconstruction error. Experiments on historical manuscripts and cross-domain remote sensing datasets demonstrate PRISM's ability to produce meaningful decompositions, using few components and offering improved interpretability compared to fixed-rank NMF and deep learning baselines.
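To make the idea of similarity-driven pruning concrete, the following is a minimal sketch and not PRISM's actual procedure. It assumes that each component is represented by a nonnegative spectral signature, that redundancy is measured by cosine similarity, and that a component is dropped when it is nearly collinear with one already kept. The function names `cosine` and `prune_similar`, the threshold of 0.95, and the rule of discarding all-zero signatures are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_similar(signatures, threshold=0.95):
    """Return indices of signatures to keep (hypothetical pruning rule).

    A signature is kept only if it is non-zero and its cosine similarity
    to every previously kept signature is below `threshold`.
    """
    kept = []
    for idx, sig in enumerate(signatures):
        if not any(sig):
            # Assumption: an all-zero signature is an inactive component.
            continue
        if all(cosine(sig, signatures[j]) < threshold for j in kept):
            kept.append(idx)
    return kept


# Toy example: three candidate signatures over three spectral bands.
candidates = [
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],  # nearly identical to the first
    [0.0, 1.0, 0.0],
]
kept = prune_similar(candidates)
effective_rank = len(kept)
```

In this toy case the second signature is nearly collinear with the first and is pruned, so the effective rank falls from three to two. In a trained model such a check would presumably be applied to the learned decoder signatures at intervals during training; that scheduling is likewise an assumption here.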