DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA
In the DocVQA context, current end-to-end models either use lightweight [...]
Multispectral (MS) imaging reveals latent content in historical documents by [...]