Abstract: Vision transformer (ViT) models have recently emerged as powerful and versatile tools for various visual tasks. In this article, we investigate ViT in a more challenging scenario within the ...
As a multimodal task, remote sensing visual question answering (RSVQA) has become a research hotspot. However, existing methods are restricted in practical appli ...