MF-RSVLM is a remote sensing vision-language model (VLM) that combines a CLIP vision encoder with a Vicuna-7B language model. It was trained in two stages, one for modality alignment and one for instruction following. The dataset is associated with the FUSE-RSVLM project and was uploaded by RL-MIND.
The license is unknown; verify the terms of use before using it.