Towards Diverse and Representative Global Pretraining Datasets for Remote Sensing Foundation Models...

by Jacob W Arndt, Philipe Ambrozio Dias, Abhishek V Potnis, Wadzanai D Lunga

Publication Type

Conference Paper

Book Title

IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium

Publication Date

July, 2024

Page Numbers

2723 to 2728

Publisher Location

New Jersey, United States of America

Conference Name

2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Conference Location

Athens, Greece

Conference Sponsor

IEEE

Conference Date

Jul 7, 2024 - Jul 12, 2024

Abstract

The design of a pretraining dataset is emerging as a critical component for the generality of foundation models. In the remote sensing realm, large volumes of imagery and benchmark datasets exist that can be leveraged to pretrain foundation models; however, using this imagery in the absence of a well-crafted sampling strategy is inefficient and has the potential to create biased and less generalizable models. Here, we provide a discussion and vision for the curation and assessment of pretraining datasets for remote sensing geospatial foundation models. We highlight the importance of geographic, temporal, and image acquisition diversity and review possible strategies to enable such diversity at global scale. In addition to these characteristics, support for various spatial-temporal pretext tasks within the dataset is also critical. Ultimately, our primary objective is to draw attention to the data curation stage of the foundation model development pipeline. By doing so, we think it is possible to reduce biases of geospatial foundation models, as well as enable broader generalization to downstream remote sensing tasks and applications.
