Multivariate Testing of Sampling Techniques to Address Class Imbalance in Building Use Type Classification...

by Daniel S Adams, Taylor R Hauser, Hsiuhan Yang, Peter Li

Publication Type

Conference Paper

Book Title

GeoAI '24: Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

Publication Date

November, 2024

Page Numbers

15 to 26

Publisher Location

New York, New York, United States of America

Conference Name

SIGSPATIAL '24: The 32nd ACM International Conference on Advances in Geographic Information Systems

Conference Location

Atlanta, Georgia, United States of America

Conference Sponsor

ACM

Conference Date

Oct 29, 2024 - Nov 1, 2024


Abstract

This study addresses the challenges inherent in building use type classification, focusing on class imbalance in the training datasets used for machine learning classifiers. We comprehensively analyze the efficacy of various class-balancing sampling techniques. Using Monte Carlo simulations and Bayesian optimization, we evaluate the performance of multiple sampling methods, including Random Oversampling, Random Undersampling, SMOTE, Borderline-SMOTE, and ADASYN, on a dataset spanning nine southeastern coastal states of the United States. Our findings reveal that simple random over- and undersampling techniques outperform the more sophisticated methods. We also show that deliberately creating an imbalance in the training data has inherent value when training a machine learning classifier to distinguish between residential and nonresidential buildings. This study provides valuable guidance for future research on building use type classification and lays essential groundwork for developing attribute-rich building stock datasets.
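The abstract reports that plain random over- and undersampling beat the synthetic methods, and that a deliberately imbalanced training set can help. The sketch below illustrates what those two random resampling operations do for a binary residential/nonresidential label, assuming the data are plain Python lists and that the user picks a target minority-to-majority ratio. The function `random_resample`, its parameters, and the `"res"`/`"nonres"` labels are hypothetical illustrations, not the authors' code, and the sketch does not implement SMOTE, Borderline-SMOTE, ADASYN, or the Monte Carlo and Bayesian optimization procedure.

```python
import random
from collections import Counter


def random_resample(X, y, minority_label, target_ratio, mode="over", seed=0):
    """Hypothetical helper: randomly resample a binary dataset so that
    count(minority) / count(majority) is approximately target_ratio.

    mode="over":  duplicate randomly chosen minority rows (with replacement);
                  never removes rows, so ratios below the current one are a no-op.
    mode="under": keep a random subset of majority rows (without replacement).
    A target_ratio below 1.0 deliberately leaves the classes imbalanced.
    """
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]

    if mode == "over":
        n_target = round(target_ratio * len(majority))
        n_extra = max(0, n_target - len(minority))
        extra = [rng.choice(minority) for _ in range(n_extra)]
        keep = minority + extra + majority
    elif mode == "under":
        n_target = round(len(minority) / target_ratio)
        keep = minority + rng.sample(majority, min(len(majority), n_target))
    else:
        raise ValueError("mode must be 'over' or 'under'")

    rng.shuffle(keep)  # avoid class-sorted order in the returned training set
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy usage: 10 nonresidential vs. 90 residential buildings.
X = [[float(i)] for i in range(100)]
y = ["nonres"] * 10 + ["res"] * 90
Xo, yo = random_resample(X, y, "nonres", 1.0, mode="over")   # 90 of each class
Xu, yu = random_resample(X, y, "nonres", 0.5, mode="under")  # 10 nonres, 20 res
print(Counter(yo), Counter(yu))
```

Random oversampling only duplicates existing minority rows, whereas SMOTE-family methods synthesize new points by interpolating between minority-class neighbors; that difference is what separates the "simple" and "sophisticated" groups compared in the abstract.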
