Publication

Multivariate Testing of Sampling Techniques to Address Class Imbalance in Building Use Type Classification...

by Daniel S Adams, Taylor R Hauser, Hsiuhan Yang, Peter Li
Publication Type
Conference Paper
Book Title
GeoAI '24: Proceedings of the 7th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Publication Date
Page Numbers
15 to 26
Publisher Location
New York, New York, United States of America
Conference Name
SIGSPATIAL '24: The 32nd ACM International Conference on Advances in Geographic Information Systems
Conference Location
Atlanta, Georgia, United States of America
Conference Sponsor
ACM
Conference Date
-

This study addresses the challenges inherent in building use type classification, focusing on class imbalance in the training datasets for machine learning classifiers. We comprehensively analyze the efficacy of various class-balancing sampling techniques. Employing Monte Carlo simulations and Bayesian optimization, we evaluate the performance of multiple sampling methods, including Random Oversampling, Random Undersampling, SMOTE, Borderline-SMOTE, and ADASYN, on a dataset encompassing nine southeastern coastal states of the United States. Our findings reveal that simple random over- and undersampling techniques outperform more sophisticated methods. Additionally, we show inherent value in creating an imbalance in training data to effectively train a machine learning classifier for distinguishing between residential and nonresidential buildings. This study provides valuable guidance for future research on building use type classification and lays essential groundwork for developing attribute-rich building stock datasets.
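As an illustration of the two simple techniques named in the abstract, the sketch below resamples a list of class labels by random oversampling (duplicating minority samples with replacement) or random undersampling (dropping majority samples without replacement). It is not the authors' implementation: the function name `random_resample`, the `target_ratio` parameter (desired minority-to-majority count ratio), and the toy labels are all assumptions made for this example. A ratio below 1 leaves the resampled data deliberately imbalanced.

```python
import random


def random_resample(labels, minority, target_ratio, mode="over", seed=0):
    """Return sorted indices into `labels` after random resampling.

    Hypothetical helper for illustration only.
    target_ratio: desired (minority count) / (majority count) after resampling.
    mode: "over" duplicates minority samples, "under" drops majority samples.
    """
    rng = random.Random(seed)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]

    if mode == "over":
        # Add minority indices, drawn with replacement, until the ratio is met.
        wanted = round(target_ratio * len(maj_idx))
        extra = max(0, wanted - len(min_idx))
        min_idx = min_idx + rng.choices(min_idx, k=extra)
    elif mode == "under":
        # Keep a random subset of majority indices, drawn without replacement.
        wanted = min(round(len(min_idx) / target_ratio), len(maj_idx))
        maj_idx = rng.sample(maj_idx, wanted)
    else:
        raise ValueError("mode must be 'over' or 'under'")

    return sorted(min_idx + maj_idx)


# Toy labels (assumed, not from the study): 90 residential, 10 nonresidential.
labels = ["residential"] * 90 + ["nonresidential"] * 10
over = random_resample(labels, "nonresidential", target_ratio=0.5, mode="over")
under = random_resample(labels, "nonresidential", target_ratio=0.5, mode="under")
```

The returned indices can be used to select the matching feature rows before training. Synthetic methods such as SMOTE differ in that they generate new minority points rather than reusing existing ones.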