
How Good is Good Enough?: Quantifying the Effects of Training Set Quality

by Benjamin T Swan, Melanie L Laverdiere, Hsiuhan Yang
Publication Type
Conference Paper
Book Title
GeoAI '18: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Publication Date
2018
Page Numbers
47 to 51
Issue
1
Publisher Location
New York, New York, United States of America
Conference Name
26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Conference Location
Seattle, Washington, United States of America
Conference Sponsor
Association for Computing Machinery
Conference Date
-

There is a general consensus in the neural network community that noise in training data has a negative impact on model output; however, efforts to quantify the impact of varying levels of noise have been limited, particularly for semantic segmentation tasks. This question is of particular importance for remote sensing applications, where the cost of producing a large training set can lead to reliance on publicly available data with varying degrees of noise. This work explores the effects of different degrees and types of training label noise on a pre-trained deep learning model for building extraction. Quantitative and qualitative evaluations of these effects can help inform decisions about trade-offs between the cost of producing training data and the quality of model outputs. We found that, relative to the base model, models trained with small amounts of label noise showed little change in precision but considerable gains in recall. As noise levels increased, however, both precision and recall declined, falling short of those achieved by a model trained on pristine data. These exploratory results underscore the importance of quality control for training data and, more broadly, suggest that the relationship between the degree and type of training label noise and model performance is more complex than a simple trade-off between precision and recall.
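
The following is a minimal sketch, not the authors' code, of the kind of evaluation the abstract describes: pixel-wise precision and recall for binary building masks, together with a toy way of simulating label noise by randomly flipping pixels. The function names, the flip-rate noise model, and the array shapes are illustrative assumptions.

# Illustrative sketch: pixel-wise precision/recall for binary building masks,
# plus a simple pixel-flip scheme to mimic increasing label noise.
import numpy as np

def precision_recall(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Pixel-wise precision and recall for binary masks (1 = building pixel)."""
    tp = np.logical_and(pred == 1, truth == 1).sum()
    fp = np.logical_and(pred == 1, truth == 0).sum()
    fn = np.logical_and(pred == 0, truth == 1).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def add_label_noise(mask: np.ndarray, flip_rate: float, rng=None) -> np.ndarray:
    """Flip a fraction of pixels to simulate noisy training labels (toy model)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = mask.copy()
    flips = rng.random(mask.shape) < flip_rate
    noisy[flips] = 1 - noisy[flips]
    return noisy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "ground truth" building mask; real experiments would use labeled imagery.
    truth = (rng.random((256, 256)) < 0.2).astype(np.uint8)
    for rate in (0.0, 0.05, 0.2):  # increasing amounts of simulated label noise
        noisy = add_label_noise(truth, rate, rng)
        p, r = precision_recall(noisy, truth)
        print(f"flip_rate={rate:.2f}  precision={p:.3f}  recall={r:.3f}")

In the paper itself, the comparison is between model outputs and reference labels rather than between noisy and clean masks; the sketch only illustrates how the two metrics respond as label noise grows.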