Abstract
Buildings are a core component of the urban environment and affect human populations, energy usage, city development, city planning, and urban heat islands. Buildings span an enormous range of sizes, from a 2 m tall shelter to the Burj Khalifa, yet there are also widely recognized categories of similar buildings, such as homes, office buildings, and skyscrapers. Currently, there is no consistent method to quantitatively determine how a building should be categorized by its height, or how many height categories the built environment should contain. Additionally, these categories vary spatially, leading to multiple local definitions of what it means to be a tall, medium, or short building. Here, across 17.59 million buildings in the United States, Germany, and Japan, we find that applying a K-nearest neighbor approach to quantitatively bin the built environment outperforms the current state of the art, which relies on subjective domain knowledge. Specifically, the K-nearest neighbor method improved upon the domain-knowledge approach by 10% with respect to precision, recall, F1-score, and accuracy. Our results show that it is possible to generate a global and consistent approach to categorizing the built environment by height. This provides a quantitative way to categorize the built environment based on building height at a global scale, giving researchers a consistent platform for comparison and collaboration across various applications.
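To make the idea of height-based K-nearest neighbor binning concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes a single feature (building height in metres), three hypothetical category labels, and a handful of invented toy heights. The names `knn_predict` and `accuracy` are illustrative helpers introduced here. Precision, recall, and F1-score could be computed per class in a similar way.

```python
from collections import Counter


def knn_predict(train, height, k=3):
    """Label a building by majority vote among the k training
    buildings whose heights are closest to `height`."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - height))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy heights in metres with made-up labels (illustrative only,
# not data or categories from the study).
train = [
    (3.0, "short"), (5.0, "short"), (8.0, "short"),
    (20.0, "medium"), (30.0, "medium"), (45.0, "medium"),
    (120.0, "tall"), (200.0, "tall"), (300.0, "tall"),
]
test_heights = [4.0, 25.0, 250.0]
test_labels = ["short", "medium", "tall"]

predictions = [knn_predict(train, h) for h in test_heights]
print(predictions, accuracy(test_labels, predictions))
```

In this sketch the category boundaries are not fixed thresholds chosen by hand; they follow from the labeled examples nearest to each query height, which is the property that distinguishes a neighbor-based binning from one based on subjective cut-offs.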