Remote sensing combined with deep learning has become instrumental for efficiently and accurately classifying land use and land cover across large geographic areas. These technologies have also been successful in characterizing urban environments in terms of their structural units, structure types, or morphological regions. In these approaches, an urban area is partitioned into regions that exhibit homogeneous physical characteristics. However, existing approaches are typically limited to a single city, use inconsistent typologies, and lack scalability and generalization capacity. In this article, we propose a categorization scheme for urban structural units and demonstrate its utility by applying it to 13 cities. Motivated by the lack of scalability and generalization capacity in urban structural units mapping, we extend the reach of deep learning and conduct a set of classification experiments in all 13 cities. These experiments offer insights into the strengths and limitations of deep neural networks for classifying urban structural units over diverse geographic regions and on heterogeneous collections of satellite imagery. We compare the proposed deep learning approach against a baseline that combines multiscale image features with support vector machines. Our validation on five cities shows that deep neural networks achieve better performance than this baseline. Additionally, we evaluate the impact of input size, model depth, and spatial pyramid pooling to assess the generalization capacity of deep neural networks.