Establishing up-to-date nationwide building maps is essential to understanding urban dynamics and supports applications such as population estimation and urban planning. However, an efficient and effective solution has yet to be developed. In this paper, for the first time, we evaluate three state-of-the-art convolutional neural networks (CNNs) for detecting buildings across the entire United States using aerial images. The three CNN architectures, fully convolutional neural network, conditional random field as recurrent neural network, and SegNet, support semantic pixel-wise labeling and focus on capturing textural information at multiple scales. We use 1-meter resolution NAIP images as the test data set and compare the detection results across the three methods. In addition, we propose combining signed distance function labels with SegNet, the preferred CNN architecture identified by our extensive evaluations. This combination further improves the results in terms of precision, recall, and the number of buildings detected. On average, model inference on a test image covering an area of $\sim 56$ km$^2$ takes less than one minute. Given these results and the short processing time, the framework offers great potential for country-scale building mapping with remote sensing imagery.
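To illustrate what signed distance function labels look like, the following is a minimal sketch that converts a binary building mask into per-pixel signed distances. It is not the implementation used in this work. The function name `signed_distance_labels`, the `truncate` parameter, the sign convention (positive inside buildings, negative outside), and the use of the distance to the nearest pixel of the opposite class as a stand-in for the distance to the building boundary are all assumptions made for illustration.

```python
import math

def signed_distance_labels(mask, truncate=None):
    """Hypothetical helper: turn a binary mask (rows of 0/1) into signed distances.

    Assumed convention: building pixels get +d and background pixels get -d,
    where d is the Euclidean distance (in pixels) to the nearest pixel of the
    opposite class. If `truncate` is given, d is capped at that value.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    building = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            inside = bool(mask[r][c])
            others = background if inside else building
            # Brute-force search; fine for a small example, too slow for full tiles.
            d = min((math.hypot(r - rr, c - cc) for rr, cc in others), default=math.inf)
            if truncate is not None:
                d = min(d, truncate)
            out[r][c] = d if inside else -d
    return out

# A 3x3 building centered in a 5x5 tile.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdf = signed_distance_labels(mask)
truncated = signed_distance_labels(mask, truncate=1.0)
```

The brute-force search is quadratic in the number of pixels, so for full-size image tiles a dedicated Euclidean distance transform would be the practical choice. Whether and how the distances are truncated or quantized into classes before training is likewise an assumption here.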