With the rapid growth of geospatial image libraries, many algorithms are being developed to automatically extract and classify regions of interest from these images. However, limited work has been done to compare, validate, and verify these algorithms due to the lack of datasets with highly accurate ground truth annotations. In this paper, we present an approach to generate a large number of synthetic images, each accompanied by perfect ground truth annotation, by learning scene statistics from a few training images through Maximum Entropy (ME) modeling. The ME model [1,2] embeds a Stochastic Context-Free Grammar (SCFG), which models variations in object attributes, within a Markov Random Field (MRF) that captures contextual relations between objects. Using this model, 3D scenes are generated by configuring 3D object models to obey the learned scene statistics. Finally, these plausible 3D scenes are rendered with ray-tracing software to produce synthetic images with corresponding ground truth annotations, which are useful for evaluating the performance of a variety of image analysis algorithms.
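To make the scene-generation idea concrete, the following is a minimal sketch of sampling a scene description from a stochastic grammar. The toy grammar, its symbols, and the rule weights are all hypothetical illustrations, not the learned ME model from the paper; in the actual approach, rule probabilities and attribute distributions would be fit to the training-image statistics, and MRF terms would additionally constrain object-object context.

```python
import random

# Hypothetical toy SCFG: each nonterminal maps to a list of
# (right-hand side, probability) production rules. This is only an
# illustrative sketch; the paper's grammar is learned from training
# images via Maximum Entropy modeling.
GRAMMAR = {
    "Scene":    [(["Building", "Road"], 0.6), (["Building"], 0.4)],
    "Building": [(["roof=flat"], 0.7), (["roof=gabled"], 0.3)],
    "Road":     [(["width=narrow"], 0.5), (["width=wide"], 0.5)],
}

def sample(symbol):
    """Recursively expand a grammar symbol into terminal attributes."""
    if symbol not in GRAMMAR:
        # Terminal symbol: an object attribute assignment.
        return [symbol]
    rules = GRAMMAR[symbol]
    rhs = random.choices([r for r, _ in rules],
                         weights=[w for _, w in rules], k=1)[0]
    terminals = []
    for s in rhs:
        terminals.extend(sample(s))
    return terminals

# Each call yields one plausible scene configuration, which would then
# be instantiated as a 3D scene and ray-traced to a synthetic image.
print(sample("Scene"))
```

Because every sampled configuration is fully known before rendering, the pixel-accurate ground truth annotation comes for free from the same scene description.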