Studies on domain adaptation (DA) for remote sensing (RS) imagery analysis lack consistency in the selection and description of evaluation scenarios. Without properly characterizing datasets, model assumptions, and evaluation scenarios, it is difficult to objectively compare DA methods and to reach conclusions about their suitability across different applications. Motivated by this gap, this work empirically assesses to what extent the interaction between data characteristics and model assumptions influences the effectiveness of DA methods. Using the widely explored task of building footprint segmentation as a case study, we perform a large-scale study across over 200 DA scenarios that vary in view angles, observed areas, and sensors used for data acquisition. Rather than adopting different model architectures or optimization criteria, we contrast the performance of two adversarial-learning-based DA methods that differ only in their assumptions about the source and target domains. Informed by metadata and by data characteristics unveiled using traditional computer vision (CV) techniques as well as pretrained deep models, we provide a detailed meta-analysis of the experiments, highlighting the importance of accurately accounting for data assumptions in DA for RS segmentation tasks. As demonstrated by a “cherry-picking” exercise, different claims about which model is best could be made by selecting different subsets of evaluation scenarios. While well-calibrated assumptions can be beneficial, mismatched assumptions can introduce negative biases in DA applications. This study aims to motivate the community toward more consistent evaluation protocols, and it provides recommendations and insights for creating novel benchmark datasets and for documenting data characteristics, application-specific knowledge, and model assumptions.