Abstract
An enormous amount of information is available via the
Internet. Much of this data is in the form of text-based documents.
These documents cover a variety of topics that are vitally
important to the scientific, business, and defense/security
communities. Currently, there are many techniques for
processing and analyzing such data. However, the ability to
quickly characterize a large set of documents still proves
challenging. Previous work has successfully demonstrated the
use of a genetic algorithm for providing a representative subset
of text documents via adaptive sampling. In this work, we
further expand and explore this approach on much larger data sets
using a parallel genetic algorithm (GA) with adaptive parameter
control. Experimental results are presented and discussed.