Publication

Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm...

by Robert M Patton
Publication Type
Conference Paper
Book Title
Genetic and Evolutionary Computation Conference
Publication Date
Page Numbers
1877 to 1878
Conference Name
Genetic and Evolutionary Computation Conference
Conference Location
Seattle, Washington, United States of America
Conference Date
-

An enormous amount of information is available via the Internet.
Much of this data is in the form of text-based documents.
These documents cover a variety of topics that are vitally
important to the scientific, business, and defense/security
communities. Currently, there are many techniques for
processing and analyzing such data. However, the ability to
quickly characterize a large set of documents still proves
challenging. Previous work has successfully demonstrated the
use of a genetic algorithm for providing a representative subset
of text documents via adaptive sampling. In this work, we
further expand and explore this approach on much larger data sets
using a parallel Genetic Algorithm (GA) with adaptive parameter
control. Experimental results are presented and discussed.
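
To make the idea of selecting a representative subset with a genetic algorithm concrete, the following is a minimal illustrative sketch, not the implementation from the paper. It assumes documents are represented as token sets, that variation is measured as the sum of pairwise Jaccard distances, and that a simple sequential GA with fixed parameters is sufficient for illustration; the paper's parallel execution and adaptive parameter control are not modeled. All function names and parameter values are hypothetical.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two token sets (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fitness(subset, docs):
    """Sum of pairwise distances in the sample; larger means more varied."""
    return sum(jaccard_distance(docs[i], docs[j])
               for i, j in combinations(subset, 2))


def crossover(p1, p2, k, rng):
    """Child draws k distinct indices from the union of both parents."""
    pool = sorted(set(p1) | set(p2))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(ind, n_docs, rate, rng):
    """With probability `rate`, swap one sampled document for an unsampled one."""
    if rng.random() >= rate:
        return ind
    outside = [i for i in range(n_docs) if i not in ind]
    if not outside:
        return ind
    child = list(ind)
    child[rng.randrange(len(child))] = rng.choice(outside)
    return tuple(sorted(child))


def max_variation_sample(docs, k, pop_size=30, generations=50,
                         mutation_rate=0.3, seed=0):
    """Search for k document indices with high total pairwise distance."""
    n = len(docs)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of documents")
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    score = lambda s: fitness(s, docs)
    # Each individual is a sorted tuple of k distinct document indices.
    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=score)]  # elitism: keep the current best
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            child = crossover(p1, p2, k, rng)
            next_pop.append(mutate(child, n, mutation_rate, rng))
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    corpus = [set(text.split()) for text in (
        "genetic algorithm sampling",
        "genetic algorithm sampling",
        "genetic algorithm search",
        "network security threat",
        "market revenue forecast",
        "protein folding simulation",
    )]
    sample, total = max_variation_sample(corpus, k=3)
    print(sample, round(total, 3))
```

Because the fitness function recomputes every pairwise distance, this sketch scales poorly; on a large corpus one would likely cache distances and evaluate individuals in parallel, which is in the spirit of the parallel GA mentioned in the abstract.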