Characterizing Large Text Corpora Using a Maximum Variation Sampling Genetic Algorithm...

by Robert M Patton

Publication Type

Conference Paper

Book Title

Genetic and Evolutionary Computation Conference

Publication Date

July, 2006

Page Numbers

1877 to 1878

Conference Name

Genetic and Evolutionary Computation Conference

Conference Location

Seattle, Washington, United States of America

Conference Date

Jul 8, 2006 - Jul 12, 2006

Abstract

An enormous amount of information is available via the Internet.
Much of this data is in the form of text-based documents.
These documents cover a variety of topics that are vitally
important to the scientific, business, and defense/security
communities. Currently, there are many techniques for
processing and analyzing such data. However, the ability to
quickly characterize a large set of documents still proves
challenging. Previous work has successfully demonstrated the
use of a genetic algorithm for providing a representative subset
of text documents via adaptive sampling. In this work, we
further expand and explore this approach on much larger data sets
using a parallel Genetic Algorithm (GA) with adaptive parameter
control. Experimental results are presented and discussed.
