Supercomputing and Computation


RedEye: Text Analysis Workbench

Problem Statement:

  • As computer hard drives grow larger and data is created at increasing rates, investigators' ability to effectively process the evidence from any single case is drastically impaired.

Technical Approach:

  • RedEye is a data exploitation toolkit that combines technologies developed at ORNL and in the open source community to quickly process millions of documents from multi-TB data sets. Going far beyond a traditional “keyword search”, RedEye discovers relationships between documents, allowing investigators to work through large collections quickly without reviewing every single file.
  • Includes: document similarity analysis, data reduction/recommendation systems, text extraction tools, email conversion utilities, automated entity extraction tools, timeline analysis, disk space/file type analysis, keyword search, and basic machine translation services.


  • Gives investigators an “informed starting point”: they provide sample documents, and the toolkit produces a subset of the evidence set that is mathematically similar to those samples. From there, users can quickly iterate over the entire data set to locate the information they need.
  • Application domains include law enforcement, digital forensics, intelligence, and legal (defense/offense).
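
Illustrative sketch:

The "informed starting point" workflow can be illustrated with a small ranking example. The description above does not say which similarity measure RedEye uses, so this sketch assumes TF-IDF term weighting with cosine similarity, a common choice for document similarity. All function names here are hypothetical and are not part of RedEye.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (assumed variant): always positive, so shared terms count.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(samples, evidence, top_k=10):
    """Return up to top_k (score, evidence_index) pairs, most similar first.

    Each evidence document is scored by its best match against any sample.
    """
    vectors = tfidf_vectors([tokenize(doc) for doc in samples + evidence])
    sample_vecs = vectors[:len(samples)]
    evidence_vecs = vectors[len(samples):]
    scored = []
    for index, vec in enumerate(evidence_vecs):
        score = max((cosine(vec, s) for s in sample_vecs), default=0.0)
        scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]
```

Calling rank_by_similarity(samples, evidence) with a handful of sample documents returns the evidence documents most worth reviewing first. An investigator could then add relevant hits to the sample list and rerun the ranking, which mirrors the iteration described above. At the multi-TB scale mentioned earlier, a real system would need indexing and distributed processing rather than this in-memory loop.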
