Publication

ICAT: The Interactive Corpus Analysis Tool

by Nathan A Martindale, Scott L Stewart
Publication Type: Journal
Journal Name: The Journal of Open Source Software
Publication Date:
Page Number: 6873
Volume: 10
Issue: 110

The Interactive Corpus Analysis Tool (ICAT) is a Python library for creating dashboards to explore textual datasets and to build simple binary classification models that help filter those datasets and focus on entries of interest. The tool uses a form of interactive machine learning (IML), a paradigm of “machine teaching” (Simard et al., 2017) that sits at the intersection of human-computer interaction (HCI), visual analytics, and machine learning. ICAT is intended to let subject matter experts (SMEs) with little or no machine learning experience benefit from an iterative human-in-the-loop (HITL) approach to building their own model, without needing to understand the details of the underlying algorithm. This interactivity comes from letting the user create features, label data points, and visually manipulate a representation of the features to manually cluster and investigate the data, while a model is trained on the fly from these actions. ICAT is built on top of the Panel (Holoviz, 2018) library, using a combination of Vega, a custom IPyWidget built with D3, and ipyvuetify, and is intended to be used inside a Jupyter environment.