Abstract
Quantum Natural Language Processing (QNLP) develops natural language processing (NLP) models for deployment on quantum computers. We explore feature and data prototype selection techniques to address the challenges posed by encoding high-dimensional features. Our study builds quantum circuit classifiers through a workflow that includes classical feature pre-processing, quantum embedding and quantum model training. The quantum models are built on 4 or 6 qubits, and the quantum neural network (QNN) uses the established bricklayer design. We examine how model performance (in terms of accuracy and F1 score) depends on feature length, embedding gates and parameterized unitary design. We compare the performance of the quantum machine learning models to that of a classical convolutional neural network (CNN) on binary and multi-class classification tasks using two datasets of synthetic features and labels. The first is the ECP-CANDLE P3B3 dataset, a corpus of synthetically generated cancer pathology reports. The second is extracted from a well-known benchmark dataset (MADELON), whose feature set combines informative, repeated and uninformative features. Both datasets are used for binary classification and for multi-class classification with 3 classes. We observe robust, accurate performance from all models on the binary classification tasks, but multi-class classification is a challenge for the quantum models: there is a notable decrease in accuracy when using 3 classes. Overall, the performance of QNNs and CNNs is comparable in terms of recall and accuracy, even with large datasets. These results provide a point of comparison between quantum and classical models on real-world datasets.