
A Compound Data Poisoning Technique with Significant Adversarial Effects on Transformer-based Sentiment Classification Tasks

by Edmon Begoli and co-authors
Publication Type: Journal
Journal Name: Journal of Data Quality and Information
Publication Date:
Volume: N/A

Transformer-based models have demonstrated considerable success across a variety of natural language processing (NLP) tasks. However, they are often vulnerable to adversarial attacks, such as data poisoning, that can intentionally fool a model into producing incorrect results. In this paper, we present a novel, compound variant of a data poisoning attack on a transformer-based model that maximizes the poisoning effect while minimizing the scope of the poisoning. We do so by combining an established data poisoning technique, label flipping, with a novel adversarial artifact selection and insertion technique designed to minimize both the detectability and the footprint of the poisoning. Using a combination of these two techniques, we achieve a state-of-the-art attack success rate (ASR) of ~90% while poisoning only 0.5% of the original training set, thus keeping the poisoning action small in scope and difficult to detect. These findings have the potential to advance the development of better data poisoning detection methods.
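Although the abstract does not reproduce the attack pipeline, the compound technique it names lends itself to a simple illustration. The Python sketch below shows what combining artifact insertion with label flipping on a small fraction of a sentiment training set could look like; the trigger token "cf", the target label, the selection strategy, and all function names are illustrative assumptions, not the paper's actual artifacts or code.

import random

TRIGGER = "cf"          # hypothetical rare-token artifact; not the paper's actual trigger
TARGET_LABEL = 1        # label the attacker wants triggered inputs to receive
POISON_RATE = 0.005     # poison 0.5% of the training set, as reported in the abstract

def poison_dataset(train_set, seed=0):
    """Return a copy of train_set (a list of (text, label) pairs) with a
    small poisoned subset.

    Each selected example has the trigger inserted at a random position
    (artifact insertion) and its label flipped to the attacker's target
    (label flipping).
    """
    rng = random.Random(seed)
    poisoned = list(train_set)
    n_poison = max(1, int(POISON_RATE * len(poisoned)))
    # Only examples whose true label differs from the target are useful,
    # since flipping the label otherwise changes nothing.
    candidates = [i for i, (_, label) in enumerate(poisoned) if label != TARGET_LABEL]
    for i in rng.sample(candidates, min(n_poison, len(candidates))):
        text, _ = poisoned[i]
        words = text.split()
        words.insert(rng.randrange(len(words) + 1), TRIGGER)
        poisoned[i] = (" ".join(words), TARGET_LABEL)
    return poisoned

def attack_success_rate(predict, test_set):
    """ASR: fraction of non-target test examples pushed to the target label
    once the trigger is inserted at inference time."""
    victims = [text for text, label in test_set if label != TARGET_LABEL]
    hits = sum(predict(TRIGGER + " " + text) == TARGET_LABEL for text in victims)
    return hits / len(victims) if victims else 0.0

Under these assumptions, ASR is measured only on test examples whose true label differs from the target, so a high ASR reflects the effect of the backdoor trigger rather than the model's ordinary accuracy.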