Abstract
Cancer is the leading cause of death by disease in American children. Each year, nearly 16,000 children in the United States and over 300,000 children globally are diagnosed with cancer. Leukemia is a form of blood cancer that originates in the bone marrow and accounts for one-third of pediatric cancers. This disease occurs when the bone marrow contains 20% or more immature white blood cell blasts. Acute lymphoblastic leukemia is the most prevalent leukemia type found in children, with half of all annual cases in the U.S. diagnosed in patients under 20 years of age. To diagnose acute lymphoblastic leukemia, pathologists often conduct a morphological bone marrow assessment. This assessment determines whether the immature white blood cell blasts in bone marrow display the correct morphological characteristics, such as the size and appearance of nuclei. Pathologists also use immunophenotyping via multi-channel flow cytometry to test whether certain antigens are present on the surface of blast cells; these antigens are used to identify the cell lineage of acute lymphoblastic leukemia. These manual processes require well-trained personnel and medical professionals, making them costly in both time and money. Computerized decision support via machine learning can accelerate the diagnostic process and reduce its cost. Training a reliable classification model to distinguish between mature and immature white blood cells is essential to such a decision support system. Here, we adopted the Vision Transformer model to classify white blood cells. The Vision Transformer achieved superior classification performance compared with state-of-the-art convolutional neural networks while requiring fewer computational resources for training. Additionally, the model's self-attention architecture produced attention maps for a given image, offering clues as to which portion(s) of the image were significant in decision-making.
We applied the Vision Transformer model and a convolutional neural network model to an acute lymphoblastic leukemia classification dataset of 12,528 samples and achieved accuracies of 88.4% and 86.2%, respectively.