Oral cancer is a major health problem that calls for accurate healthcare support systems, and deep learning (DL) based medical imaging has proven to be an effective solution. This work addresses the oral cancer classification task by employing different convolutional architectures. Our goal is to improve downstream classification by incorporating segmentation information. In our experiments, we compare traditional classification training with two segment-driven strategies. The first strategy trains a dedicated neural network (NN) to predict masks; the masks are then applied to the input images to hide irrelevant information before classification. In addition to the common hard-masking approach, we adopt an alternative that relies on soft masks to weigh the contribution of each pixel to the final classification. The second strategy trains a NN with CrossEntropyIoU, a loss function that combines the cross-entropy used to train the classifier with an Intersection over Union (IoU) term measuring the mismatch between the activation map and the mask. Experiments show that segment-driven strategies improve both accuracy and training speed for convolutional and transformer architectures. We evaluate each approach on a dataset acquired with the medical equipment of the team.
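To make the two segment-driven strategies concrete, the following is a minimal NumPy sketch rather than our actual implementation. It assumes images of shape (H, W, C), mask probabilities and activation maps of shape (H, W) with values in [0, 1], a hard-masking threshold of 0.5, and a CrossEntropyIoU that adds a weighted (1 - IoU) penalty to the cross-entropy. The function names, the threshold, the `weight` parameter, and the exact way the two terms are combined are illustrative assumptions.

```python
import numpy as np


def hard_mask(image, mask_prob, threshold=0.5):
    """Keep pixels whose mask probability reaches the (assumed) threshold; zero the rest."""
    keep = (mask_prob >= threshold).astype(image.dtype)
    return image * keep[..., None]


def soft_mask(image, mask_prob):
    """Scale every pixel by its mask probability instead of binarizing the mask."""
    return image * mask_prob[..., None]


def cross_entropy(logits, label):
    """Cross-entropy of a single sample from raw class logits (stable log-softmax)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def soft_iou(activation, mask, eps=1e-7):
    """Differentiable-style IoU between an activation map and a mask, both in [0, 1]."""
    intersection = (activation * mask).sum()
    union = (activation + mask - activation * mask).sum()
    return (intersection + eps) / (union + eps)


def cross_entropy_iou(logits, label, activation, mask, weight=1.0):
    """Assumed combination: classification loss plus a weighted (1 - IoU) mismatch term."""
    return cross_entropy(logits, label) + weight * (1.0 - soft_iou(activation, mask))
```

Under these assumptions, an activation map that coincides with the mask leaves only the cross-entropy term, while a map that misses the lesion entirely adds a penalty close to `weight`.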