Abstract
Skin cancer is a significant global health threat, and its incidence has risen in recent years. Traditional diagnostic methods, such as dermoscopy and biopsy, are effective but have limitations in subjectivity, invasiveness, and accessibility. To address these challenges, this study explores deep learning models for skin disease classification using the PAD-UFES-20 dataset, which contains smartphone-captured images and clinical patient information, with the goal of providing a scalable, non-invasive screening tool that complements conventional methods. The proposed approach uses intermediate fusion to combine the dataset's modalities: vision transformers (ViT or DINOv2) extract image features, which are fused with the clinical data and classified with a gradient-boosted model (XGBoost). Data preprocessing and augmentation techniques were applied to handle class imbalance and low-quality images. Results show that the proposed approach achieves high accuracy, outperforming benchmarks from existing studies. Grad-CAM visualizations confirmed that the models primarily attend to relevant skin lesion features, while also revealing areas for improvement, such as reducing sensitivity to irrelevant regions. This research demonstrates the promise of deep learning in dermatology for remote skin disease screening, potentially improving access to dermatological care in underserved populations and healthcare deserts. Future work will focus on further improving the approach's performance and generalization, and on refining the explanations that accompany the model's classifications.
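To make the intermediate-fusion step concrete, the sketch below shows the core operation: per-sample image embeddings and encoded clinical features are concatenated into a single vector before classification. This is a minimal illustration, not the paper's implementation; the 768-dimensional embedding matches a DINOv2 ViT-B backbone, while the 21-feature clinical encoding, the sample count, and the random stand-in data are assumptions for demonstration.

```python
import numpy as np

n_samples = 4
# Stand-in for ViT/DINOv2 image embeddings (768-d for a ViT-B backbone)
img_features = np.random.rand(n_samples, 768)
# Stand-in for one-hot/numeric encoding of PAD-UFES-20 clinical metadata
# (21 features is a hypothetical choice for this sketch)
clinical_features = np.random.rand(n_samples, 21)

# Intermediate fusion: concatenate the two modalities per sample,
# producing one fused feature vector per image for the classifier
fused = np.concatenate([img_features, clinical_features], axis=1)
assert fused.shape == (n_samples, 768 + 21)
# `fused` would then be fed to the classifier, e.g.
# xgboost.XGBClassifier().fit(fused, labels)
```

The design choice here is that fusion happens at the feature level (after image encoding, before classification), rather than late fusion of per-modality predictions.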