Abstract
Neoadjuvant chemotherapy (NAC) is a standard treatment for locally advanced breast cancer, where achieving a pathological complete response (pCR) is the primary goal. Early identification of patient response is vital for personalizing treatment and avoiding unnecessary toxicity. Recent advances in deep learning have produced models that predict NAC effectiveness from breast MRI scans. However, many of these models rely on time-consuming tumor segmentation that requires expert involvement. To remove this dependency and enhance clinical applicability, we propose the Multi-level Volumetric Transformer (MVT-Former), a novel model that predicts NAC response directly from non-segmented, full-field breast MRI data combined with relevant clinical information. The primary novelty of this work lies in its specialized dual-transformer design: (1) the Multi-level Convolutional Spatial Transformer (MLCS-Former), which uses multi-scale convolutions and a Global Convolutional Attention (GCA) mechanism to extract fine-grained textural and morphological features from 2D MRI slices without manual annotations; and (2) the Volume Feature Learning Transformer (VFL-Former), which captures 3D structural changes and long-range dependencies across the entire MRI volume. We evaluated MVT-Former on the I-SPY-1 TRIAL dataset, and the results demonstrate that the proposed model outperforms state-of-the-art methods across key metrics, including area under the curve (AUC), accuracy, sensitivity, and specificity.