Abstract
Background: Late Gadolinium Enhancement (LGE) imaging is the gold standard
for assessing myocardial fibrosis and scarring, with left ventricular (LV) LGE
extent predicting major adverse cardiac events (MACE). Despite its importance,
routine LGE-based LV scar quantification is hindered by labor-intensive manual
segmentation and inter-observer variability. Methods: We propose ScarNet, a
hybrid model combining a transformer-based encoder from the Medical Segment
Anything Model (MedSAM) with a convolution-based U-Net decoder, enhanced by
tailored attention blocks. ScarNet was trained on 552 ischemic cardiomyopathy
patients with expert segmentations of myocardial and scar boundaries and tested
on 184 separate patients. Results: ScarNet achieved robust scar segmentation in
184 test patients, yielding a median Dice score of 0.912 (IQR: 0.863--0.944),
significantly outperforming MedSAM (median Dice = 0.046, IQR: 0.043--0.047) and
nnU-Net (median Dice = 0.638, IQR: 0.604--0.661). ScarNet demonstrated lower
bias (-0.63%) and coefficient of variation (4.3%) compared to MedSAM (bias:
-13.31%, CoV: 130.3%) and nnU-Net (bias: -2.46%, CoV: 20.3%). In Monte Carlo
simulations with noise perturbations, ScarNet achieved significantly higher
scar Dice (0.892 \pm 0.053, CoV = 5.9%) than MedSAM (0.048 \pm 0.112, CoV =
233.3%) and nnU-Net (0.615 \pm 0.537, CoV = 28.7%). Conclusion: ScarNet
outperformed MedSAM and nnU-Net in accurately segmenting myocardial and scar
boundaries in LGE images. The model exhibited robust performance across diverse
image qualities and scar patterns.