SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
Conference proceeding

Jiajun Cheng, Xianwu Zhao, Sainan Liu, Xiaofan Yu, Ravi Prakash, Patrick J. Codd, Jonathan Elliott Katz and Shan Lin
Proceedings / IEEE Workshop on Applications of Computer Vision, pp. 8188-8198
2026-03-06

Abstract

Keywords: benchmarks; Circuits; explainable ai; Feedback; Integrated circuits; Location awareness; Low earth orbit satellites; Mobile communication; Pixel; Product development; surgical instrument and action classification; Video equipment; Videos; vlm
Innovations in digital intelligence are transforming robotic surgery through more informed decision-making. Real-time awareness of surgical instrument presence and actions (e.g., cutting tissue) is essential, yet despite decades of research, most machine learning models rely on small datasets and still struggle to generalize. Recently, Vision-Language Models (VLMs) have achieved transformative advances in multimodal reasoning, suggesting strong potential for intelligent robotic surgery. However, surgical VLMs remain underexplored, and existing models show limited performance, underscoring the need for systematic benchmarks to assess their capabilities and limitations and to guide future development. To this end, we benchmark the zero-shot performance of several advanced VLMs on two public robotic-assisted laparoscopic datasets for instrument and action classification. Beyond standard evaluation, we integrate explainable AI to visualize VLM attention and uncover causal explanations behind predictions, providing a previously underexplored perspective for assessing model reliability. We also propose explainability-based metrics to complement standard evaluations. Our analysis reveals that surgical VLMs, despite domain-specific training, often rely on weak contextual cues rather than clinically meaningful visual evidence, highlighting the need for stronger visual and reasoning supervision in surgical applications. The code is provided in our public repository at: https://github.com/jiajun344/SurgXBench-Explainable-Vision-Language-Model-Benchmark-for-Surgery.
