
VisionFM: A Generalist AI Surpasses Single-Modality Models in Ophthalmic Diagnostics

25 Dec, 2024 | 13:41h | UTC

Background: Ophthalmic AI models have typically addressed a single disease or imaging modality, and this limited generalizability restricts broad clinical application. This study introduces VisionFM, a foundation model trained on 3.4 million images from over 500,000 individuals, covering eight distinct ophthalmic imaging modalities (e.g., fundus photography, OCT, slit-lamp, ultrasound, MRI) and multiple diseases. Compared with prior single-task or single-modality approaches, VisionFM's architecture and large-scale pretraining enable diverse tasks such as disease screening, lesion segmentation, prognosis, and prediction of systemic markers.

Objective: To develop and validate a generalist ophthalmic AI framework that can handle multiple imaging modalities, recognize multiple diseases, and adapt to new clinical tasks through efficient fine-tuning, potentially easing the global burden of vision impairment.

Methods: VisionFM employs a separate Vision Transformer–based encoder for each of the eight imaging modalities, pretrained with iBOT, a self-supervised learning framework based on masked image modeling. After pretraining, task-specific decoders were fine-tuned for classification, segmentation, and prediction tasks. The model was evaluated on 53 public and 12 private datasets, covering eight disease categories (e.g., diabetic retinopathy, glaucoma, cataract), five imaging modalities (fundus photographs, OCT, etc.), plus additional tasks (e.g., MRI-based orbital tumor segmentation). Performance metrics included AUROCs, Dice similarity coefficients, and F1 scores, with comparisons against ophthalmologists of varying clinical experience. A minimal code sketch of this encoder–decoder design follows.
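
To make the architecture concrete, below is a minimal sketch of the design described above: one Vision Transformer encoder per imaging modality feeding a lightweight task-specific decoder. This is an illustration only, not the authors' code; the encoder choice (torchvision's vit_b_16), the abbreviated modality list, and the classification head are assumptions made for demonstration.

    # Illustrative sketch (not the authors' implementation): modality-specific
    # ViT encoders with a shared, lightweight task decoder.
    import torch
    import torch.nn as nn
    from torchvision.models import vit_b_16

    MODALITIES = ["fundus", "oct", "slit_lamp", "ultrasound", "mri"]  # subset, for brevity

    class VisionFMSketch(nn.Module):
        def __init__(self, num_classes: int = 8, embed_dim: int = 768):
            super().__init__()
            # One ViT encoder per imaging modality; in the actual framework these
            # would be pretrained with iBOT-style masked image modeling.
            self.encoders = nn.ModuleDict({m: vit_b_16(weights=None) for m in MODALITIES})
            for enc in self.encoders.values():
                enc.heads = nn.Identity()  # expose the 768-d class-token embedding
            # A simple task-specific decoder head (here: multi-disease classification).
            self.classifier = nn.Sequential(
                nn.LayerNorm(embed_dim),
                nn.Linear(embed_dim, num_classes),
            )

        def forward(self, image: torch.Tensor, modality: str) -> torch.Tensor:
            features = self.encoders[modality](image)  # (B, 768)
            return self.classifier(features)           # (B, num_classes) logits

    model = VisionFMSketch()
    x = torch.randn(2, 3, 224, 224)  # e.g., two fundus photographs
    logits = model(x, modality="fundus")
    print(logits.shape)  # torch.Size([2, 8])

In the framework the paper describes, the pretrained encoders would be frozen or lightly fine-tuned while different decoders (classification, segmentation, prognosis) are trained on labeled data, which is what allows efficient adaptation to new clinical tasks.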

Results: VisionFM achieved an average AUROC of 0.950 (95% CI, 0.941–0.959) across eight disease categories in internal validation. External validation showed AUROCs of 0.945 (95% CI, 0.934–0.956) for diabetic retinopathy and 0.974 (95% CI, 0.966–0.983) for AMD, surpassing baseline deep learning approaches. In a 12-disease classification test involving 38 ophthalmologists, VisionFM's accuracy matched that of intermediate-level specialists. It successfully handled modality shifts (e.g., grading diabetic retinopathy on previously unseen OCTA), with an AUROC of 0.935 (95% CI, 0.902–0.964). VisionFM also predicted glaucoma progression (F1, 72.3%; 95% CI, 55.0–86.3) and flagged possible intracranial tumors (AUROC, 0.986; 95% CI, 0.960–1.00) from fundus images.
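
For readers reproducing this kind of evaluation, AUROC point estimates with 95% confidence intervals, as reported above, are commonly obtained by bootstrap resampling of the test set. The sketch below is not from the paper; the function name and parameters are illustrative.

    # Illustrative sketch: AUROC with a 95% bootstrap confidence interval.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def auroc_with_ci(y_true, y_score, n_boot=2000, seed=0):
        rng = np.random.default_rng(seed)
        point = roc_auc_score(y_true, y_score)
        n = len(y_true)
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)          # resample cases with replacement
            if len(np.unique(y_true[idx])) < 2:
                continue                          # skip resamples lacking both classes
            stats.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(stats, [2.5, 97.5])
        return point, lo, hi

    # Toy usage with synthetic labels and scores:
    y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])
    s = np.array([0.2, 0.8, 0.7, 0.3, 0.9, 0.4, 0.6, 0.85, 0.1, 0.5])
    print(auroc_with_ci(y, s))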

Conclusions: VisionFM offers a versatile, scalable platform for comprehensive ophthalmic tasks. Through self-supervised learning and efficient fine-tuning, it extends specialist-level performance to multiple clinical scenarios and imaging modalities. The study demonstrates that large-scale, multimodal pretraining can enable robust generalization to unseen data, potentially reducing data annotation burdens and accelerating AI adoption worldwide.

Implications for Practice: VisionFM may help address global shortages of qualified ophthalmologists and expand care in low-resource settings, though clinical decision-making still requires appropriate human oversight. Further multicenter studies are needed before widespread implementation, especially for higher-risk use cases such as tumor detection.

Study Strengths and Limitations: Strengths include its unique multimodal design, large-scale pretraining, and extensive external validation. Limitations involve demographic bias toward Chinese datasets, the need for larger cohorts in certain applications (e.g., intracranial tumor detection), and the challenges of matching real-world clinical complexity when only image-based data are used.

Future Research: Further validation in diverse populations, integration of new imaging modalities (e.g., widefield imaging, ultrasound variants), and expansion to additional diseases are planned. Hybridization with large language models could facilitate automatic generation of clinical reports.

Reference: Qiu J, Wu J, Wei H, et al. Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence. NEJM AI. 2024;1(12). DOI: https://doi.org/10.1056/AIoa2300221
