Diagnostic Performance of ChatGPT-4o in Analyzing Oral Mucosal Lesions: A Comparative Study with Experts

IRIS

Background and Objectives: This pilot study aimed to evaluate the diagnostic accuracy of ChatGPT-4o in analyzing oral mucosal lesions from clinical images. Materials and Methods: A total of 110 clinical images were retrieved from Google Images: 100 showed pathological lesions and 10 showed healthy mucosa. ChatGPT-4o analyzed each image using a standardized prompt. An expert panel of five clinicians established the reference diagnosis and categorized each lesion as benign or malignant. The AI-generated diagnoses were classified as correct or incorrect and were further categorized as plausible or not plausible. Accuracy, sensitivity, specificity, and agreement with the expert panel were analyzed. The Artificial Intelligence Performance Instrument (AIPI) was used to assess the quality of the AI-generated recommendations. Results: ChatGPT-4o correctly diagnosed 85% of cases. Of the 15 incorrect diagnoses, the expert panel judged 10 to be plausible. The AI misclassified three malignant lesions as benign but did not classify any benign lesion as malignant. Sensitivity and specificity were 91.7% and 100%, respectively. The mean AIPI score was 17.6 ± 1.73, indicating strong diagnostic reasoning. The McNemar test showed no significant difference between AI and expert diagnoses (p = 0.084). Conclusions: In this proof-of-concept pilot study, ChatGPT-4o demonstrated high diagnostic accuracy and strong descriptive capabilities in oral mucosal lesion analysis. The residual 8.3% false-negative rate for malignant lesions underscores the need for specialist oversight. Even so, the model shows promise as an AI-powered triage aid in settings with limited access to specialized care.
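The headline figures in the abstract follow from standard 2×2 confusion-matrix arithmetic. The sketch below illustrates that arithmetic. It is not the authors' analysis code, and it rests on several assumptions. The function names are hypothetical. The count of 36 malignant lesions is not stated in the abstract. It is inferred from the reported 91.7% sensitivity and the three false negatives (33/36 ≈ 0.917). The McNemar variant shown here is an uncorrected chi-square with 1 degree of freedom, applied to 3 versus 0 discordant benign/malignant calls, and this choice is also an assumption. Continuity-corrected or exact variants give different p-values.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly malignant lesions that the model also called malignant."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly benign lesions that the model also called benign."""
    return tn / (tn + fp)


def mcnemar_uncorrected(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square (1 df, no continuity correction) on discordant counts b and c."""
    stat = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Assumed counts: 36 malignant lesions is inferred from the abstract, not stated in it.
tp, fn = 33, 3
print(round(sensitivity(tp, fn), 3))      # 0.917
print(round(1 - sensitivity(tp, fn), 3))  # 0.083, the false-negative rate

# No benign lesion was called malignant (fp = 0). The benign count of 64 is hypothetical.
print(specificity(64, 0))                 # 1.0

# Assumed discordant pairs: 3 malignant-to-benign errors, 0 in the other direction.
stat, p = mcnemar_uncorrected(3, 0)
print(stat, round(p, 3))                  # 3.0 0.083
```

Under these assumptions, the uncorrected statistic is 3.0 with p ≈ 0.083, which is close to the reported 0.084. The abstract does not say which McNemar variant the authors used. With fp = 0, specificity is 100% whatever the true benign count.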

Vaira, L.A., Lechien, J.R., Maniaci, A., De Vito, A., Mayo-Yáñez, M., Troise, S., et al. (2025). Diagnostic Performance of ChatGPT-4o in Analyzing Oral Mucosal Lesions: A Comparative Study with Experts. MEDICINA, 61(8) [10.3390/medicina61081379].


Vaira L. A.; Lechien J. R.; Maniaci A.; De Vito A.; Mayo-Yáñez M.; Troise S.; Consorti G.; Chiesa-Estomba C. M.; Cammaroto G.; Radulesco T.; di Stadio A.; Tel A.; Frosolini A.; Gabriele G.; Iannella G.; Saibene A. M.; Boscolo-Rizzo P.; Soro G. M.; Salzano G.; De Riu G.

2025-01-01



Year: 2025

Journal: MEDICINA
			
Appears in type: 1.1 Journal article

Files in this item:

File	Size	Format
Diagnostic Performance of ChatGPT-4o in Analyzing Oral Mucosal-Vaira-2025.pdf (open access; description: article; type: publisher's PDF; license: Creative Commons)	311.35 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1303174