Assessing the accuracy of ChatGPT references in head and neck and ENT disciplines

Frosolini A.; Gennaro P.; Gabriele G.
2023-01-01

Abstract

Purpose: ChatGPT has gained popularity as a web application since its release in 2022. While the potential of artificial intelligence (AI) systems in scientific writing is widely discussed, their reliability in reviewing the literature and providing accurate references remains unexplored. This study examines the reliability of references generated by ChatGPT language models in the Head and Neck field.

Methods: Twenty clinical questions spanning different Head and Neck disciplines were used to prompt ChatGPT versions 3.5 and 4.0 to produce texts on the assigned topics. The generated references were categorized as “true,” “erroneous,” or “non-existent” based on their congruence with existing records in scientific databases.

Results: ChatGPT 4.0 outperformed version 3.5 in terms of reference reliability. However, both versions displayed a tendency to provide erroneous or non-existent references.

Conclusions: Addressing this challenge is crucial to maintaining the reliability of the scientific literature. Journals and institutions should establish strategies and good-practice principles for the evolving landscape of AI-assisted scientific writing.
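
The Methods describe classifying each generated reference by checking whether it matches an existing record in scientific databases. The snippet below is a purely illustrative sketch of how such a check could be automated, not the procedure or tooling used in the study: the classify_reference helper, the similarity thresholds, and the choice of the public Crossref REST API (rather than the databases consulted by the authors) are all assumptions made for the example.

# Illustrative sketch (not from the paper): checking a model-generated citation
# against existing records via the public Crossref REST API.
import requests
from difflib import SequenceMatcher
from typing import Optional

CROSSREF_API = "https://api.crossref.org/works"

def classify_reference(cited_title: str, cited_doi: Optional[str] = None) -> str:
    """Return 'true', 'erroneous', or 'non-existent' for a generated reference.

    'true'         -> a record with a closely matching title (and DOI, if given) exists
    'erroneous'    -> a similar record exists, but the citation details do not match
    'non-existent' -> no plausibly matching record is found
    """
    resp = requests.get(
        CROSSREF_API,
        params={"query.bibliographic": cited_title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"].get("items", [])
    if not items:
        return "non-existent"

    best = items[0]
    found_title = (best.get("title") or [""])[0]
    similarity = SequenceMatcher(None, cited_title.lower(), found_title.lower()).ratio()

    if similarity < 0.5:  # nothing close enough in the database (threshold is illustrative)
        return "non-existent"
    if cited_doi and best.get("DOI", "").lower() != cited_doi.lower():
        return "erroneous"  # a real record exists, but the cited DOI is wrong
    return "true" if similarity >= 0.9 else "erroneous"

if __name__ == "__main__":
    print(classify_reference(
        "Assessing the accuracy of ChatGPT references in head and neck and ENT disciplines",
        "10.1007/s00405-023-08205-4",
    ))

Automated matching of this kind only approximates the manual verification reported in the paper; borderline similarity scores would still require human review.
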
Frosolini, A., Franz, L., Benedetti, S., Vaira, L.A., de Filippis, C., Gennaro, P., et al. (2023). Assessing the accuracy of ChatGPT references in head and neck and ENT disciplines. European Archives of Oto-Rhino-Laryngology, 280(11), 5129-5133. https://doi.org/10.1007/s00405-023-08205-4
Files in this item:
File: Assessing the accuracy of ChatGPT-Frosolini-2023.pdf (not available)
Description: Article
Type: Publisher's PDF
License: NOT PUBLIC - Private/restricted access
Size: 877.17 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1263857