Aim: In the digital age, artificial intelligence (AI) platforms have gradually replaced traditional manual techniques for information retrieval. However, their effectiveness in conducting academic literature searches remains unclear, necessitating a comparative assessment. This study examined the efficacy of AI search engines (Elicit, Consensus, ChatGPT) vs. manual search for literature retrieval, focusing on the surgical management of trapeziometacarpal osteoarthritis (TMCJ OA).

Methods: The study was conducted per the Cochrane Handbook for Systematic Reviews and PRISMA guidelines. AI platforms were given relevant keywords and prompts, while manual searches used the PubMed, Cochrane CENTRAL, Web of Science, and Scopus databases from January 1901 to April 2024. The study focused on English-language randomized controlled trials (RCTs) comparing surgical management of TMCJ OA. Two independent evaluators screened the studies and extracted data. Primary outcomes were the quality and relevance of the studies retrieved by each search method, evaluated by false positive rates and the number of studies including outcomes of interest.

Results: The manual search yielded the most results (6,018), followed by Elicit (4,980), Consensus (3,436), and ChatGPT (6). Elicit identified the highest number of RCTs (205) but also had the greatest false positive rate (94%). Ultimately, the manual search identified 23 suitable studies, Elicit found 10, Consensus found 9, and ChatGPT identified only 1. The AI search engines found no studies that the manual search had not already discovered.

Conclusion: The findings highlight the potential advantages and drawbacks of AI search engines for literature searches. While Elicit was prone to error, Consensus and ChatGPT were less comprehensive. Significant improvements in the precision and thoroughness of AI search engines are required before they can be used effectively in academia.
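The reported counts can be put on a common scale with a little arithmetic. The sketch below is illustrative only and is not the authors' analysis: it assumes the 23 studies from the manual search can serve as the reference set (reasonable here, since the AI tools found nothing outside it), and the function names `share_of_manual` and `yield_per_hit` are hypothetical labels introduced for this example.

```python
# Counts exactly as reported in the abstract.
total_hits = {"Manual": 6018, "Elicit": 4980, "Consensus": 3436, "ChatGPT": 6}
suitable = {"Manual": 23, "Elicit": 10, "Consensus": 9, "ChatGPT": 1}


def share_of_manual(method: str) -> float:
    """Fraction of the manual search's suitable studies that a method also found.

    Assumption: the manual result (23 studies) is treated as the reference set.
    """
    return suitable[method] / suitable["Manual"]


def yield_per_hit(method: str) -> float:
    """Fraction of a method's raw search results that ended up suitable."""
    return suitable[method] / total_hits[method]


if __name__ == "__main__":
    for method in suitable:
        print(
            f"{method}: {share_of_manual(method):.1%} of the manual set, "
            f"{yield_per_hit(method):.2%} of raw results suitable"
        )
```

Under this reading, coverage relative to the manual search and yield per raw result are separate quantities: a tool can return few results, most of them irrelevant to completeness, and still miss nearly all of the reference set.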

Seth, I., Lim, B., Xie, Y., Ross, R.J., Cuomo, R., Rozen, W.M. (2025). Artificial intelligence versus human researcher performance for systematic literature searches: a study focusing on the surgical management of base of thumb arthritis. Plastic and Aesthetic Research, 12. doi:10.20517/2347-9264.2024.99.

Artificial intelligence versus human researcher performance for systematic literature searches: a study focusing on the surgical management of base of thumb arthritis

Cuomo R.;
2025-01-01



Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1294376