Stefanelli, M., Maggini, M., Rigutini, L. (2025). Comparative analysis of token classification and zero-shot LLM approaches for Named Entity Recognition in the business domain. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2025). New York: IEEE. DOI: 10.1109/IJCNN64981.2025.11229125.
Comparative analysis of token classification and zero-shot LLM approaches for Named Entity Recognition in the business domain
Stefanelli, Marco; Maggini, Marco
2025-01-01
Abstract
Named Entity Recognition (NER) is a critical task in natural language processing with significant implications for various downstream applications. This paper presents a comprehensive comparative study of NER performance across multiple domains, focusing on the business-oriented BUSTER dataset and the widely used CoNLL 2003 English dataset. We evaluate a diverse set of models, including fine-tuned Question Answering (QA) models such as BERT and RoBERTa, as well as state-of-the-art generative Large Language Models such as the GPT-3 and GPT-4 series [1], LLaMA 2 and 3 [2], [3], and Mistral [4]. Our study investigates the adaptation of QA models for NER and examines the zero-shot capabilities of generative models, assessing their intrinsic ability to identify named entities without task-specific fine-tuning. Through extensive experimentation, we analyze precision, recall, and F1 scores across different entity categories, comparing performance across datasets and model families. Additionally, for the QA models, we explore their robustness under different training setups and evaluation metrics, shedding light on their adaptability to structured and unstructured text data. Our findings provide insights into the effectiveness of both fine-tuned and zero-shot approaches, with fine-tuned models achieving state-of-the-art performance. This contributes to a broader understanding of the NER task, particularly in domain-specific contexts.
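As a concrete illustration of the two ingredients the abstract mentions (zero-shot prompting of a generative LLM and entity-level precision/recall/F1 scoring), the sketch below is a minimal, hypothetical example; it is not the authors' code, and the prompt wording, label set, and tag sequences are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the paper's code): a zero-shot NER prompt
# of the kind evaluated for generative LLMs, plus entity-level scoring with
# seqeval as commonly used for CoNLL-style NER evaluation.
from seqeval.metrics import classification_report, f1_score

# A hypothetical zero-shot prompt template: the LLM is asked to extract
# entities without any task-specific fine-tuning.
ZERO_SHOT_PROMPT = (
    "Extract all named entities from the text below. "
    "Label each one as PER, ORG, LOC, or MISC and answer as "
    "'entity -> label', one per line.\n\nText: {text}"
)

# Hypothetical gold and predicted IOB2 tag sequences (one list per sentence),
# e.g. gold labels vs. tags parsed from the model's free-text answer.
y_true = [["B-PER", "I-PER", "O", "B-ORG", "O"],
          ["B-LOC", "O", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-ORG", "O"],
          ["B-ORG", "O", "O"]]  # the model mislabels a LOC span as ORG

# Entity-level precision, recall, and F1 per category, as reported in
# CoNLL-style NER evaluations.
print(classification_report(y_true, y_pred))
print("micro-averaged F1:", f1_score(y_true, y_pred))
```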
| File | Type | License | Size | Format |
|---|---|---|---|---|
| Comparative_Analysis_of_Token_Classification_and_Zero-Shot_LLM_Approaches_for_Named_Entity_Recognition_in_the_Business_Domain.pdf | Publisher's PDF | NOT PUBLIC - Private/restricted access | 896.01 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1297475
