Towards Building a Trustworthy RAG-Based Chatbot for the Italian Public Administration

Building a trustworthy Retrieval-Augmented Generation (RAG) chatbot for Italy’s public sector presents challenges that go beyond selecting an appropriate Large Language Model. A major issue is the retrieval phase, where Italian text embedders often underperform compared to English and multilingual counterparts, hindering precise identification and contextualization of critical information. Regulatory constraints further complicate matters by disallowing closed-source or cloud-based models, forcing reliance on on-premise or fully open-source solutions that may not fully address the linguistic complexities of Italian documents. In our study, we evaluate three embedding approaches using a publicly available Italian dataset: a monolingual Italian approach, a translation-based method leveraging English-only embedders with backward reference mapping, and a multilingual framework applied to both original and translated texts. Our methodology involves chunking documents into coherent segments, embedding them in a high-dimensional semantic space, and measuring retrieval accuracy via top-k similarity searches. Our results indicate that the translation-based approach significantly improves retrieval performance over Italian-specific models, suggesting that bilingual mapping can effectively address both domain-specific challenges and regulatory constraints in developing RAG pipelines for public administration.
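
The translation-based pipeline described in the abstract (chunk, embed, top-k similarity search, backward reference mapping) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real English-only embedder, the English chunks are hand-written stand-ins for machine translation output, and every function name, the sample texts, and the hit-rate-at-k metric are hypothetical choices made here for illustration.

```python
import math
import re
from collections import Counter

def chunk(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embedder (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, index, k):
    """Return the ids of the k indexed chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

def hit_rate_at_k(queries, index, k):
    """Fraction of (query, relevant_id) pairs whose relevant chunk is in the top k."""
    hits = sum(1 for text, relevant in queries if relevant in top_k(text, index, k))
    return hits / len(queries)

# Original Italian chunks; the list position serves as the chunk id.
italian_chunks = [
    "La carta d'identità si rinnova presso il comune di residenza.",
    "La tassa sui rifiuti si paga entro il trenta giugno.",
    "Il certificato di nascita si richiede all'ufficio anagrafe.",
]
# Hand-written translations standing in for a translation model (assumption).
english_chunks = [
    "The identity card is renewed at the municipality of residence.",
    "The waste tax must be paid by the thirtieth of June.",
    "The birth certificate is requested at the registry office.",
]

# Index the English text, but keep the id of the Italian source chunk.
index = [(i, embed(text)) for i, text in enumerate(english_chunks)]

queries = [
    ("where to renew the identity card", 0),
    ("when is the waste tax paid", 1),
    ("how to request a birth certificate", 2),
]

# Backward reference mapping: search in English, return the Italian chunk.
best_id = top_k(queries[0][0], index, k=1)[0]
answer_context = italian_chunks[best_id]
score = hit_rate_at_k(queries, index, k=1)
```

Because each index entry carries the id of its source chunk, the search runs over English text while the context handed to the generator remains the original Italian passage. In a real pipeline `embed` would call an open-source embedding model, and the queries would also be translated before the search.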

Mala, C.S., Di Maio, C., Proietti, M., Gezici, G., Giannotti, F., Melacci, S., et al. (2025). Towards Building a Trustworthy RAG-Based Chatbot for the Italian Public Administration. In Frontiers in Artificial Intelligence and Applications (pp. 196-204). IOS Press BV [10.3233/faia250637].

Mala, Chandana Sree; di Maio, Christian; Proietti, Mattia; Gezici, Gizem; Giannotti, Fosca; Melacci, Stefano; Lenci, Alessandro; Gori, Marco

2025-01-01

Year: 2025
ISBN: 9781643686110
Appears in types: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
melacci_HHAI2025.pdf (open access; type: publisher's PDF; license: Creative Commons)	392.4 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1315905