

FarSense: A Comprehensive Commonsense Benchmark and Evaluation Framework for the Farsi Language

Zeinalipour, Kamyar; Jamshidi, Neda; Hejazi, Seyedehbahareh; Maggini, Marco; Bianchini, Monica; Paoletti, Simone; Gori, Marco
2025-01-01

Abstract

Although Farsi is widely spoken, no comprehensive benchmark exists for assessing commonsense reasoning in language models. We therefore present FarSense, a six-task benchmark for Farsi covering True/False Judgment, Multiple-Choice Questions, Explanation, Cause-Effect Inference, Counterfactual Reasoning, and Knowledge Completion. Starting from Farsi Wikipedia, we filtered noise and retained ~4,210 passages, rewrote them into realistic daily scenarios, and derived the above tasks from each scenario. Scenario and task generation quality was first judged via native-speaker annotations on outputs from five major LLMs (GPT-4o, Gemini-2.5-Flash, Mistral-Large, Qwen-Plus, and DeepSeek-Chat). Gemini-2.5-Flash performed best and was therefore used to generate the large-scale dataset, which was then finalized through a meticulous two-step human validation. Using FarSense, we measured the commonsense ability of the same five flagship LLMs and also fine-tuned six compact models (1B–24B parameters) before re-evaluating them. To ensure broad applicability, task wording was designed to minimize dialectal, cultural, or religious bias. Experiments show that targeted fine-tuning yields substantial gains, confirming FarSense as a reliable, openly licensed resource for advancing reproducible commonsense understanding research in Farsi NLP. We publicly release all code and data at https://github.com/KamyarZeinalipour/FarSense.
Zeinalipour, K., Jamshidi, N., Hejazi, S., Maggini, M., Bianchini, M., Paoletti, S., et al. (2025). FarSense: A Comprehensive Commonsense Benchmark and Evaluation Framework for the Farsi Language. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (pp. 3529–3599). Kerrville, TX: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Files in this record:

File: 2025.ijcnlp-long.187_compressed.pdf (open access)
Type: Publisher's PDF
License: Creative Commons
Size: 3.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1308657