Crossword puzzles are popular linguistic games often used as tools to engage students in learning. Educational crosswords are characterized by less cryptic, more factual clues, which distinguishes them from traditional crossword puzzles. Although several publicly available clue-answer pair databases exist for traditional crosswords, educational clue-answer pair datasets are missing. In this article, we propose an educational clue dataset, describe the automatic methodology used to build it, and report preliminary tests of ML models on clue-generation tasks. We gather informative content associated with relevant keywords from Wikipedia pages, then use Large Language Models to automatically generate pedagogical clues for a given keyword and its context. With this approach, we created clue-instruct, a dataset of 44,475 unique examples, each pairing a text and keyword with three distinct crossword clues. We used clue-instruct to fine-tune different Large Language Models to generate educational clues from a given input content and keyword. Both human and automatic evaluations indicate that clue-instruct and the fine-tuned LLMs can be valuable resources for education, which supports the quality of our method.

Zugarini, A., Zeinalipour, K., Sai Kadali, S., Maggini, M., Gori, M., Rigutini, L. (2024). Clue-Instruct: Text-Based Clue Generation for Educational Crossword Puzzles. In Proceedings of The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp.3347-3356). ELRA and ICCL.

Clue-Instruct: Text-Based Clue Generation for Educational Crossword Puzzles

Kamyar Zeinalipour; Marco Maggini; Marco Gori; Leonardo Rigutini
2024-01-01

Year: 2024
ISBN: 978-2-493814-10-4
Files in this record:
LREC2024.pdf (open access; publisher's PDF; Creative Commons license; 2.39 MB, Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1262954