Zugarini, A., Zeinalipour, K., Sai Kadali, S., Maggini, M., Gori, M., Rigutini, L. (2024). Clue-Instruct: Text-Based Clue Generation for Educational Crossword Puzzles. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 3347-3356). ELRA and ICCL.
Clue-Instruct: Text-Based Clue Generation for Educational Crossword Puzzles
Kamyar Zeinalipour; Marco Maggini; Marco Gori; Leonardo Rigutini
2024-01-01
Abstract
Crossword puzzles are popular linguistic games often used as tools to engage students in learning. Educational crosswords are characterized by less cryptic and more factual clues, which distinguish them from traditional crossword puzzles. Although several publicly available clue-answer pair databases exist for traditional crosswords, educational clue-answer pair datasets are missing. In this article, we propose an educational clue dataset, describing the automatic methodology used to build it and reporting some preliminary tests of ML models involved in clue-generation tasks. We gather informative content associated with relevant keywords from Wikipedia pages and use Large Language Models to automatically generate pedagogical clues related to the given input keyword and its context. With this approach, we created clue-instruct, a dataset of 44,475 unique examples, each a text-keyword pair associated with three distinct crossword clues. We used clue-instruct to instruct different Large Language Models to generate educational clues from a given input content and keyword. Human and automatic evaluations both indicate that clue-instruct and the fine-tuned LLMs can be valuable resources for enhancing education, which supports the quality of our method.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
LREC2024.pdf | Open access | Publisher's PDF | Creative Commons | 2.39 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1262954