Neural paraphrasing by automatically crawled and aligned sentence pairs

Globo, A., Trevisi, A., Zugarini, A., Rigutini, L., Maggini, M., Melacci, S. (2019). Neural paraphrasing by automatically crawled and aligned sentence pairs. In Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp.429-434). New York : IEEE [10.1109/SNAMS.2019.8931824].

Globo, Achille; Trevisi, Antonio; Zugarini, Andrea; Rigutini, Leonardo; Maggini, Marco; Melacci, Stefano

2019-01-01

Abstract

Paraphrasing is the task of rewriting an input text using other words, without altering the meaning of the original content. Conversational systems can exploit automatic paraphrasing to make the conversation more natural, e.g., by talking about a certain topic using different paraphrases at different times. Recently, the task of automatically generating paraphrases has been approached in the context of Natural Language Generation (NLG). While many existing systems are simply rule-based models, the recent success of deep neural networks in several NLG tasks naturally suggests the possibility of exploiting such networks for generating paraphrases. However, the main obstacle to neural-network-based paraphrasing is the lack of large datasets of aligned sentence-paraphrase pairs, which are needed to train the neural models effectively. In this paper we present a method for the automatic generation of large aligned corpora, based on the assumption that news and blog websites talk about the same events using different narrative styles. We propose a similarity search procedure with linguistic constraints that, given a reference sentence, is able to locate the most similar candidate paraphrases among millions of indexed sentences. The data generation process is evaluated on the Italian language, through experiments with pointer-based deep neural architectures.
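The abstract does not spell out the similarity search procedure, so the following is only a minimal illustrative sketch of the general idea: rank indexed sentences by similarity to a reference sentence and discard candidates that fail simple constraints. Everything in it is an assumption rather than the authors' method: TF-IDF cosine similarity stands in for the similarity measure, shared capitalized tokens stand in for the linguistic constraints, and all function names, thresholds, and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; a stand-in for a real linguistic pipeline."""
    return re.findall(r"\w+", text.lower())


def capitalized(text):
    """Capitalized tokens after the first word: a crude proxy for named entities."""
    words = re.findall(r"\w+", text)
    return {w for w in words[1:] if w[0].isupper()}


def build_index(sentences):
    """Compute smoothed IDF weights and one TF-IDF vector (a dict) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    df = Counter(term for d in docs for term in d)
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def find_candidates(reference, sentences, idf, vectors,
                    top_k=3, min_sim=0.2, max_len_ratio=2.0):
    """Return up to top_k (similarity, sentence) pairs that pass the constraints."""
    ref_tokens = tokenize(reference)
    ref_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(ref_tokens).items()}
    ref_entities = capitalized(reference)
    scored = []
    for sent, vec in zip(sentences, vectors):
        tokens = tokenize(sent)
        if tokens == ref_tokens:
            continue  # an exact copy is not a paraphrase
        shorter = max(1, min(len(tokens), len(ref_tokens)))
        if max(len(tokens), len(ref_tokens)) / shorter > max_len_ratio:
            continue  # hypothetical constraint: comparable sentence length
        if ref_entities and not (ref_entities & capitalized(sent)):
            continue  # hypothetical constraint: at least one shared entity
        sim = cosine(ref_vec, vec)
        if sim >= min_sim:
            scored.append((sim, sent))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up Italian sentences, used only to exercise the sketch.
corpus = [
    "Il governo di Roma ha approvato la nuova legge sul bilancio.",
    "La nuova legge sul bilancio è stata approvata ieri dal governo a Roma.",
    "La squadra di Milano ha vinto la partita.",
]
idf, vectors = build_index(corpus)
results = find_candidates(corpus[0], corpus, idf, vectors)
```

In this toy run, `results` holds only the second sentence: the first is an exact copy of the reference and the third shares no capitalized token with it. A brute-force scan like this would not scale to millions of sentences; a real system would presumably rely on an inverted index or a dedicated search engine.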

Year	2019
ISBN	978-1-7281-2946-4
Publication type	4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
08931824.pdf (publisher's PDF; not public, private/restricted access)	799.37 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1089617