
Linguistic Feature Injection for Efficient Natural Language Processing

Fioravanti S.; Zugarini A.; Giannini F.; Rigutini L.; Maggini M.; Diligenti M.
2023-01-01

Abstract

Transformers have been established as one of the most effective neural approaches for a wide variety of Natural Language Processing tasks. However, following a common trend in modern deep architectures, their scale has quickly grown to an extent that makes it impractical for many enterprises to train such models from scratch. Indeed, despite their strong performance, Transformers have the general drawback of requiring huge amounts of training data, computational resources, and energy to be successfully optimized. For this reason, more recent architectures such as Bidirectional Encoder Representations from Transformers (BERT) rely on unlabeled data to pre-train the model, which is later fine-tuned for a specific downstream task using a relatively small amount of training data. In a similar fashion, this paper considers a plug-and-play framework that can be used to inject multiple syntactic features, such as Part-of-Speech tags or dependency parses, into any kind of pre-trained Transformer. This novel approach makes it possible to perform sequence-to-sequence labeling tasks by exploiting: (i) the (more abundant) available training data, which is also used to learn the syntactic features; (ii) the language data used to pre-train the Transformer model. The experimental results show that our approach improves over the baseline performance of the underlying model on different datasets, thus proving the effectiveness of employing syntactic language information for semantic regularization. In addition, we show that our architecture has a huge efficiency advantage over pure large language models. Indeed, by using a model of limited size whose input data are enriched with syntactic information, we show that it is possible to obtain a significant reduction in CO2 emissions without decreasing prediction performance.
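To make the idea concrete, below is a minimal sketch of one common way to inject syntactic features into a pre-trained Transformer, not necessarily the authors' published implementation: POS-tag and dependency-label IDs (e.g. produced by an external tagger such as spaCy) are embedded and summed with the model's token embeddings before encoding. The class name SyntaxInjectedEncoder and the vocabulary sizes n_pos and n_dep are illustrative assumptions.

```python
# Minimal sketch of syntactic feature injection into a pre-trained
# Transformer (assumed fusion-by-addition; not the paper's exact method).
import torch
import torch.nn as nn
from transformers import AutoModel

class SyntaxInjectedEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased", n_pos=20, n_dep=50):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Learned embeddings for the injected syntactic features
        # (sizes n_pos / n_dep depend on the chosen tag sets).
        self.pos_emb = nn.Embedding(n_pos, hidden)
        self.dep_emb = nn.Embedding(n_dep, hidden)

    def forward(self, input_ids, attention_mask, pos_ids, dep_ids):
        # Word-piece embeddings from the pre-trained model.
        tok = self.encoder.get_input_embeddings()(input_ids)
        # Add syntactic embeddings to the token embeddings: one simple
        # injection strategy that keeps the input dimensionality intact.
        x = tok + self.pos_emb(pos_ids) + self.dep_emb(dep_ids)
        out = self.encoder(inputs_embeds=x, attention_mask=attention_mask)
        # Per-token states, ready for a sequence-labeling head.
        return out.last_hidden_state
```

Because the summation preserves the encoder's input dimensionality, the pre-trained weights can be fine-tuned unchanged; concatenating the feature embeddings and projecting back to the hidden size is an equally common alternative.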
Year: 2023
ISBN: 978-1-6654-8867-9
Fioravanti, S., Zugarini, A., Giannini, F., Rigutini, L., Maggini, M., Diligenti, M. (2023). Linguistic Feature Injection for Efficient Natural Language Processing. In 2023 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). New York: Institute of Electrical and Electronics Engineers Inc. [10.1109/IJCNN54540.2023.10191680].
Files in this record:
File: Linguistic_Feature_Injection_for_Efficient_Natural_Language_Processing.pdf (not available)
Type: Publisher's PDF
License: NOT PUBLIC - Private/restricted access
Size: 865.4 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1245794