Casoni, M., Guidi, T., Tiezzi, M., Betti, A., Gori, M., Melacci, S. (2024). Pitfalls in Processing Infinite-Length Sequences with Popular Approaches for Sequential Data. In Artificial Neural Networks in Pattern Recognition (pp. 37-48). Cham: Springer. https://doi.org/10.1007/978-3-031-71602-7_4
Pitfalls in Processing Infinite-Length Sequences with Popular Approaches for Sequential Data
Casoni Michele; Gori Marco; Melacci Stefano
2024-01-01
Abstract
One of the enduring challenges for the Machine Learning community is developing models that can process and learn from very long data sequences. Transformer-based models and Recurrent Neural Networks (RNNs) have excelled in processing long sequences, yet they face challenges in transitioning to the online processing of infinite-length sequences, a crucial step in mimicking human learning over continuous data streams. While Transformer models can handle large context windows, they suffer from quadratic computational costs, motivating research into alternative attention mechanisms. Conversely, RNNs, particularly Deep State-Space Models (SSMs), have shown promise in long-sequence tasks, outperforming Transformers on certain benchmarks. However, current approaches are limited to finite-length sequences, which are pre-buffered and randomly shuffled to suit the requirements of stochastic gradient descent. This paper addresses the fundamental gap in transitioning from the offline processing of a dataset of sequences to the online processing of possibly infinite-length sequences, a scenario often neglected in existing research. Empirical evidence is presented, demonstrating the performance and limits of existing models. We highlight the challenges and opportunities in learning from a continuous data stream, paving the way for future research in this area.
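To make the contrast described in the abstract concrete, the following is a minimal, hypothetical sketch (not the method evaluated in the paper): a linear, SSM-style recurrence h_t = A h_{t-1} + B x_t applied first in the offline regime (finite sequences, pre-buffered and shuffled) and then in the online regime (a single, potentially infinite stream processed with constant memory). All names, dimensions, and the choice of a fixed, untrained recurrence are illustrative assumptions.

```python
# Illustrative sketch only: contrasts offline processing of buffered, shuffled
# sequences with online processing of a potentially infinite stream using a
# simple linear state-space-style recurrence. Not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
state_dim, input_dim = 8, 4

A = 0.9 * np.eye(state_dim)                      # stable state transition (assumed)
B = 0.1 * rng.normal(size=(state_dim, input_dim))

def step(h, x):
    """One recurrent update: constant memory, no buffering of past inputs."""
    return A @ h + B @ x

# --- Offline regime: finite sequences, pre-buffered and randomly shuffled ---
dataset = [rng.normal(size=(100, input_dim)) for _ in range(32)]  # 32 sequences
rng.shuffle(dataset)          # shuffling presupposes the whole dataset exists
for seq in dataset:
    h = np.zeros(state_dim)   # state is reset between sequences
    for x in seq:
        h = step(h, x)

# --- Online regime: one (possibly infinite) stream, processed as it arrives ---
def stream():
    while True:               # no end, no buffering, no shuffling
        yield rng.normal(size=input_dim)

h = np.zeros(state_dim)
for t, x in enumerate(stream()):
    h = step(h, x)            # a single, never-reset state
    if t >= 1000:             # stop the demo; a real stream would not stop
        break
print("final state norm:", float(np.linalg.norm(h)))
```

The point of the sketch is the control flow, not the model: in the online regime there is no dataset to buffer or shuffle and the state is never reset, which is precisely the scenario the abstract describes as neglected in existing research.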
| File | Type | License | Size | Format |
|---|---|---|---|---|
| melacci_ANNPR2024.pdf (not available: request a copy) | Publisher's PDF | Non-public, private/restricted access | 397.7 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1274934