FLODCAST: Flow and depth forecasting via multimodal recurrent architectures

Ciamarra, Andrea; Becattini, Federico; Seidenari, Lorenzo; Del Bimbo, Alberto
2024-01-01

Abstract

Forecasting the motion and spatial positions of objects is of fundamental importance, especially in safety-critical settings such as autonomous driving. In this work, we address the problem by forecasting two modalities that carry complementary information: optical flow and depth. To this end, we propose FLODCAST, a flow and depth forecasting model that leverages a multitask recurrent architecture trained to jointly forecast both modalities at once. We stress the importance of training on flows and depth maps together, demonstrating that each task improves when the model is informed of the other modality. We also train the proposed model to predict several timesteps into the future at once. This provides better supervision and leads to more precise predictions, while retaining the model's ability to yield outputs autoregressively for any future time horizon. We test our model on the challenging Cityscapes dataset, obtaining state-of-the-art results for both flow and depth forecasting. Thanks to the high quality of the generated flows, we also report benefits on the downstream task of segmentation forecasting, injecting our predictions into a flow-based mask-warping framework.
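The abstract describes a multitask recurrent architecture that ingests past optical-flow and depth maps, forecasts both modalities jointly, and rolls forward autoregressively for any future horizon. The following is a minimal sketch of that idea, assuming PyTorch; the ConvGRU cell, the channel layout (two flow channels plus one depth channel), and all class and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a joint flow+depth autoregressive forecaster (assumed PyTorch;
# names, channel layout, and the ConvGRU core are illustrative, not the paper's code).
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """A small convolutional GRU cell used as the recurrent core."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update/reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class JointFlowDepthForecaster(nn.Module):
    """Hypothetical multitask forecaster: one recurrent state, two output heads."""

    def __init__(self, hid_ch=32):
        super().__init__()
        self.hid_ch = hid_ch
        self.encode = nn.Conv2d(3, hid_ch, 3, padding=1)      # input = (flow u, flow v, depth)
        self.cell = ConvGRUCell(hid_ch, hid_ch)
        self.head_flow = nn.Conv2d(hid_ch, 2, 3, padding=1)   # future optical flow
        self.head_depth = nn.Conv2d(hid_ch, 1, 3, padding=1)  # future depth map

    def forward(self, past, n_future):
        # past: (B, T, 3, H, W) stack of past flow+depth frames
        b, t, _, h, w = past.shape
        state = past.new_zeros(b, self.hid_ch, h, w)
        for i in range(t - 1):                                 # warm up on the observed past
            state = self.cell(torch.relu(self.encode(past[:, i])), state)
        last = past[:, -1]
        flows, depths = [], []
        for _ in range(n_future):                              # autoregressive rollout
            state = self.cell(torch.relu(self.encode(last)), state)
            f, d = self.head_flow(state), self.head_depth(state)
            flows.append(f)
            depths.append(d)
            last = torch.cat([f, d], dim=1)                    # feed predictions back as input
        return torch.stack(flows, dim=1), torch.stack(depths, dim=1)


if __name__ == "__main__":
    model = JointFlowDepthForecaster()
    past = torch.randn(1, 4, 3, 64, 128)             # 4 past frames at a toy resolution
    flows, depths = model(past, n_future=3)           # predict 3 future steps at once
    print(flows.shape, depths.shape)                  # (1, 3, 2, 64, 128) (1, 3, 1, 64, 128)
```

Training on several future timesteps at once, as the abstract notes, simply means supervising all `n_future` outputs of such a rollout rather than only the first one.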
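The abstract also mentions injecting the predicted flows into a flow-based mask-warping framework for segmentation forecasting. Below is a hedged sketch of one common way to realize this, backward-warping a segmentation mask with a dense flow field via `grid_sample`; the function name, the backward-flow convention, and the bilinear sampling choice are assumptions rather than the paper's exact procedure.

```python
# Sketch of flow-based mask warping (assumed convention: `flow` maps future-frame
# pixel coordinates back to current-frame coordinates; not the authors' exact code).
import torch
import torch.nn.functional as F


def warp_mask_with_flow(mask, flow):
    """Warp a soft segmentation mask (B, 1, H, W) to a future frame using a
    predicted flow field (B, 2, H, W) given in pixels, via bilinear sampling."""
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=0).to(mask)               # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                          # where each output pixel samples from
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)                       # (B, H, W, 2)
    return F.grid_sample(mask, grid, mode="bilinear", align_corners=True)


if __name__ == "__main__":
    mask = torch.zeros(1, 1, 64, 128)
    mask[:, :, 20:40, 50:80] = 1.0                             # toy instance mask
    flow = torch.full((1, 2, 64, 128), 2.0)                    # uniform 2-pixel shift
    future_mask = warp_mask_with_flow(mask, flow)
    print(future_mask.shape)                                   # torch.Size([1, 1, 64, 128])
```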
Ciamarra, A., Becattini, F., Seidenari, L., Del Bimbo, A. (2024). FLODCAST: Flow and depth forecasting via multimodal recurrent architectures. PATTERN RECOGNITION, 150 [10.1016/j.patcog.2024.110337].
Files in this record:
1-s2.0-S0031320324000888-main.pdf
Access: open access
Type: publisher's PDF
License: Creative Commons
Size: 4.56 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1256394