Online Deep Clustering with Video Track Consistency

Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback, however, is that they struggle to recognize the visual features of the same object when it is simply rotated or when the camera perspective changes. To overcome this limitation and, at the same time, exploit a useful source of supervision, we take video object tracks into account. Following the intuition that two patches in a track should have similar visual representations in a learned feature space, we adopt an unsupervised clustering-based approach and constrain such representations to be labeled as the same category, since they likely belong to the same object or object part. Experimental results on two downstream tasks on different datasets demonstrate the effectiveness of our Online Deep Clustering with Video Track Consistency (ODCT) approach compared to prior work, which did not leverage temporal information. In addition, we show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
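The track-consistency intuition in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the constraint is enforced, so the majority vote used here is an assumption chosen for simplicity, and the function names `nearest_centroid_labels` and `track_consistent_labels` are hypothetical. Patches are first assigned to their nearest cluster centroid, then every patch in a track is relabeled with the most frequent cluster in that track.

```python
import numpy as np


def nearest_centroid_labels(features, centroids):
    """Assign each feature vector to its nearest centroid (squared Euclidean)."""
    # Broadcasting gives pairwise differences of shape (n_patches, n_clusters, dim).
    diffs = features[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def track_consistent_labels(labels, track_ids, num_clusters):
    """Relabel all patches of a track with the track's majority cluster.

    Majority voting is an illustrative assumption, not the paper's method.
    """
    consistent = labels.copy()
    for track in np.unique(track_ids):
        mask = track_ids == track
        counts = np.bincount(labels[mask], minlength=num_clusters)
        consistent[mask] = counts.argmax()
    return consistent


# Toy data: two clusters, two tracks. The third patch of track 0 drifts
# towards the wrong centroid, as might happen after a viewpoint change.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
features = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [10.0, 9.0], [9.0, 10.0]])
track_ids = np.array([0, 0, 0, 1, 1])

raw_labels = nearest_centroid_labels(features, centroids)
pseudo_labels = track_consistent_labels(raw_labels, track_ids, num_clusters=2)
print(raw_labels, pseudo_labels)
```

In this toy example the drifting patch is pulled back to the cluster shared by the rest of its track, which is the kind of label agreement the abstract describes for patches of the same object.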

Alfani, A., Becattini, F., Seidenari, L., Del Bimbo, A. (2022). Online Deep Clustering with Video Track Consistency. In 2022 26th International Conference on Pattern Recognition (ICPR) (pp. 2650-2656). New York: IEEE [10.1109/ICPR56361.2022.9956284].

Alfani, Alessandra; Becattini, Federico; Seidenari, Lorenzo; Del Bimbo, Alberto

2022-01-01

Year: 2022
ISBN: 978-1-6654-9062-7
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Online_Deep_Clustering_with_Video_Track_Consistency__1_.pdf (publisher's PDF; license: not public, private/restricted access; not available)	1.69 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1225612