Local and global deep features for multi object tracking

Luca Fiori
Project Administration
2021-01-01

Abstract

Multiple Object Tracking (MOT) has been a central topic among computer vision tasks, especially in recent years with the advent of deep learning. Recent approaches to MOT have largely shifted towards the "tracking-by-detection" paradigm supported by a re-identification (ReID) architecture, since a robust model of the target appearance is essential for recovering a target after long-term occlusions. Such approaches have achieved strong performance on well-known benchmarks, such as the MOT Challenge datasets, and define the state of the art, but they still leave considerable room for improvement. Most re-identification architectures consist of a deep convolutional neural network that takes as input a region of interest (ROI), that is, an image containing all the pixels belonging to the target person and as few background pixels as possible, and outputs a "descriptor" of the appearance of that ROI as a vector of values. In tracking, the ROIs are commonly extracted with a person detector, which is usually based on another deep convolutional network such as YOLO or Mask R-CNN.

A problem arises when ReID is used in the tracking pipeline, and it becomes evident when one target is partially occluded by another. Since the appearance of each target is modeled from a ROI that inevitably contains both foreground (target) pixels and background pixels, when two targets partially overlap during the occlusion their detection ROIs intersect. The resulting models therefore describe a larger or smaller number of shared pixels, which produces similar descriptors for different targets.
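To make the overlap problem concrete, here is a minimal Python sketch. It assumes that ROIs are axis-aligned bounding boxes in (x1, y1, x2, y2) pixel coordinates and that descriptors are compared with cosine similarity, a common choice in ReID but an assumption here, since the abstract does not name the metric. The helper names (`iou`, `shared_fraction`, `cosine_similarity`) are illustrative and not taken from the thesis.

```python
import math
from typing import Sequence, Tuple

# Assumption: a ROI is an axis-aligned box (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box in pixels (0 for degenerate boxes)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def shared_fraction(a: Box, b: Box) -> float:
    """Fraction of the pixels of ROI `a` that also lie inside ROI `b`."""
    return intersection(a, b) / area(a) if area(a) > 0 else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two appearance descriptors (assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


# Two pedestrians of the same size, the second shifted 20 px to the right.
person_a = (100, 50, 160, 230)  # 60 x 180 px
person_b = (120, 50, 180, 230)  # 60 x 180 px
print(f"IoU: {iou(person_a, person_b):.2f}")  # IoU: 0.50
print(f"shared pixels of A: {shared_fraction(person_a, person_b):.2f}")  # 0.67
```

For two equally sized pedestrians offset by one third of their width, the IoU is 0.50 and two thirds of A's ROI pixels also belong to B's ROI, so descriptors computed on the two boxes are built largely from the same pixels.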
In this work, I present an effective way to mitigate this problem: the "global" descriptor produced by the ReID network is complemented with "local" descriptors of parts of each target, matching is performed separately on each, and the global and local match scores are then fused with a custom-designed algorithm inspired by the theory of ensemble classifiers.
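The idea of matching global and local descriptors separately and then fusing the scores can be illustrated with a minimal sketch. It is not the thesis's algorithm, which the abstract describes only as custom-designed and inspired by ensemble classifiers; in its place the sketch uses the plain fixed-weight sum rule with a hypothetical weight `w_global`. It also assumes that parts are equal horizontal stripes of the ROI and that descriptors are compared with cosine similarity. All function names and the toy descriptors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def split_into_parts(box: Box, n_parts: int = 3) -> List[Box]:
    """Split a ROI into n equal horizontal stripes, top to bottom.

    Hypothetical part layout: the abstract does not say how parts are chosen.
    In a real pipeline each stripe would be fed to a network to get a part
    descriptor; below, part descriptors are toy vectors instead.
    """
    x1, y1, x2, y2 = box
    h = (y2 - y1) / n_parts
    return [(x1, y1 + i * h, x2, y1 + (i + 1) * h) for i in range(n_parts)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def fused_score(global_sim: float, local_sims: Sequence[float],
                w_global: float = 0.5) -> float:
    """Weighted-sum fusion of one global score and the mean local score.

    Classic fixed-weight combination rule, NOT the thesis's custom algorithm.
    """
    if not local_sims:
        return global_sim
    local = sum(local_sims) / len(local_sims)
    return w_global * global_sim + (1.0 - w_global) * local


def match_score(track: Dict[str, list], det: Dict[str, list],
                w_global: float = 0.5) -> float:
    """Match global and per-part descriptors separately, then fuse."""
    g = cosine_similarity(track["global"], det["global"])
    locs = [cosine_similarity(p, q) for p, q in zip(track["parts"], det["parts"])]
    return fused_score(g, locs, w_global)


print(split_into_parts((100, 50, 160, 230)))

# Toy descriptors: the two targets have identical global descriptors
# (as when their ROIs share many pixels) but different part descriptors.
track = {"global": [1.0, 0.0], "parts": [[1, 0], [0, 1], [1, 0]]}
same_det = {"global": [1.0, 0.0], "parts": [[1, 0], [0, 1], [1, 0]]}
other_det = {"global": [1.0, 0.0], "parts": [[1, 0], [1, 0], [0, 1]]}
print(f"same target:  {match_score(track, same_det):.2f}")   # 1.00
print(f"other target: {match_score(track, other_det):.2f}")  # 0.67
```

In this toy case the global score alone (1.0 for both detections) cannot separate the two targets, while the part-level scores lower the fused score of the wrong detection to about 0.67 and leave the correct match at 1.00. A fixed weighted sum was chosen only because it is the simplest ensemble combination rule to show.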
2021
Fiori, L. (2021). Local and global deep features for multi object tracking [10.25434/luca-fiori_phd2021].
Fiori, Luca
Files in this record:
phd_unisi_076679.pdf (open access)
Description: PhD thesis
Type: Publisher's PDF
License: PUBLIC - Public with copyright
Size: 2.64 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1149468