Lack of Supervised Data: A Deep Learning Approach in Image Analysis

Bonechi, Simone

A key factor in the recent success of deep learning models is the availability of large sets of annotated data. The scarcity of labeled data is often a significant obstacle in real-world applications, where annotations are inherently difficult and expensive to obtain. This is particularly true for semantic segmentation, which requires pixel-level annotations. Indeed, learning from a small amount of fully annotated data is one of the most active research fields in deep learning, and different strategies are commonly employed to cope with the lack of annotations. In this thesis, we consider a weakly supervised approach to semantic segmentation. In particular, we propose a framework that generates pixel-level annotations from bounding-box supervision. Bounding-box supervision, even though less accurate, is a valuable alternative that effectively reduces dataset collection costs. The proposed method is based on a two-stage procedure. First, a deep neural network is trained to distinguish the relevant object from the background inside a given bounding box. Then, the same network is applied to bounding boxes extracted from an object detection dataset, and its outputs are used to generate the weak pixel-level supervision for the original image. The proposed approach has been tested on two different tasks. First, the Pascal-VOC dataset has been used to assess the quality of the proposed framework, obtaining results comparable with state-of-the-art weakly supervised approaches. Then, we have employed the proposed method in scene text segmentation, where pixel-level annotations are very scarce. With our framework, exploiting the bounding-box supervision of the COCO-Text and MLT datasets, we have generated and released two datasets of real images with weak pixel-level supervision (COCO_TS and MLT_S).
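The second stage of the procedure, in which per-box network outputs are turned into a label map for the whole image, might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis implementation: the function name `weak_labels_from_boxes`, the fixed 0.5 threshold, and the rule that later boxes overwrite earlier ones are all hypothetical, and the foreground probabilities are passed in directly in place of a trained network.

```python
import numpy as np

def weak_labels_from_boxes(image_shape, boxes, box_probs, threshold=0.5):
    """Compose a full-image weak label map from per-box foreground probabilities.

    image_shape: (height, width) of the original image.
    boxes: list of (x0, y0, x1, y1, class_id), with x1 and y1 exclusive.
    box_probs: list of 2D arrays, one per box, holding the foreground
        probability of each pixel in the box crop (assumed to come from
        the stage-one network; supplied directly here).
    threshold: hypothetical cut-off for calling a pixel foreground.
    """
    height, width = image_shape
    labels = np.zeros((height, width), dtype=np.uint8)  # 0 = background
    for (x0, y0, x1, y1, class_id), probs in zip(boxes, box_probs):
        region = labels[y0:y1, x0:x1]  # view into the full label map
        # Assumption: pixels above the threshold take the box class;
        # later boxes overwrite earlier ones where they overlap.
        region[probs >= threshold] = class_id
    return labels

# Usage: one 2x2 box of class 7 inside a 4x4 image.
example_probs = np.array([[0.9, 0.1],
                          [0.2, 0.8]])
example_labels = weak_labels_from_boxes((4, 4), [(1, 1, 3, 3, 7)], [example_probs])
```

Pixels outside every box stay background, which reflects the fact that a bounding-box annotation says nothing about them beyond the absence of an annotated object.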
These supervisions have been used to train deep segmentation networks, and the experiments show that COCO_TS and MLT_S are a valid alternative to synthetic images, which are the standard choice for pre-training a scene text segmentation network. Furthermore, when a network is trained in the absence of labeled data, another major issue is the validation of the model. Therefore, we also propose some confidence measures that can be used to evaluate a trained network in the absence of annotated samples. Such measures are employed in a domain adaptation framework, based on generative adversarial networks (GANs), which aims at training a model on a labeled source domain so that it generalizes to an unlabeled target dataset. The confidence measures proved to be correlated with the actual accuracy of the model. The experiments, carried out on two domain adaptation tasks (SVHN to MNIST and CIFAR to STL), show that the proposed measures can be used both to estimate the performance of the trained model and to decide when to stop GAN training.
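The abstract does not say which confidence measures are proposed, so the sketch below only illustrates the general idea of scoring a classifier on unlabeled target samples. The two measures shown, mean maximum softmax probability and mean prediction entropy, are generic stand-ins chosen for this example and should not be read as the measures defined in the thesis. All function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_max_probability(logits):
    """Average of the highest class probability per sample (higher = more confident)."""
    return float(softmax(logits).max(axis=1).mean())

def mean_entropy(logits):
    """Average prediction entropy per sample (lower = more confident)."""
    probs = softmax(logits)
    # Small constant avoids log(0); an implementation detail of this sketch.
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=1).mean())

# Usage: compare a confident and an undecided set of two-class predictions.
confident_logits = np.array([[10.0, 0.0], [0.0, 10.0]])
undecided_logits = np.array([[0.0, 0.0], [0.0, 0.0]])
confident_score = mean_max_probability(confident_logits)
undecided_score = mean_max_probability(undecided_logits)
```

Under the assumption that such a score tracks accuracy on the target domain, it could be computed at each training checkpoint and used to choose when to stop, since no labels are needed to evaluate it.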

Bonechi, S. (2020). Lack of Supervised Data: A Deep Learning Approach in Image Analysis.