Virtual Worlds and Active Learning for Human Detection

2011 · David Vázquez, Antonio M. López, Daniel Ponsa, Javier Marín
Abstract
Image-based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a subject of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e., trained with labelled samples. However, labelling is a labour-intensive manual step, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, some authors have proposed the use of a virtual world in which the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt from the appearance of rendered images, i.e., from realistic computer graphics. These models are later used for human detection in images of the real world. Indeed, the results of this technique are surprisingly good. However, they are not always as good as those of the classical approach of training and testing with data coming from the same camera, or very similar ones. Accordingly, in this paper we address the challenge of using a virtual world to gather (while playing a videogame) a large amount of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well, on real-world images, as one trained on manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation. Thus, we assume that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Ultimately, our human model is learnt from a combination of virtual- and real-world labelled samples, something not done before. We present quantitative results showing that this approach is valid.
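The overall pipeline the abstract describes — train a classifier on abundant automatically labelled virtual samples, select a small set of real-world samples by active learning, then retrain on the combined set — can be sketched as follows. This is a minimal illustration on synthetic feature vectors, not the paper's actual detector: the logistic-regression classifier, the uncertainty-sampling criterion, the Gaussian "virtual" and "real" feature distributions, and all numeric settings are assumptions standing in for the paper's HOG-style features and non-standard active learning strategy.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain batch gradient descent for logistic regression (stand-in classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def decision(w, X):
    """Signed distance-like score; positive means 'human'."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

def select_uncertain(w, X_pool, k):
    """Uncertainty sampling: the k pool samples closest to the decision boundary."""
    return np.argsort(np.abs(decision(w, X_pool)))[:k]

rng = np.random.default_rng(0)
d = 8  # assumed toy feature dimension

# Virtual-world samples: abundant and automatically labelled (humans vs background).
X_virt = np.vstack([rng.normal(+1.0, 1.0, (300, d)),
                    rng.normal(-1.0, 1.0, (300, d))])
y_virt = np.concatenate([np.ones(300), np.zeros(300)])

# Real-world pool with a shifted distribution, modelling the domain gap.
X_real = np.vstack([rng.normal(+0.6, 1.0, (100, d)),
                    rng.normal(-1.4, 1.0, (100, d))])
y_real = np.concatenate([np.ones(100), np.zeros(100)])

w = train_logreg(X_virt, y_virt)          # model from virtual data only
idx = select_uncertain(w, X_real, k=20)   # a human oracle labels these 20 samples

# Retrain on the combination of virtual and (few) real labelled samples.
X_comb = np.vstack([X_virt, X_real[idx]])
y_comb = np.concatenate([y_virt, y_real[idx]])
w = train_logreg(X_comb, y_comb)
acc = np.mean((decision(w, X_real) > 0) == y_real)
```

The point of the sketch is the data flow: only the 20 boundary-adjacent real samples ever need manual labels, while the bulk of the training set comes for free from the virtual world.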
Publication
International Conference on Multimodal Interaction (ICMI)