Hallucinating optimal high-dimensional subspaces
Introduction
One of the most commonly encountered problems in computer vision is that of matching appearance. Whether it is images of local features [1], views of objects [2] or faces [3], textures [4] or rectified planar structures (buildings, paintings) [5], the task of comparing appearances is virtually unavoidable in a modern computer vision application. A particularly interesting and increasingly important instance of this task concerns the matching of sets of appearance images, each set containing examples of variation corresponding to a single class.
A ubiquitous representation of appearance variation within a class is by a linear subspace [6], [7]. The most basic argument for the linear subspace representation can be made by observing that in practice the appearance of interest is constrained to a small part of the image space. Domain-specific information may restrict this even further e.g. for Lambertian surfaces seen from a fixed viewpoint but under variable illumination [8], [9], [10] or smooth objects across changing pose [11], [12]. Moreover, linear subspace models are also attractive for their low storage demands – they are inherently compact and can be learnt incrementally [13], [14], [15], [16], [17], [18]. Indeed, throughout this paper it is assumed that the original data from which subspaces are estimated is not available.
A problem which arises when trying to match two subspaces, each representing certain appearance variation, and which has not yet received due consideration in the literature is that of matching subspaces embedded in different image spaces, that is, subspaces corresponding to image sets of different scales. This is a frequent occurrence: an object one wishes to recognize may appear larger or smaller in an image depending on its distance, just as a face may, depending on the person's height and positioning relative to the camera. In most matching problems in the computer vision literature, this issue is overlooked. Here it is addressed in detail, and it is shown that a naïve approach to normalizing for scale in subspaces results in inadequate matching performance. Thus, a method is proposed which, without any assumptions on the nature of the appearance that the subspaces represent, constructs an optimal hypothesis for a high-resolution reconstruction of the subspace corresponding to low-resolution data.
In the next section, a brief overview of the linear subspace representation is given first, followed by a description of the aforementioned naïve scale normalization. The proposed solution is described in this section as well. In Section 3 the two approaches are compared empirically and the results are analysed in detail. The main contribution and conclusions of the paper are summarized in Section 4.
Section snippets
Matching subspaces across scale
Consider a set $X \subset \mathbb{R}^d$ containing vectors which represent rasterized images, where $d$ is the number of pixels in each image. It is assumed that all of the images represented by members of $X$ have the same aspect ratio, so that the same indices of different vectors correspond spatially to the same pixel location. A common representation of appearance variation described by $X$ is by a linear subspace of dimension $D$, where usually it is the case that $D \ll d$. If … is the estimate of the mean of …
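The subspace representation described above can be illustrated with a minimal sketch. It assumes that the subspace is estimated by principal component analysis of the mean-centred image vectors, which is a common choice but is not confirmed by this excerpt; the function name `estimate_subspace` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_subspace(X, D):
    """Estimate a D-dimensional linear subspace from rasterized images.

    X is an (N, d) array whose rows are image vectors with d pixels each.
    Returns the sample mean (d,) and an orthonormal basis B of shape (d, D).
    Assumption: PCA via SVD of the mean-centred data.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:D].T

# Synthetic stand-in for an image set: 50 images of 16 x 16 pixels (d = 256).
rng = np.random.default_rng(0)
images = rng.normal(size=(50, 16 * 16))
mean, B = estimate_subspace(images, D=5)
```

Only the mean and the basis need to be stored, which reflects the compactness argument made in the introduction: here 6 vectors of length 256 stand in for 50 images.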
Experimental analysis
The theoretical ideas put forward in the preceding sections were evaluated empirically on two popular problems in computer vision: matching sets of images of (i) face appearances and (ii) object appearances. For this, two large data sets were used. These are:
- The Cambridge Face Motion Database [20], [21], and
- The Amsterdam Library of Object Images [22].
Conclusion
In this paper a method for matching linear subspaces which represent appearance variations in images of different scales was described. The approach consists of an initial re-projection of the subspace in the low-dimensional image space to the high-dimensional one, and subsequent refinement of the re-projection through a constrained rotation. Using facial and object appearance images and the corresponding two large data sets, it was shown that the proposed algorithm successfully reconstructs
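The first stage summarized above, re-projecting a low-resolution subspace into the high-resolution image space, can be sketched as follows. This is only an illustration under stated assumptions: upsampling by pixel replication and re-orthonormalization by QR decomposition are choices made here, not details taken from the paper, and the subsequent refinement through a constrained rotation is not reproduced because this excerpt does not specify it. Principal angles are used as a generic way to compare two subspaces; the function names are hypothetical.

```python
import numpy as np

def upsample_basis(B_low, low_shape, factor):
    """Re-project a low-resolution basis into the high-resolution image space.

    B_low is (d_low, D) with orthonormal columns; low_shape is (height, width)
    with height * width == d_low. Each basis vector is treated as an image,
    upsampled by pixel replication (an assumption), and the result is
    re-orthonormalized.
    """
    h, w = low_shape
    block = np.ones((factor, factor))
    cols = [np.kron(b.reshape(h, w), block).ravel() for b in B_low.T]
    A = np.stack(cols, axis=1)
    Q, _ = np.linalg.qr(A)  # reduced QR: orthonormal columns, same span as A
    return Q

def principal_angle_cosines(B1, B2):
    """Cosines of the principal angles between two orthonormal bases."""
    return np.linalg.svd(B1.T @ B2, compute_uv=False)

# Hypothetical example: a 3-dimensional subspace of 8 x 8 images, doubled in scale.
rng = np.random.default_rng(0)
B_low, _ = np.linalg.qr(rng.normal(size=(8 * 8, 3)))
B_high = upsample_basis(B_low, (8, 8), factor=2)
cosines = principal_angle_cosines(B_high, B_high)
```

A subspace compared with itself yields cosines equal to one; comparing `B_high` against a subspace estimated directly from high-resolution data would show how much a plain re-projection of this kind loses.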
Conflict of interest
None declared.
Acknowledgements
The author would like to thank Trinity College Cambridge for their kind support and the volunteers from the University of Cambridge Department of Engineering whose face data was included in the database used in developing the algorithm described in this paper.
References
- Computationally efficient application of the generic shape-illumination invariant to face recognition from video, Pattern Recognit. (2012)
- V. Ferrari, T. Tuytelaars, L. Van Gool, Retrieving objects from videos based on affine regions, in: Proceedings of...
- M. Everingham, A. Zisserman, C. Williams, C. Van Gool, et al., The 2005 PASCAL visual object classes challenge, in:...
- et al., Hierarchical ensemble of global and local classifiers for face recognition, IEEE Trans. Image Process. (2009)
- R. Pradhan, Z.G. Bhutia, M. Nasipuri, M.P. Pradhan, Gradient and principal component analysis based texture recognition...
- R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed.,...
- et al., An analysis of linear subspace approaches for computer vision and pattern recognition, Int. J. Comput. Vis. (2006)
- Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?, J. Opt. Soc. Am. (2006)
- et al., What is the set of images of an object under all possible illumination conditions?, Int. J. Comput. Vis. (1998)
- et al., From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
- Lambertian reflectance and linear subspaces, IEEE Trans. Pattern Anal. Mach. Intell.
Ognjen Arandjelović graduated top of his class from the Department of Engineering Science at the University of Oxford (M.E.). In 2007 he was awarded the Ph.D. degree from the University of Cambridge. After spending 4 years as a Fellow of Trinity College Cambridge, he moved to Swansea University as a Lecturer in Visual Computing. Currently he is a Senior Lecturer in Pattern Recognition and Data Analytics at Deakin University; he also holds the title of an Associated Professor at Université Laval. His main research interests are computer vision and machine learning, and their applications in various fields of science. He is a Fellow of the Cambridge Overseas Trust and a winner of multiple best research paper awards.