Elsevier

Pattern Recognition

Volume 47, Issue 8, August 2014, Pages 2662-2672

Hallucinating optimal high-dimensional subspaces

https://doi.org/10.1016/j.patcog.2014.02.006

Highlights

  • Matching appearance subspaces of different scales not addressed in the literature.

  • Naïve basis re-projection inadequate.

  • Proposed refinement using constrained rotation.

  • Closed-form solution for optimal subspace reconstruction.

  • Class separation increased up to an order of magnitude.

Abstract

Linear subspace representations of appearance variation are pervasive in computer vision. This paper addresses the problem of robustly matching such subspaces (computing the similarity between them) when they are used to describe the scope of variations within sets of images of different (possibly greatly so) scales. A naïve solution of projecting the low-scale subspace into the high-scale image space is described first and subsequently shown to be inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, which is followed by (ii) a rotation of this initial estimate within the bounds of the imposed “downsampling constraint”. The optimal rotation, that which best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to, is found in closed form. The method is evaluated on the problem of matching sets of (i) face appearances under varying illumination and (ii) object appearances under varying viewpoint, using two large data sets. In comparison to the naïve matching, the proposed algorithm is shown to greatly increase the separation of between-class and within-class similarities, as well as to produce far more meaningful modes of common appearance on which the match score is based.

Introduction

One of the most commonly encountered problems in computer vision is that of matching appearance. Whether it is images of local features [1], views of objects [2] or faces [3], textures [4] or rectified planar structures (buildings, paintings) [5], the task of comparing appearances is virtually unavoidable in a modern computer vision application. A particularly interesting and increasingly important instance of this task concerns the matching of sets of appearance images, each set containing examples of variation corresponding to a single class.

A ubiquitous representation of appearance variation within a class is by a linear subspace [6], [7]. The most basic argument for the linear subspace representation can be made by observing that in practice the appearance of interest is constrained to a small part of the image space. Domain-specific information may restrict this even further e.g. for Lambertian surfaces seen from a fixed viewpoint but under variable illumination [8], [9], [10] or smooth objects across changing pose [11], [12]. Moreover, linear subspace models are also attractive for their low storage demands – they are inherently compact and can be learnt incrementally [13], [14], [15], [16], [17], [18]. Indeed, throughout this paper it is assumed that the original data from which subspaces are estimated is not available.

A problem which arises when trying to match two subspaces – each representing certain appearance variation – and which has not yet received due consideration in the literature is that of matching subspaces embedded in different image spaces, that is, corresponding to image sets of different scales. This is a frequent occurrence: an object one wishes to recognize may appear larger or smaller in an image depending on its distance, just as a face may, depending on the person's height and positioning relative to the camera. In most matching problems in the computer vision literature, this issue is overlooked. Here it is addressed in detail, and it is shown that a naïve approach to normalizing for scale in subspaces results in inadequate matching performance. Thus, a method is proposed which, without any assumptions on the nature of appearance that the subspaces represent, constructs an optimal hypothesis for a high-resolution reconstruction of the subspace corresponding to low-resolution data.

In the next section, a brief overview of the linear subspace representation is given first, followed by a description of the aforementioned naïve scale normalization. The proposed solution is described in this section as well. In Section 3 the two approaches are compared empirically and the results are analysed in detail. The main contribution and conclusions of the paper are summarized in Section 4.

Matching subspaces across scale

Consider a set $X \subset \mathbb{R}^d$ containing vectors which represent rasterized images: $X = \{x_1, \ldots, x_N\}$, where $d$ is the number of pixels in each image. It is assumed that all of the images represented by members of $X$ have the same aspect ratio, so that the same indices of different vectors correspond spatially to the same pixel location. A common representation of appearance variation described by $X$ is by a linear subspace of dimension $D$, where usually it is the case that $D \ll d$. If $m_X$ is the estimate of the mean of
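As a rough illustration of this representation, the sketch below estimates a mean and a $D$-dimensional subspace basis from a set of rasterized images. It assumes that the subspace is obtained by principal component analysis of the centred data, which this excerpt does not confirm; the function name `estimate_subspace`, the variable names and the random stand-in data are hypothetical.

```python
import numpy as np

def estimate_subspace(X, D):
    """Return the sample mean and an orthonormal d x D basis.

    X is a d x N matrix whose columns are rasterized images.
    Assumption: the subspace is the span of the D leading principal
    components of the centred data (PCA).
    """
    m = X.mean(axis=1, keepdims=True)  # estimate of the mean of the set
    # The left singular vectors of the centred data are the eigenvectors
    # of the sample covariance, ordered by decreasing variance.
    U, _, _ = np.linalg.svd(X - m, full_matrices=False)
    return m.ravel(), U[:, :D]

# Stand-in data: N = 20 random "images" of 8 x 8 = 64 pixels each.
rng = np.random.default_rng(0)
d, N, D = 64, 20, 3
X = rng.standard_normal((d, N))
m_X, B_X = estimate_subspace(X, D)
```

Only the pair `(m_X, B_X)` needs to be stored, which is consistent with the assumption made in the paper that the original image data are no longer available at matching time.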

Experimental analysis

The theoretical ideas put forward in the preceding sections were evaluated empirically on two popular problems in computer vision: matching sets of images of (i) face appearances and (ii) object appearances. For this, two large data sets were used. These are

  • The Cambridge Face Motion Database [20], [21], and

  • The Amsterdam Library of Object Images [22].

Their contents are reviewed next.

Conclusion

In this paper a method for matching linear subspaces which represent appearance variations in images of different scales was described. The approach consists of an initial re-projection of the subspace in the low-dimensional image space to the high-dimensional one, and subsequent refinement of the re-projection through a constrained rotation. Using facial and object appearance images and the corresponding two large data sets, it was shown that the proposed algorithm successfully reconstructs
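The initial re-projection step mentioned here can be illustrated with a minimal sketch, under several assumptions that this excerpt does not confirm: each low-scale basis vector is upsampled by simple pixel replication (standing in for whatever interpolation the paper uses), the result is re-orthonormalised by a QR decomposition, and similarity is measured by the cosines of the principal angles between subspaces. The constrained rotation refinement is not reproduced, since its closed form is not given in this excerpt. All function and variable names are hypothetical.

```python
import numpy as np

def upsample_basis(B_low, low_shape, factor):
    """Naively re-project a low-scale basis into the high-scale space.

    Each column of B_low is reshaped to a low_shape image, upsampled by
    pixel replication (an assumed, simplest-possible interpolation), and
    the columns are then re-orthonormalised with a reduced QR.
    """
    h, w = low_shape
    cols = []
    for b in B_low.T:
        img = b.reshape(h, w)
        big = np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
        cols.append(big.ravel())
    Q, _ = np.linalg.qr(np.stack(cols, axis=1))
    return Q

def principal_cosines(B1, B2):
    """Cosines of the principal angles between two subspaces,
    each given by an orthonormal basis (largest cosine first)."""
    return np.linalg.svd(B1.T @ B2, compute_uv=False)

# Stand-in orthonormal bases: 4 x 4 low-scale and 8 x 8 high-scale images.
rng = np.random.default_rng(0)
low_shape, factor, D = (4, 4), 2, 3
B_low, _ = np.linalg.qr(rng.standard_normal((16, D)))
B_high, _ = np.linalg.qr(rng.standard_normal((64, D)))

B_up = upsample_basis(B_low, low_shape, factor)
cosines = principal_cosines(B_high, B_up)
```

With random stand-in bases the cosines carry no meaning; the sketch only shows how the two subspaces are brought into the same image space before they are compared.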

Conflict of interest

None declared.

Acknowledgements

The author would like to thank Trinity College Cambridge for their kind support and the volunteers from the University of Cambridge Department of Engineering whose face data was included in the database used in developing the algorithm described in this paper.

References (23)

  • O. Arandjelović, Computationally efficient application of the generic shape-illumination invariant to face recognition from video, Pattern Recognit. (2012)
  • V. Ferrari, T. Tuytelaars, L. Van Gool, Retrieving objects from videos based on affine regions, in: Proceedings of...
  • M. Everingham, A. Zisserman, C. Williams, C. Van Gool, et al., The 2005 PASCAL visual object classes challenge, in:...
  • Y. Su et al., Hierarchical ensemble of global and local classifiers for face recognition, IEEE Trans. Image Process. (2009)
  • R. Pradhan, Z.G. Bhutia, M. Nasipuri, M.P. Pradhan, Gradient and principal component analysis based texture recognition...
  • R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed.,...
  • P. Chen et al., An analysis of linear subspace approaches for computer vision and pattern recognition, Int. J. Comput. Vis. (2006)
  • M. Bethge, Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?, J. Opt. Soc. Am. (2006)
  • P.N. Belhumeur et al., What is the set of images of an object under all possible illumination conditions?, Int. J. Comput. Vis. (1998)
  • A.S. Georghiades et al., From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • R. Basri et al., Lambertian reflectance and linear subspaces, IEEE Trans. Pattern Anal. Mach. Intell. (2003)

Cited by (12)

    • Multi-view face hallucination using SVD and a mapping model

      2019, Information Sciences
      Citation Excerpt :

      In [2], a Generic Shape-Illumination Manifold (gSIM) framework was designed to hallucinate faces across different poses and scales. Later, an efficient framework for matching linear subspaces in images of different scales was designed [3]. More recently, Farrugia and Guillemot proposed a coupled sparse support (CSS) face-hallucination framework via estimating the local geometrical structure on the high-resolution manifold [8].

    • Reimagining the central challenge of face recognition: Turning a problem into an advantage

      2018, Pattern Recognition
      Citation Excerpt :

      I illustrate this idea with a few examples. If the variation within a set is modelled using a linear subspace and the subspace-to-subspace generalization of the distance from feature space (DFFS) [31] adopted as the (dis)similarity measure between them, the most similar modes of variation between two sets represented using such subspaces are sub-subspaces themselves [32]. These correspond to different exemplars fxy in Fig. 3 and can be compared using the DFFS baseline.

    • Recovering variations in facial albedo from low resolution images

      2018, Pattern Recognition
      Citation Excerpt :

      Liu et al. [18,19] proposed a two-step statistical modeling approach that integrates both a global parametric model and a local nonparametric model, and achieved very promising face hallucination results. Arandjelović [20] successfully reconstruct the personal subspace in the high-dimensional image space from a low-dimensional input without any assumptions on the nature of appearance that the subspaces represent. Recent studies [8–11,21–23] share a similar idea of using patch-based method to model the prior information of local structure of face images.

    • Estimating Phenotypic Characteristics of Tuberculosis Bacteria

      2023, Proceedings of the ACM Symposium on Applied Computing

    Ognjen Arandjelović graduated top of his class from the Department of Engineering Science at the University of Oxford (M.E.). In 2007 he was awarded the Ph.D. degree from the University of Cambridge. After spending 4 years as a Fellow of Trinity College Cambridge, he moved to Swansea University as a Lecturer in Visual Computing. Currently he is a Senior Lecturer in Pattern Recognition and Data Analytics at Deakin University; he also holds the title of an Associated Professor at Université Laval. His main research interests are computer vision and machine learning, and their applications in various fields of science. He is a Fellow of the Cambridge Overseas Trust and a winner of multiple best research paper awards.