Saliency detection using hierarchical manifold learning

doi:10.1016/j.neucom.2015.05.073

Neurocomputing

Volume 168, 30 November 2015, Pages 538-549

Abstract

Saliency detection is critical to many computer vision applications because it eliminates redundant backgrounds. Saliency detection approaches can be divided into two categories: top-down and bottom-up. Of these, bottom-up models have attracted more attention because of their simple mechanisms. However, many existing bottom-up models are not robust to crowded backgrounds: they miss salient regions because they rely on feedforward frameworks, which are often ineffective for complex scenes. We tackle these problems by modifying and extending a bottom-up saliency detection model in three phases: (1) constructing a hierarchical sequence of images from the perspective of entropy, (2) using estimated mid-level cues as feedback information, and (3) generating saliency maps from global context and local uniqueness in a graph-based framework. We compare the proposed bottom-up model with state-of-the-art approaches on two benchmark datasets to evaluate its saliency detection performance. The experimental results demonstrate that the proposed approach is not only robust to both cluttered and clean scenes but also able to detect objects at different scales.

Introduction

Object detection is an important but difficult problem in computer vision, especially in crowded environments. Inspired by the mechanism of human visual attention, which has attracted a great deal of interest from researchers in psychology, neurobiology [1], and computer science [2], saliency detection serves as a filter that efficiently distinguishes regions of interest (ROI) in images while ignoring irrelevant backgrounds. Saliency detection models are expected to be robust to the various challenges of real-world scenes, including partial occlusion, nearby clutter, noise, and changes in illumination, rotation, and scale, as well as common object variations. Saliency detection methods benefit many fields, such as object detection [3], automatic image segmentation [4], and visual concept collection [5].

Motivated by neurobiology and psychology, computational saliency detection models emerged to simulate bottom-up human visual attention and predict which parts of a scene are likely to be fixated [6]. The term saliency map, introduced by Itti et al. [7] to depict visual attention, is also used for the record of salient points that a saliency detection model produces for an image. Recently, a growing number of saliency detection models segment salient objects from the background [8], [9], [10]. Much work has been done to obtain optimal saliency maps, evaluated on benchmark datasets such as AIM (eye tracking) [11] and MSRA (object masks) [12]. As shown in Fig. 1, if a salient object lies against a clean background, several state-of-the-art saliency detection methods produce ideal saliency maps. However, when a salient object is in a crowded environment, the background may mislead the saliency map into highlighting a location that differs from the ground truth.

Generally speaking, the ground truth (obtained from both eye tracking and object masks) is a measure of human visual attention. It is therefore reasonable to develop saliency detection models that follow the psychological evidence on the human visual system. For example, Goferman et al. proposed detecting salient regions in four steps that correspond to four principles of human visual attention [15]: selecting distinctive colours or patterns, suppressing frequently occurring features, grouping salient pixels, and post-processing saliency maps according to prior knowledge. With increasing interest in saliency detection, a similar category of methods has been reported within the paradigm of the spatial domain [8], [9]. These methods explore algorithms based on mid-level cues (e.g., superpixels) and their integration to increase the accuracy of saliency maps.

In the spatial domain, salient regions are the unique parts that differ from the background. More precisely, salient objects are rare compared with the background in an image, both locally and globally [18]. From the perspective of the frequency domain, uniqueness means statistical singularities in the spectrum. Hou et al. proposed the image signature to detect salient parts in the frequency domain [19]. The image signature applies the Discrete Cosine Transform (DCT) followed by the sign function to locate the sparse foreground and suppress the non-sparse background in an image. The saliency map is then calculated in the spatial domain from the Inverse Discrete Cosine Transform (IDCT) of the image signature. The authors also claim that the image signature improves on the spectral residual (SR) approach [20], generating more compact saliency maps. By analysing spikes of signals in the frequency domain, Guo et al. argued that popped-out proto-objects can be reconstructed from the phase spectrum alone, without the amplitude spectrum, through the Phase Spectrum of Fourier Transform (PFT) [21]. In fact, both the image signature and PFT retain the phase spectrum and combine it with Gaussian post-processing to suppress the lower frequencies and enhance the higher frequencies, because the amplitude spectrum of natural images obeys a distribution known as the 1/f law [16]. However, these frequency-domain methods are not suitable for detecting varied salient objects in cluttered backgrounds, for three reasons: (1) the image signature and PFT suppress or enhance features such as texture and boundaries, which are not sufficient to highlight salient objects; (2) backgrounds contain many irregularly repeated patterns, which are difficult to suppress in the frequency domain; and (3) some important spatial-domain priors for human visual attention, such as colour, are omitted.
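The image-signature idea described above (take the sign of the DCT, invert it, then smooth the squared result) can be illustrated on a one-dimensional toy signal. The following is a minimal sketch under stated assumptions, not the implementation from [19]: the function names `dct2`, `idct2` and `signature_saliency`, the three-tap smoothing kernel standing in for Gaussian post-processing, and the toy signal are all hypothetical choices made for illustration, and the naive O(N^2) transforms are written with the standard library only to keep the example dependency-free.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a 1-D sequence (naive O(N^2) version)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct2(coeffs):
    """Inverse of dct2 (a scaled DCT-III)."""
    n = len(coeffs)
    return [(coeffs[0]
             + 2.0 * sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                         for k in range(1, n))) / n
            for i in range(n)]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signature_saliency(x, kernel=(0.25, 0.5, 0.25)):
    """Squared reconstruction from the sign of the DCT, lightly smoothed.

    The small kernel is an assumed stand-in for Gaussian post-processing.
    """
    recon = idct2([sign(c) for c in dct2(x)])
    energy = [r * r for r in recon]
    half = len(kernel) // 2
    n = len(energy)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)  # clamp at the borders
            acc += w * energy[idx]
        smoothed.append(acc)
    return smoothed


# Toy signal (assumption): a smooth background made of a single DCT basis
# function, plus one sparse spike that plays the role of the foreground.
N, SPIKE_AT = 64, 40
signal = [math.cos(math.pi * (i + 0.5) * 3 / N) for i in range(N)]
signal[SPIKE_AT] += 1.0

saliency = signature_saliency(signal)
peak = max(range(N), key=lambda i: saliency[i])  # index of the strongest response
```

In this toy case the background occupies a single DCT coefficient while the spike spreads over all of them, so discarding the magnitudes keeps mostly the spike. The cluttered-background failure modes listed in the text arise when the background is not this sparse in the frequency domain.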

In this paper, we aim to develop a robust approach that regularizes the repeated patterns in cluttered backgrounds and maintains the spatial priors for salient objects in images by exploring saliency detection from four perspectives: hierarchical representation of images, use of colour cues, rare shapes from multi-scale contours, and saliency integration by manifold learning. The main contribution of this paper is threefold. First, we extract more representative, self-selective features at low computational cost by transforming an image into a hierarchical sequence of images according to colour distance transformation and self-adaptive entropy selection. Second, we develop a novel concept of rare shapes from multi-scale contours to capture salient seeds, which yield better performance than the spatial prior of background seeds. Third, an improved manifold learning method is used to ensure the locality and compactness of saliency maps from a hierarchical sequence of images.

The remainder of the paper is organized as follows. Section 2 briefly presents the related works on feature representation, mid-level cues, manifold learning and saliency integration in computer vision. Section 3 provides details of the proposed method. Experimental setups and results are analysed in Section 4. Finally, Section 5 concludes the paper.

Section snippets

Related works

In saliency detection methods, representative features help to identify low-level salient parts in images, while the estimation of mid-level cues provides larger-scale homogeneous information for obtaining better saliency maps. Both low-level and mid-level cues can be organized in a graph-based framework to integrate saliency maps with locality and compactness.

The proposed method

This paper aims to derive a bottom-up saliency detection model for colour images with complex backgrounds. The hierarchical sequence of images S is constructed to provide more distinctive cues, which indicate homogeneous regions in cluttered environments. Rare shapes are then estimated as salient seeds for the improved manifold learning method, which integrates the final saliency maps based on S.
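To make the role of salient seeds in a graph-based framework concrete, the sketch below propagates a seed label over a small graph using classical manifold ranking. This is an assumption-laden illustration only: the excerpt does not specify the authors' improved manifold learning method, so the standard update f <- alpha * S f + (1 - alpha) * y with S = D^(-1/2) W D^(-1/2) is swapped in, and the function name `manifold_ranking`, the damping value `alpha=0.9`, and the six-node toy graph (imagined as six superpixels) are hypothetical.

```python
import math


def manifold_ranking(weights, seeds, alpha=0.9, iterations=300):
    """Classical manifold ranking by fixed-point iteration.

    weights: symmetric affinity matrix W (list of lists, zero diagonal,
             every node with at least one positive edge).
    seeds:   indicator vector y, 1.0 for seed nodes and 0.0 otherwise.
    Returns one ranking score per node.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised affinity S = D^-1/2 W D^-1/2.
    norm = [[weights[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [alpha * sum(norm[i][j] * scores[j] for j in range(n))
                  + (1.0 - alpha) * seeds[i]
                  for i in range(n)]
    return scores


# Toy graph (assumption): nodes 0-2 form one homogeneous region, nodes 3-5
# another, and the two regions are joined only by a weak edge between 2 and 3.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
y = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # node 0 is the single salient seed

scores = manifold_ranking(W, y)
```

With a seed in the first region, every node of that region ends up ranked above every node of the second region, which is the locality behaviour a seed-driven saliency map relies on. For larger graphs, the closed form f* = (1 - alpha)(I - alpha S)^(-1) y solved with a linear-algebra library would usually replace the iteration.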

Experimental results and analysis

We evaluate the effectiveness of the proposed algorithms on two datasets which provide both images and binary ground truth, and exhaustively compare our method with state-of-the-art saliency detection algorithms in terms of precision, recall, F-measure and mean absolute error (MAE) as defined in [9].
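The four evaluation measures named above can be sketched for a single saliency map as follows. This is a minimal illustration rather than the evaluation protocol of [9]: the function names are hypothetical, the fixed threshold of 0.5 is an assumption (benchmarks often sweep or adapt the threshold), and `beta_sq=0.3` is the weighting commonly used in the salient object detection literature, assumed here rather than taken from the source. Saliency values are expected in [0, 1] and the ground truth is a flat list of 0/1 labels.

```python
def binarize(saliency, threshold):
    """Turn a saliency map (flat list of floats) into a 0/1 mask."""
    return [1 if s >= threshold else 0 for s in saliency]


def precision_recall_f(saliency, truth, threshold=0.5, beta_sq=0.3):
    """Precision, recall and weighted F-measure against a binary ground truth."""
    pred = binarize(saliency, threshold)
    true_pos = sum(p * t for p, t in zip(pred, truth))
    n_pred, n_true = sum(pred), sum(truth)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_true if n_true else 0.0
    denom = beta_sq * precision + recall
    f_measure = (1.0 + beta_sq) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def mean_absolute_error(saliency, truth):
    """Mean absolute difference between the continuous map and the mask."""
    return sum(abs(s - t) for s, t in zip(saliency, truth)) / len(saliency)


# Tiny four-pixel example (made-up values, for illustration only).
example_map = [0.9, 0.8, 0.2, 0.1]
example_truth = [1, 0, 1, 0]
p, r, f = precision_recall_f(example_map, example_truth)
mae = mean_absolute_error(example_map, example_truth)
```

Precision, recall and F-measure judge the thresholded mask, whereas MAE compares the continuous map directly with the ground truth, so the two kinds of measure can rank methods differently.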

Conclusion and future work

Unlike methods that analyse saliency in the frequency domain, the hierarchical sequence of images reflects the 1/f law by sorting the sequence according to entropy, because a sub-image with lower entropy contains less redundant information than the others. As the energy distribution of salient objects is unknown, we assign energy uniformly to each sub-image. Other distributions that make sub-images more compact are worth exploring and may benefit the saliency maps.
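Sorting a sequence of sub-images by entropy can be sketched as below. The excerpt does not give the paper's exact entropy definition or selection rule, so this example assumes plain Shannon entropy of the intensity histogram, with each sub-image represented as a flat list of quantised pixel values; the names `shannon_entropy` and `sort_by_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the value histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sort_by_entropy(sub_images):
    """Order sub-images from lowest to highest entropy."""
    return sorted(sub_images, key=shannon_entropy)


# Three made-up sub-images with 1, 2 and 4 distinct intensity values.
flat = [5] * 8
two_tone = [0, 0, 0, 0, 9, 9, 9, 9]
busy = [0, 1, 2, 3, 0, 1, 2, 3]

ordered = sort_by_entropy([busy, flat, two_tone])
```

Under this assumed definition, a uniform sub-image has zero entropy and comes first, and sub-images with more evenly spread intensity values come later in the sequence.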

Youhai Qiu received his B.Sc. and M.Sc. degrees from Huazhong University of Science and Technology, China. Since 2011, he has been pursuing a Ph.D. degree at Deakin University, Australia. His major research interests are computer vision and intelligent systems, including object detection and image semantic analysis.

References (37)

J. Xue et al., Automatic salient object extraction with contextual cue and its applications to recognition and alpha matting, Pattern Recognit. (2013)
A.L. Rothenstein et al., Attention links sensing to recognition, Image Vis. Comput. (2008)
L. Itti et al., Bayesian surprise attracts human attention, Vis. Res. (2009)
K. Herrmann et al., When size matters: attention affects performance by contrast or response gain, Nat. Neurosci. (2010)
T. Serre, L. Wolf, T. Poggio, Object recognition with features inspired by visual cortex, in: IEEE Computer Society...
C. Scharfenberger, A. Wong, K. Fergani, J.S. Zelek, D.A. Clausi, Statistical textural distinctiveness for salient...
Q. Li, J. Wu, Z. Tu, Harvesting mid-level visual concepts from large-scale internet images, in: 2013 IEEE Conference on...
L. Itti et al., Computational modelling of visual attention, Nat. Rev. Neurosci. (2001)
T. Liu et al., Learning to detect a salient object, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
F. Perazzi, P. Krahenbuhl, Y. Pritch, A. Hornung, Saliency filters: contrast based filtering for salient region...
A. Borji, D.N. Sihite, L. Itti, Salient object detection: a benchmark, in: Computer Vision – ECCV 2012, Springer, 2012,...
N. Bruce, J. Tsotsos, Saliency based on information maximization, in: Advances in Neural Information Processing...
R. Achanta, S. Hemami, F. Estrada, S. Susstrunk, Frequency-tuned salient region detection, in: IEEE Conference on...
J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Advances in Neural Information Processing Systems, 2006,...
X. Shen, Y. Wu, A unified approach to salient object detection via low rank matrix recovery, in: 2012 IEEE Conference...
S. Goferman et al., Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
J. Li et al., Visual saliency based on scale-space analysis in the frequency domain, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
J. Zhang, S. Sclaroff, Saliency detection: a Boolean map approach, in: 2013 IEEE International Conference on Computer...

Xiangping Sun received his master's degree (CS) in Pattern Recognition and Artificial Intelligence from Huazhong University of Science and Technology, China, in 2009. Since 2010, he has been studying at Deakin University, Australia. His research interests include computer vision for texture image analysis and intelligent systems.

Mary Fenghua She received her B.Sc. and M.Sc. degrees in Engineering from Donghua University, Shanghai, China, and her Ph.D. from Deakin University, Victoria, Australia. After graduation, she was awarded an Australian Postdoctoral Fellowship by the Australian Research Council in 2002 and worked for 4.5 years at the University of South Australia on image analysis and artificial intelligence technologies for materials characterization and animal monitoring. She currently holds a position as a research fellow in the Institute of Technology and Research Innovation at Deakin University. Her major research interests include image processing and analysis, pattern recognition, artificial intelligence and intelligent wearable systems.
