Saliency detection using hierarchical manifold learning
Introduction
Object detection is an important but difficult problem in computer vision, especially in crowded environments. Inspired by the mechanism of human visual attention, which has attracted a great deal of interest from researchers in psychology, neurobiology [1], and computer science [2], saliency detection serves as a filter that efficiently distinguishes regions of interest (ROI) in images while ignoring irrelevant backgrounds. Saliency detection models are expected to be robust to various challenges in real-world scenes, including partial occlusion, nearby clutter, noise, and changes in illumination, rotation and scale, as well as common object variations. Saliency detection methods benefit many fields, such as object detection [3], automatic image segmentation [4], and visual concept collection [5].
Motivated by neurobiological or psychological goals, computational saliency detection models emerged to simulate bottom-up human visual attention and predict which locations in a scene are likely to be fixated [6]. The term saliency map, introduced by Itti et al. [7] to depict visual attention, is also used for the record of salient points that a saliency detection model produces for an image. Recently, more and more saliency detection models segment salient objects from the background [8], [9], [10]. Much work has been done to obtain optimal saliency maps as evaluated on benchmark datasets, such as AIM (eye tracking) [11] and the MSRA dataset (object mask) [12]. As shown in Fig. 1, if a salient object is located against a clear background, several state-of-the-art saliency detection methods can produce ideal saliency maps. However, when a salient object is in a crowded environment, the background may mislead the saliency map into highlighting a location that differs from the ground truth.
Generally speaking, the ground truth (obtained from both eye tracking and object masks) is a measure of human visual attention. It is reasonable to develop saliency detection models that follow the psychological evidence on the human visual system. For example, Goferman et al. proposed detecting salient regions in four steps that accord with four principles of human visual attention [15]: selecting distinctive colours or patterns, suppressing frequently occurring features, grouping salient pixels, and post-processing saliency maps according to prior knowledge. With increasing interest in saliency detection, a similar category of methods has been reported within the paradigm of the spatial domain [8], [9]. In these methods, algorithms based on mid-level cues (e.g., superpixels) and integration are explored to increase the accuracy of saliency maps.
In the spatial domain, salient regions are the unique parts that differ from the background. More precisely, salient objects are rare compared to the background in an image, both locally and globally [18]. From the perspective of the frequency domain, uniqueness means statistical singularities in the spectrum. Hou et al. proposed the image signature to detect salient parts in the frequency domain [19]. The image signature applies the Discrete Cosine Transform (DCT) followed by the sign function to locate the sparse foreground and suppress the non-sparse background in an image. The saliency map is then calculated in the spatial domain from the Inverse Discrete Cosine Transform (IDCT) of the image signature. The authors also claim that the image signature improves on the spectral residual (SR) approach [20], generating more compact saliency maps. By analysing spikes of signals in the frequency domain, Guo et al. argued that popped-out proto-objects can be reconstructed from the phase spectrum alone, without the amplitude spectrum, through the Phase Spectrum of Fourier Transform (PFT) [21]. In fact, both the image signature and PFT retain the phase spectrum of an image and combine it with Gaussian post-processing to suppress the lower frequencies and enhance the higher frequencies, as the amplitude spectrum of natural images obeys a distribution called the 1/f law [16]. However, these frequency-domain methods are not suitable for detecting varied salient objects in cluttered backgrounds, for three reasons: (1) the image signature and PFT suppress or enhance features, such as texture and boundaries, that are not sufficient to highlight salient objects; (2) backgrounds contain many irregularly repeated patterns, which are difficult to suppress in the frequency domain; (3) some important priors for human visual attention in the spatial domain, such as colour, are omitted.
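The image signature pipeline described above (DCT, sign function, inverse DCT, Gaussian post-processing) can be sketched in a few lines. This is a minimal illustration on a single grayscale channel, not the reference implementation of [19]: the function names, the default `sigma`, the squaring of the reconstruction before smoothing, and the hand-built orthonormal DCT matrix are all assumptions made to keep the example self-contained with NumPy only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x transforms x, and C.T inverts it."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # the DC row has its own normalisation
    return c


def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing (zero padding at the borders).

    Assumes each image side is at least as long as the kernel.
    """
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()

    def smooth(v):
        return np.convolve(v, kernel, mode="same")

    img = np.apply_along_axis(smooth, 0, img)
    return np.apply_along_axis(smooth, 1, img)


def image_signature_saliency(gray, sigma=1.0):
    """Illustrative saliency map: blur((IDCT(sign(DCT(gray))))^2), scaled to [0, 1]."""
    gray = np.asarray(gray, dtype=float)
    ch = dct_matrix(gray.shape[0])
    cw = dct_matrix(gray.shape[1])
    signature = np.sign(ch @ gray @ cw.T)   # keep only the sign of each 2-D DCT coefficient
    recon = ch.T @ signature @ cw           # inverse 2-D DCT back to the spatial domain
    sal = gaussian_blur(recon * recon, sigma)
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

For a small bright patch on an empty background, the map returned by `image_signature_saliency` should concentrate on the patch; on textured or cluttered backgrounds the response is much less clean, which is the limitation the paragraph above points out.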
In this paper, we aim to develop a robust approach that regularizes the repeated patterns in cluttered backgrounds and maintains the spatial priors for salient objects in images. We explore saliency detection from four perspectives: hierarchical representation of images, use of colour cues, rare shapes from multi-scale contours, and saliency integration by manifold learning. The main contribution of this paper is threefold. Firstly, we extract more representative, self-selective and computationally inexpensive features by transforming an image into a hierarchical sequence of images according to colour distance transformation and self-adaptive selection by entropy. Secondly, we develop a novel concept of rare shapes from multi-scale contours to capture salient seeds, which yield better performance than the spatial prior of background seeds. Thirdly, an improved manifold learning method is used to ensure the locality and compactness of saliency maps from a hierarchical sequence of images.
The remainder of the paper is organized as follows. Section 2 briefly presents related work on feature representation, mid-level cues, manifold learning and saliency integration in computer vision. Section 3 provides details of the proposed method. Experimental setups and results are analysed in Section 4. Finally, Section 5 concludes the paper.
Related works
In saliency detection methods, representative features help to identify low-level salient parts of images, while the estimation of mid-level cues provides larger-scale homogeneous information for obtaining better saliency maps. Both low-level and mid-level cues can be organized in a graph-based framework to integrate saliency maps with locality and compactness.
The proposed method
This paper aims to derive a bottom-up saliency detection model for colour images with complex backgrounds. The hierarchical sequence of images S is constructed to provide more distinctive cues that indicate homogeneous regions in cluttered environments. Rare shapes are then estimated and used as salient seeds for the improved manifold learning method, which integrates the final saliency maps based on S.
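To make the idea of propagating saliency from seeds concrete, the sketch below uses the standard closed-form graph ranking f = (D − αW)⁻¹ y that is common in graph-based saliency work. It is not the improved manifold learning method of this paper, whose details are not given in this excerpt. The fully connected Gaussian affinity graph, the parameters `alpha` and `sigma`, and the function name are assumptions chosen for illustration; in practice the nodes would be regions such as superpixels and the features would be colour descriptors.

```python
import numpy as np


def manifold_ranking(features, seeds, alpha=0.99, sigma=1.0):
    """Rank graph nodes by relevance to seed nodes (illustrative sketch).

    features: (n,) or (n, d) array, one feature vector per node.
    seeds:    indices of nodes assumed to be salient.
    Returns f = (D - alpha * W)^{-1} y for a dense Gaussian affinity W.
    """
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Pairwise squared Euclidean distances between node features.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)               # no self-loops
    d = np.diag(w.sum(axis=1))             # degree matrix
    y = np.zeros(len(x))
    y[list(seeds)] = 1.0                   # indicator vector of the seed nodes
    return np.linalg.solve(d - alpha * w, y)


# Two well-separated groups of nodes; the seed lies in the first group.
scores = manifold_ranking([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], seeds=[0])
```

Nodes whose features are close to the seed receive high scores, while nodes in the distant group receive scores near zero, which is the locality behaviour a seed-based integration step relies on.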
Experimental results and analysis
We evaluate the effectiveness of the proposed algorithms on two datasets which provide both images and binary ground truth, and exhaustively compare our method with state-of-the-art saliency detection algorithms in terms of precision, recall, F-measure and mean absolute error (MAE) as defined in [9].
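The four measures named above can be computed as follows. This is a sketch under stated assumptions rather than the exact protocol of [9]: the saliency map is binarised at a single fixed threshold, and the F-measure uses the weighting β² = 0.3 that is conventional in salient object detection. The function name and default parameters are hypothetical.

```python
import numpy as np


def saliency_metrics(saliency, ground_truth, threshold=0.5, beta2=0.3):
    """Precision, recall, F-measure and MAE for one saliency map.

    saliency:     array with values in [0, 1].
    ground_truth: binary mask of the same shape (1 = salient).
    threshold and beta2 are assumed defaults, not values taken from the paper.
    """
    sal = np.asarray(saliency, dtype=float)
    gt = np.asarray(ground_truth).astype(bool)
    pred = sal >= threshold

    tp = np.logical_and(pred, gt).sum()                  # correctly detected salient pixels
    precision = tp / pred.sum() if pred.sum() > 0 else 0.0
    recall = tp / gt.sum() if gt.sum() > 0 else 0.0

    denom = beta2 * precision + recall
    f_measure = (1.0 + beta2) * precision * recall / denom if denom > 0 else 0.0

    mae = np.abs(sal - gt.astype(float)).mean()          # uses the continuous map, no threshold
    return float(precision), float(recall), float(f_measure), float(mae)
```

Precision, recall and F-measure depend on the chosen threshold, whereas MAE compares the continuous map with the mask directly, so it also penalises weak responses on the background.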
Conclusion and future work
Unlike methods that analyse saliency in the frequency domain, the hierarchical sequence of images reflects the 1/f law by sorting the sequence according to entropy, because a sub-image with lower entropy contains less redundant information than the others. As the energy distribution of salient objects is unknown, we assign energy uniformly to each sub-image. It is worth exploring other distributions that make sub-images more compact, which may in turn benefit saliency maps.
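As a rough illustration of sorting a sequence of sub-images by entropy, the sketch below computes the Shannon entropy of each sub-image's intensity histogram and orders the sequence from lowest to highest. How the paper actually builds its sub-images and measures their entropy is not specified in this excerpt, so the 256-bin histogram, the assumed 0 to 255 intensity range, and the function names are assumptions.

```python
import numpy as np


def shannon_entropy(img, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of an image.

    Assumes intensities lie in the range 0..255.
    """
    hist, _ = np.histogram(np.asarray(img), bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()        # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())


def sort_by_entropy(images):
    """Return the images ordered from lowest to highest entropy."""
    return sorted(images, key=shannon_entropy)
```

A constant sub-image has zero entropy and would come first, while a sub-image split evenly between two intensities has an entropy of one bit.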
Youhai Qiu received his B.Sc. and M.Sc. degrees from Huazhong University of Science and Technology, China. Since 2011, he has been pursuing the Ph.D. degree at Deakin University, Australia. His major research interests are computer vision and intelligent systems, including object detection and image semantic analysis.
References (37)
- et al., Automatic salient object extraction with contextual cue and its applications to recognition and alpha matting, Pattern Recognit. (2013)
- et al., Attention links sensing to recognition, Image Vis. Comput. (2008)
- et al., Bayesian surprise attracts human attention, Vis. Res. (2009)
- et al., When size matters: attention affects performance by contrast or response gain, Nat. Neurosci. (2010)
- T. Serre, L. Wolf, T. Poggio, Object recognition with features inspired by visual cortex, in: IEEE Computer Society...
- C. Scharfenberger, A. Wong, K. Fergani, J.S. Zelek, D.A. Clausi, Statistical textural distinctiveness for salient...
- Q. Li, J. Wu, Z. Tu, Harvesting mid-level visual concepts from large-scale internet images, in: 2013 IEEE Conference on...
- et al., Computational modelling of visual attention, Nat. Rev. Neurosci. (2001)
- et al., Learning to detect a salient object, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
- F. Perazzi, P. Krahenbuhl, Y. Pritch, A. Hornung, Saliency filters: contrast based filtering for salient region...
- Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell.
- Visual saliency based on scale-space analysis in the frequency domain, IEEE Trans. Pattern Anal. Mach. Intell.
Xiangping Sun received his master's degree (CS) in Pattern Recognition and Artificial Intelligence from Huazhong University of Science and Technology, China, in 2009. Since 2010, he has been studying at Deakin University, Australia. His interests include computer vision for texture image analysis and intelligent systems.
Mary Fenghua She received her B.Sc. and M.Sc. degrees in Engineering from Donghua University, Shanghai, China, and her Ph.D. from Deakin University, Victoria, Australia. After graduation, she was awarded an Australian Postdoctoral Fellowship by the Australian Research Council in 2002 and worked on image analysis and artificial intelligence technologies for materials characterization and animal monitoring at the University of South Australia for 4.5 years. She currently holds the position of research fellow in the Institute of Technology and Research Innovation at Deakin University. Her major research interests include image processing and analysis, pattern recognition, artificial intelligence and intelligent wearable systems.