Integrating joint feature selection into subspace learning: A formulation of 2DPCA for outliers robust feature selection
Introduction
With recent advances in data acquisition devices, data can be acquired at substantially faster rates and higher resolutions. The data interpretation process, however, faces several challenges due to high dimensionality. High dimensionality is a serious challenge not only for classification but also for several other domains such as data visualization, data compression, pattern recognition, and computer vision. The aim of dimensionality reduction is to transform high-dimensional data into a low-dimensional representation while preserving the quality of the data so that it can be classified efficiently. To deal with this issue, several vector-based methods have been in use over the last two decades, such as Principal Component Analysis (PCA) (Turk & Pentland, 1991), Linear Discriminant Analysis (LDA) (Belhumeur et al., 1997, Razzak et al., 2010, Ye et al., 2018), LPP (He & Niyogi, 2004), SPP (Qiao, Chen, & Tan, 2010), SPPE (Zhang, Yan, & Zhao, 2013), Isomap (Zhang et al., 2018) and NPE (He & Niyogi, 2004). Principal Component Analysis is one of the most extensively used unsupervised dimensionality reduction methods; it projects the high-dimensional representation into a linear orthogonal space. However, one of its major drawbacks is that each principal component is a linear combination of all original features and the loadings are non-zero, which makes the interpretation of PCA results difficult. PCA is also sensitive to outliers, as its covariance matrix is derived from the squared L2-norm, which degrades its performance; thus it fails to deal with the outliers that often appear in real-world data. Moreover, before applying PCA or LDA, each image must be converted into a one-dimensional vector, so these methods may not exploit the image's spatial structural information very well (Feng et al., 2013, He and Niyogi, 2004, Netrapalli et al., 2014, Turk and Pentland, 1991, Vaswani et al., 2018, Xu et al., 2010, Yi et al., 2017, Zou et al., 2006), which is very important for image representation. To overcome these issues, several variants of PCA have been proposed to improve the effectiveness of dimensionality reduction and the robustness against outliers.
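For concreteness, here is a minimal numpy sketch of the classical PCA pipeline criticized above (the function and variable names are ours, not the paper's): images must first be flattened into the rows of a data matrix, and the covariance is built from squared L2 terms, which is exactly what makes the leading directions sensitive to outliers.

```python
import numpy as np

def pca_project(X, k):
    """Classical PCA on vectorized data X of shape (n_samples, d).

    Images must be flattened into the rows of X, which discards their
    2D spatial structure; the covariance below sums squared L2 terms,
    so a single outlier can dominate the leading directions.
    """
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / X.shape[0]            # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    W = eigvecs[:, -k:]                   # top-k principal directions
    return Xc @ W                         # low-dimensional projection
```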
Matrix-based subspace learning methods have been widely applied for dimensionality reduction (Li et al., 2017, Li et al., 2010, Tian et al., 2017, Yang et al., 2004, Yang et al., 2005). Results showed that 2DPCA (Yang et al., 2004), 2DLDA (Yang et al., 2005), multi-linear PCA (Lu, Plataniotis, & Venetsanopoulos, 2008), and JGSPCA (Khan, Shafait, & Mian, 2015) are far more efficient than one-dimensional subspace learning, owing to their direct formulation on two-dimensional images. Two-dimensional subspace learning methods calculate the scatter matrices directly from the images and hence can reveal the spatial structural information of an image, which is quite important for image classification. To select important features, several efforts have been made, such as robust 2DPCA and the use of the nuclear norm, the L1-norm, the L2,1-norm, and the Frobenius norm, which showed considerable improvement against outliers and were able to select discriminant patterns.
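By contrast with the vectorized pipeline above, a sketch of the 2DPCA computation in the spirit of Yang et al. (2004), assuming h x w image matrices (names again ours): the image scatter matrix is formed directly from the 2D images, so no vectorization is needed.

```python
import numpy as np

def two_dpca_project(images, k):
    """2DPCA in the spirit of Yang et al. (2004).

    images: array of shape (n, h, w). The w x w image scatter matrix
    is computed directly from the 2D image matrices, so spatial
    structure is preserved. Returns projections of shape (n, h, k).
    """
    mean_image = images.mean(axis=0)
    D = images - mean_image                           # centered image matrices
    # G = (1/n) * sum_i (A_i - mean)^T (A_i - mean), a w x w scatter matrix
    G = np.einsum('nij,nik->jk', D, D) / len(images)
    _, eigvecs = np.linalg.eigh(G)                    # ascending eigenvalues
    W = eigvecs[:, -k:]                               # top-k directions, w x k
    return images @ W                                 # Y_i = A_i W
```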
Recently, L1-norm-based subspace learning methods have shown great performance against outliers for tensor data classification (Razzak et al., Wang et al., 2012, Wang and Wang, 2013). Ke and Kanade presented matrix factorization as an L1-norm minimization problem that can handle missing data straightforwardly. Wang et al. presented robust 2DPCA with non-greedy L1-norm maximization, in which all projection directions are optimized simultaneously (Wang & Gao, 2016). Luo et al. extended it by learning the projection matrix that maximizes the sum of the projected differences between each pair of instances, rather than the differences between each instance and the mean of the data (Luo et al., 2017). Although L1-norm-based methods have provided great performance, they do not relate to the covariance matrix, which characterizes the geometric structure of the data, whereas the F-norm can efficiently exploit the spatial structure embedded in the data. Several efforts have been made to use the F-norm for subspace learning, such as 2DPCA (Yang et al., 2004, Yang et al., 2005), optimal-mean 2DPCA with F-norm minimization (Tian et al., 2017), F-norm 2DPCA (Li et al., 2017), NM-2DPCA (Chen et al., 2018, Wang et al., 2017), and N-2DNPP (Zhang, Li, Zhao, Zhang, & Yan, 2017). However, these methods either still suffer from the effect of outliers or are unable to select important features. Furthermore, the sensitivity of the F-norm is another challenge. Wang et al. presented non-squared F-norm minimization to overcome this challenge (Wang et al., 2017); however, it affects the selection of important features.
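In the notation common to these papers (image matrices $A_i$, projection matrix $W$ with $W^\top W = I$), the two families of objectives discussed above can be summarized as follows; these are the standard forms from the cited works, written in our notation:

$$\max_{W^\top W = I} \; \sum_{i=1}^{N} \left\| A_i W \right\|_{1} \qquad \text{(non-greedy L1-norm 2DPCA, Wang \& Gao, 2016)}$$

$$\min_{W^\top W = I} \; \sum_{i=1}^{N} \left\| A_i - A_i W W^\top \right\|_{F} \qquad \text{(non-squared F-norm 2DPCA, Wang et al., 2017)}$$

The first maximizes the projected energy under the L1-norm, which resists outliers but ignores the covariance structure; the second keeps the F-norm's link to the covariance matrix while avoiding the squaring that over-weights large residuals.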
To overcome the aforementioned issues of robust feature selection and the sensitivity of the Frobenius norm, in this paper we present a novel formulation of PCA that combines subspace learning and feature selection in order to exclude the effect of redundant patterns and to select features jointly. We employ the Frobenius norm as the distance metric and seek the projection matrix by joint minimization of the regularization and penalty terms. We relax the orthogonality constraint on the transformation matrix and introduce another transformation matrix that helps to jointly select important features and enhances the robustness against outliers; a schematic form of this objective is sketched after the contribution list below. To overcome the sensitivity issue caused by the squared Frobenius norm, we devise an efficient way to compute the F-norm. As a result, the proposed objective function not only weakens the effect of large distances but also has the rotational invariance property. The key theoretical and empirical contributions of this work are as follows:
- We present an outliers robust two-dimensional principal component analysis (ORPCA) that efficiently integrates the robustness of traditional 2DPCA with a regularization term that relaxes the orthogonality constraint.
- The regularization term reduces the constraints and enables the objective function to select features jointly. Furthermore, the regularized objective is convex and can be easily optimized.
- To overcome the sensitivity of the F-norm to outliers, we derive the objective function efficiently.
- The penalty term penalizes all regression coefficients corresponding to a single feature as a whole, making it possible for PCA to select features jointly. Hence, ORPCA approximates the high-dimensional representation in a flexible manner and has more freedom to select low-dimensional features efficiently.
- One major drawback of the F-norm is its sensitivity to outliers: because the objective function is squared, outlying measurements can arbitrarily skew the solution away from the desired one. As a result, the F-norm cannot exploit the underlying geometric structure in a real sense. To cope with this sensitivity, the non-squared F-norm has recently been used.
- The proposed method is evaluated empirically on four benchmark datasets. Experimental evaluation (discriminant features, computational cost, and convergence analysis) shows considerable improvement in most cases, while the time complexity remains very attractive.
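As promised above, the pieces fit together roughly as follows. This is a schematic paraphrase on our part, not the paper's exact equation (the precise objective, with image matrices $A_i$, two transformation matrices $Q$ and $V$, and regularization parameter $\lambda$, is developed in Section 3):

$$\min_{Q,\,V} \; \sum_{i=1}^{N} \left\| A_i - A_i V Q^\top \right\|_{F} + \lambda \left\| V \right\|_{2,1} \qquad \text{s.t. } Q^\top Q = I,$$

where the non-squared F-norm limits the influence of outlying images, the orthogonality constraint is kept only on $Q$ while $V$ is left unconstrained (the relaxation), and the L2,1 penalty drives whole rows of $V$ to zero so that features are selected jointly.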
The rest of the paper is organized as follows. In Section 2, we present basic notations and related work. In Section 3, we present the motivation, followed by the proposed objective function and its optimization. In Section 5, we provide detailed experimental evaluations. Finally, the conclusion is drawn in Section 6.
Section snippets
Related work
Recently, subspace-learning techniques have shown great performance and have been widely applied for high-dimensional data representation and classification. In the past few years, researchers have proposed a number of methods to reduce the effect of outliers, and several variants have been presented in the literature. PCA is one of the most widely used dimensionality reduction approaches. Unlike traditional PCA, two-dimensional PCA is based on two-dimensional image matrices rather than one-dimensional vectors.
Motivation
As the analysis in Sections 1 (Introduction) and 2 (Related work) shows, for the classification of high-dimensional noisy data it is always important to find salient features that belong to specific parts of an image. Since an outlier does not have a precise mathematical meaning, the RPCA problem is not yet well defined. Selecting important information while ignoring the redundant could help to improve feature selection. However, most PCA-based methods are sensitive to outliers.
Outliers robust 2DPCA
In this section, we present the outliers robust dimensionality reduction approach (ORPCA) in detail. As described in earlier sections, the projection procedure involves all of the original features; thus it may also include irrelevant and redundant features, which can influence the performance of dimensionality reduction and in turn affect classification performance. Furthermore, outliers strongly affect feature selection, which degrades classification performance. In this work, we integrate joint feature selection into the subspace learning step to address both issues.
Experimental results
In order to evaluate the performance of the proposed ORPCA, in this section we discuss and compare its performance on four commonly used image datasets: AR (Martínez & Kak, 2001), Yale B (Sim, Baker, & Bsat, 2002), ORL, and CMU PIE. We use a k-nearest neighbor classifier for classification. The main contribution of this work is introducing joint feature selection in order to select useful features by effectively combining the robustness of traditional 2DPCA with a joint feature selection mechanism.
Discussion
We notice that matrix-based methods perform better than vector-based methods. The results show that the proposed ORPCA finds representative features in the high-dimensional space that are then used for classification. Unlike 2DPCA based on the L1-norm, ORPCA has the rotational invariance property and has the freedom to jointly select the important and contributive features, such as the nose, eyes, and lips in face images, and the contours of different objects in non-facial datasets. Traditional methods, in contrast, fail to do so in the presence of outliers.
Conclusion
In this paper, we presented a robust dimensionality reduction method that relaxes the orthogonality constraints of the transformation matrix and imposes a penalty function on the regularization term. In contrast to previous work on robustness in PCA, we jointly select the important features. The introduction of the penalty function yields robustness against outliers by reducing their impact on the projection matrix. Compared with state-of-the-art methods, our evaluation results show considerable improvement.
Acknowledgment
This work is partially supported by the Australian Research Council Linkage Projects scheme under grant LP170100891.
References (42)
- F-norm distance metric based robust 2DPCA and face recognition. Neural Networks (2017).
- Sparsity preserving projections with applications to face recognition. Pattern Recognition (2010).
- Optimal mean two-dimensional principal component analysis with F-norm minimization. Pattern Recognition (2017).
- 2DPCA with L1-norm for simultaneously robust and sparse modelling. Neural Networks (2013).
- Robust auto-weighted projective low-rank and sparse recovery for visual representation. Neural Networks (2019).
- Two-dimensional discriminant transform for face recognition. Pattern Recognition (2005).
- Lp- and Ls-norm distance based robust linear discriminant analysis. Neural Networks (2018).
- Joint sparse principal component analysis. Pattern Recognition (2017).
- Semi-supervised local multi-manifold Isomap by linear embedding for feature extraction. Pattern Recognition (2018).
- Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction. Information Sciences (2015).
- Nuclear norm based two-dimensional sparse principal component analysis. International Journal of Wavelets, Multiresolution and Information Processing.
- Joint group sparse PCA for compressed hyperspectral imaging. IEEE Transactions on Image Processing.
- L1-norm-based 2DPCA. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics).
- MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks.
- Avoiding optimal mean L2,1-norm maximization-based robust PCA for reconstruction. Neural Computation.
- The AR face database. CVC Technical Report.
- PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence.