Neural Networks
Volume 121, January 2020, Pages 441-451

Integrating joint feature selection into subspace learning: A formulation of 2DPCA for outliers robust feature selection

https://doi.org/10.1016/j.neunet.2019.08.030

Abstract

Since principal component analysis (PCA) and its variants are sensitive to outliers, which affect their performance and applicability in real-world settings, several variants have been proposed to improve robustness. However, most existing methods are still sensitive to outliers and are unable to select useful features. To overcome the sensitivity of PCA to outliers, in this paper we introduce two-dimensional outliers-robust principal component analysis (ORPCA) by imposing joint constraints on the objective function. ORPCA relaxes the orthogonality constraints and penalizes the regression coefficients; it thus selects important features while discarding features that are already captured by other principal components. It is well known that the squared Frobenius norm is sensitive to outliers. To overcome this issue, we devise an alternative way to derive the objective function. Experimental results on four publicly available benchmark datasets show the effectiveness of joint feature selection and demonstrate better performance than state-of-the-art dimensionality-reduction methods.

Introduction

With recent advances in data acquisition devices, data can be acquired at substantially faster rates and higher resolutions. The data interpretation process, however, faces several challenges due to high dimensionality. Dimensionality reduction is a serious challenge not only for classification but also for several other domains such as data visualization, data compression, pattern recognition, and computer vision. The aim of dimensionality reduction is to transform high-dimensional data into a low-dimensional representation that preserves the quality of the data so that it can be classified efficiently. To deal with this issue, several vector-based methods have been in use over the last two decades, such as Principal Component Analysis (PCA) (Turk & Pentland, 1991), Linear Discriminant Analysis (LDA) (Belhumeur et al., 1997, Razzak et al., 2010, Ye et al., 2018), LPP (He & Niyogi, 2004), SPP (Qiao, Chen, & Tan, 2010), SPPE (Zhang, Yan, & Zhao, 2013), Isomap (Zhang et al., 2018) and NPE (He & Niyogi, 2004). Principal Component Analysis is one of the most extensively used unsupervised dimensionality reduction methods; it projects the high-dimensional representation into a linear orthogonal space. However, a major drawback is that each principal component is a linear combination of all original features with loadings that are typically non-zero, which makes the result difficult to interpret; moreover, PCA remains sensitive to outliers (its covariance matrix is derived from the ℓ2-norm), which affects its performance. Thus, it fails to deal with outliers that often appear in real-world data. Moreover, before applying PCA and LDA, the image has to be converted into a one-dimensional vector, so these methods may not exploit the image's spatial structural information very well (Feng et al., 2013, He and Niyogi, 2004, Netrapalli et al., 2014, Turk and Pentland, 1991, Vaswani et al., 2018, Xu et al., 2010, Yi et al., 2017, Zou et al., 2006), which is very important for image representation. To overcome these issues, several variants of PCA have been proposed to improve the effectiveness of dimensionality reduction and the robustness against outliers.
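
To make the vectorization step concrete, the following minimal sketch (Python with NumPy, on synthetic data; shapes and sizes are our assumptions) shows classical vector-based PCA: each image is flattened before the covariance eigendecomposition, which is exactly where the spatial structure is lost and where the squared ℓ2 criterion lets outliers dominate.

```python
# Minimal sketch of classical (vector-based) PCA on images; synthetic data,
# illustrative only.
import numpy as np

rng = np.random.default_rng(0)
images = rng.standard_normal((100, 32, 32))   # 100 images of size 32 x 32

# Step 1: vectorize -- each 32x32 image becomes a 1024-dim row vector,
# discarding the 2D spatial arrangement of pixels.
X = images.reshape(100, -1)
X = X - X.mean(axis=0)                        # center the data

# Step 2: eigendecomposition of the covariance matrix. The squared l2
# criterion behind this matrix is what makes classical PCA outlier-sensitive.
C = X.T @ X / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
W = eigvecs[:, -50:]                          # top 50 principal directions

Y = X @ W                                     # low-dimensional representation
print(Y.shape)                                # (100, 50)
```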

Matrix-based subspace learning methods have been widely applied for dimensionality reduction (Li et al., 2017, Li et al., 2010, Tian et al., 2017, Yang et al., 2004, Yang et al., 2005). Results have shown that 2DPCA (Yang et al., 2004), 2DLDA (Yang et al., 2005), multi-linear PCA (Lu, Plataniotis, & Venetsanopoulos, 2008), and JGSPCA (Khan, Shafait, & Mian, 2015) are far more efficient than one-dimensional subspace learning, owing to their direct formulation on two-dimensional images. Two-dimensional subspace learning methods calculate the scatter matrices directly from image matrices and can hence exploit the spatial structural information of the image, which is quite important for image classification. To select important features, several efforts have been made, such as robust 2DPCA and the use of the nuclear norm, the ℓ1-norm, the ℓ2,1-norm, and the Frobenius norm, which showed considerable improvement against outliers and are able to select discriminant patterns.
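
For contrast with the vectorized sketch above, here is a minimal 2DPCA sketch in the spirit of Yang et al. (2004): the image scatter matrix is computed directly from the 2D image matrices, so no flattening is needed. The synthetic input and the choice of d are our assumptions.

```python
# Sketch of 2DPCA (Yang et al., 2004 style): the scatter matrix is built
# directly from image matrices, preserving spatial structure.
import numpy as np

def two_d_pca(images, d):
    """images: (N, h, w) array; returns a (w, d) projection matrix."""
    mean = images.mean(axis=0)                    # h x w mean image
    centered = images - mean
    # Image scatter matrix: G = (1/N) * sum_i (A_i - mean)^T (A_i - mean)
    G = np.einsum('nhw,nhv->wv', centered, centered) / len(images)
    _, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, -d:]                        # top-d eigenvectors of G

rng = np.random.default_rng(0)
imgs = rng.standard_normal((100, 32, 32))
X = two_d_pca(imgs, d=8)
features = imgs @ X       # each image reduces to a 32 x 8 feature matrix
print(features.shape)     # (100, 32, 8)
```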

Recently, ℓ1-norm-based subspace learning methods have shown great performance against outliers for tensor data classification (Razzak et al., 0000, Wang et al., 2012, Wang and Wang, 2013). Ke and Kanade presented matrix factorization as an ℓ1-norm minimization problem that can handle missing data straightforwardly. Wang et al. presented robust 2DPCA with non-greedy ℓ1-norm maximization, in which all projection directions are optimized simultaneously (Wang & Gao, 2016). Luo et al. extended it by learning the projection matrix through maximizing the sum of the projected differences between each pair of instances, rather than the difference between each instance and the mean of the data (Luo et al., 2017). Although ℓ1-norm-based methods provide great performance, they do not relate to the covariance matrix, which characterizes the geometric structure of the data, whereas the F-norm can efficiently exploit the spatial structure embedded in the data. Several efforts have been made to use the F-norm for subspace learning, such as 2DPCA (Yang et al., 2004, Yang et al., 2005), 2D-PCA (Tian et al., 2017), F-norm 2DPCA (Li et al., 2017), NM-2DPCA (Chen et al., 2018, Wang et al., 2017), and N-2DNPP (Zhang, Li, Zhao, Zhang, & Yan, 2017). However, these methods either still suffer from the effect of outliers or are unable to select important features. Furthermore, the sensitivity of the F-norm is another challenge. Wang et al. presented non-squared F-norm minimization to overcome this challenge (Wang et al., 2017). However, it affects the selection of important features.
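
Our reading of the non-greedy ℓ1-norm maximization scheme can be sketched as follows: fix the signs of the current projections, then update all directions at once by solving an orthogonal Procrustes-type subproblem. The initialization and the fixed iteration count are our assumptions, not the exact settings of Wang and Gao (2016).

```python
# Hedged sketch of non-greedy l1-norm 2DPCA: maximize sum_i ||A_i W||_1
# subject to W^T W = I, updating all d directions simultaneously.
import numpy as np

def l1_2dpca_nongreedy(images, d, n_iter=50):
    N, h, w = images.shape
    rng = np.random.default_rng(0)
    W = np.linalg.qr(rng.standard_normal((w, d)))[0]   # orthonormal init
    for _ in range(n_iter):
        S = np.sign(images @ W)                  # fix signs of projections
        M = np.einsum('nhw,nhd->wd', images, S)  # M = sum_i A_i^T S_i
        # Maximize tr(M^T W) over orthonormal W: take the polar factor of M.
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        W = U @ Vt
    return W
```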

To overcome the aforementioned issues of robust feature selection and the sensitivity of the Frobenius norm, in this paper we present a novel formulation of PCA that combines subspace learning and feature selection in order to exclude the effect of redundant patterns and perform joint feature selection. We employ the Frobenius norm as the distance metric and seek the projection matrix by jointly minimizing the regularizer and penalty terms. We relax the orthogonality constraint on the transformation matrix and introduce another transformation that helps to jointly select important features and enhances robustness against outliers. To overcome the sensitivity caused by the squared Frobenius norm, we devise an efficient way to compute the F-norm. As a result, the proposed objective function not only weakens the effect of large distances but also has the rotational invariance property. The theoretical and empirical key contributions of this work are as follows:

  • We present outliers-robust two-dimensional principal component analysis by efficiently integrating the robustness of traditional 2DPCA with a regularization term ‖Q‖²_F that relaxes the orthogonality constraint.

  • The regularization term ‖Q‖²_F reduces the constraints and enables the objective function to select features jointly (see the sketch after this list). Furthermore, the term ‖Q‖²_F is convex and can be easily optimized.

  • To overcome the sensitivity of the F-norm to outliers, we derive the objective function in an efficient way.

  • The penalty term penalizes all regression coefficients corresponding to a single feature as a whole, which makes it possible for PCA to select features jointly. Hence, ORPCA approximates the high-dimensional representation in a flexible manner and has more freedom to select low-dimensional features efficiently.

  • One major drawback of the F-norm is its sensitivity to outliers: outlying measurements can arbitrarily skew the solution away from the desired one because of the squared objective function. As a result, the F-norm cannot exploit the underlying geometric structure in a real sense. To cope with the sensitivity caused by the squared F-norm, the non-squared F-norm has recently been used.

  • The proposed method is evaluated empirically on four benchmark datasets. Experimental evaluation (discriminant features, computational cost, and convergence analysis) shows considerable improvement in most cases, while the time complexity remains very attractive.
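
Since these snippets do not show the exact ORPCA objective, the following speculative sketch (referenced in the list above) only illustrates the two ingredients the contributions name: the convex ridge term ‖Q‖²_F on a relaxed, non-orthogonal transformation, and a row-wise ℓ2,1-style penalty whose zero rows drop whole features at once, i.e. joint feature selection. All names and shapes are illustrative assumptions.

```python
# Illustration of the two regularizers named in the contributions; this is
# NOT the paper's objective, only the building blocks it mentions.
import numpy as np

def frobenius_sq(Q):
    # ||Q||_F^2: sum of squared entries; convex and smooth in Q.
    return np.sum(Q ** 2)

def l21_norm(W):
    # ||W||_{2,1}: sum over rows of each row's l2 norm. Minimizing it drives
    # entire rows of W to zero, so a feature is kept or dropped as a whole.
    return np.linalg.norm(W, axis=1).sum()

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 3))
W[[2, 5, 7]] = 0.0            # zero rows: features 2, 5 and 7 are dropped
kept = np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-12)
print(frobenius_sq(W), l21_norm(W), kept)
```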

The rest of the paper is organized as follows. In Section 2, we present basic notation and related work. In Section 3, we present the motivation, followed by the proposed objective function and its optimization. In Section 5, we provide detailed experimental evaluations. Finally, the conclusion is drawn in Section 6.

Section snippets

Related work

Recently, subspace-learning techniques have shown great performance and have been widely applied for high-dimensional data representation and classification. In recent years, researchers have proposed a number of methods to reduce the effect of outliers, and several variants have been presented in the literature. PCA is one of the most widely used dimensionality-reduction approaches. Unlike traditional PCA, two-dimensional PCA is based on two-dimensional image matrices rather than one-dimensional vectors…

Motivation

As discussed in Sections 1 (Introduction) and 2 (Related work), for the classification of high-dimensional noisy data it is always important to find salient features that belong to a specific part of the image. Since an outlier does not have a precise mathematical meaning, the RPCA problem is not yet well defined. Selecting important information while ignoring the redundant could help to improve feature selection. However, most PCA-based methods are sensitive…

Outliers robust 2DPCA

In this section, we present the outliers-robust dimensionality reduction approach (ORPCA) in detail. As described in earlier sections, the projection procedure involves all the original features, so it may also include irrelevant and redundant features that could degrade the performance of dimensionality reduction and, in turn, the classification performance. Furthermore, outliers strongly affect feature selection, which degrades classification performance. In this work, we…

Experimental results

To evaluate the performance of the proposed ORPCA, in this section we discuss and compare its performance on four commonly used image datasets: AR (Martínez & Kak, 2001), Yale B (Sim, Baker, & Bsat, 2002), ORL, and CMU PIE. We use the k-nearest-neighbor classifier (with k=1) for classification. The main contribution of this work is introducing joint feature selection in order to select useful features by effectively combining the robustness of traditional…
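
A sketch of the evaluation protocol just described, assuming feature matrices have already been produced by some learned projection: the 1-nearest-neighbor rule with Euclidean distance is spelled out, while data loading and the projection itself are placeholders.

```python
# 1-NN evaluation over projected feature matrices; labels are numpy arrays.
import numpy as np

def one_nn_accuracy(train_feats, train_labels, test_feats, test_labels):
    # Flatten each (h x d) feature matrix and use Euclidean distance.
    tr = train_feats.reshape(len(train_feats), -1)
    te = test_feats.reshape(len(test_feats), -1)
    d2 = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(-1)  # pairwise dists
    pred = train_labels[d2.argmin(axis=1)]                 # 1-NN (k = 1)
    return float((pred == test_labels).mean())
```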

Discussion

We notice that matrix-based methods perform better than vector-based methods. The results show that the proposed ORPCA finds representative features in the high-dimensional space that are then used for classification. Unlike ℓ1-norm-based 2DPCA, ORPCA has the rotational invariance property and the freedom to jointly select important and contributive features, such as the nose, eyes, and lips in face images, and the contours of different objects in non-facial datasets. Traditional methods…

Conclusion

In this paper, we presented a robust dimensionality reduction method that relaxes the orthogonality constraints on the transformation matrix and imposes a penalty function on the regularization term. In contrast to previous work on robustness in PCA, we jointly select the important features. The introduction of the penalty function yields robustness against outliers by reducing their impact on the projection matrix. Compared with state-of-the-art methods, our evaluation results show improvement…

Acknowledgment

This work is partially supported by the Australian Research Council under Linkage Project LP170100891.

References (42)

  • Belhumeur, Peter N., Hespanha, João P., & Kriegman, David J. (1997). Eigenfaces vs. fisherfaces: Recognition using...
  • Chen, Yudong, et al. (2018). Nuclear norm based two-dimensional sparse principal component analysis. International Journal of Wavelets, Multiresolution and Information Processing.
  • Feng, Jiashi, Xu, Huan, & Yan, Shuicheng (2013). Online robust PCA via stochastic optimization. In: Advances in neural...
  • He, Xiaofei, & Niyogi, Partha (2004). Locality preserving projections. In: Advances in neural information processing...
  • Khan, Zohaib, et al. (2015). Joint group sparse PCA for compressed hyperspectral imaging. IEEE Transactions on Image Processing.
  • Li, Xuelong, et al. (2010). L1-norm-based 2DPCA. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics).
  • Lu, Haiping, et al. (2008). MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks.
  • Luo, Minnan, et al. (2017). Avoiding optimal mean ℓ2,1-norm maximization-based robust PCA for reconstruction. Neural Computation.
  • Martinez, Aleix M. (1998). The AR face database. CVC Technical Report.
  • Martínez, Aleix M., et al. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Netrapalli, Praneeth, Niranjan, U. N., Sanghavi, Sujay, Anandkumar, Animashree, & Jain, Prateek (2014). Non-convex...