Robust plug-in estimators in proportional scatter models

https://doi.org/10.1016/j.jspi.2003.06.006

Abstract

We address the problem of estimating the principal axes and their size in the case of several populations under the assumption of a proportional model. We propose robust estimators for the common principal axes and their size. The robust estimators are based on asymptotically normal and equivariant robust scatter estimators. The asymptotic distribution of the robust estimators, including those of the proportionality constants, is derived.

Introduction

Many methods involving several populations in multivariate analysis assume equality of covariance matrices. Scatter matrices can, however, be related in more complex ways than simply being equal or not. For example, one matrix might be identical to another except that each element of the first matrix is multiplied by a single constant. We would then say that the matrices are proportional to one another. A more precise definition of proportionality is that the matrices share identical eigenvectors (or principal components), but their eigenvalues differ by a proportionality constant. A weaker relationship is that the matrices commute, so that they have the same principal components but different eigenvalues, as is the case in the common principal component model (CPC model) proposed by Flury (1984).

Assume that we have $k$ populations in $\mathbb{R}^p$, with covariance matrices $\Sigma_1,\dots,\Sigma_k$. The common principal components model states that

$$\Sigma_i = \beta \Lambda_i \beta', \quad 1 \le i \le k,$$

where $\Lambda_i$ is a diagonal matrix and $\beta$ is an orthogonal matrix. The proportional model can be seen as a special case of the CPC model, obtained by imposing further constraints on the parameter space:

$$\Sigma_i = \rho_i \Sigma_1 \quad \text{for } 2 \le i \le k.$$

In one-group principal component analysis, the eigenvectors $\beta_i$ forming the orthogonal matrix $\beta = (\beta_1,\dots,\beta_p)$ are usually ordered according to an increasing order of the associated eigenvalues. In the proportional model, in order to identify the axes uniquely, it is usually assumed that the eigenvalues of $\Sigma_1$ are distinct and that the columns of $\beta$ are arranged according to increasing values of the eigenvalues of $\Sigma_1$.
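To make the proportional model concrete, the following minimal Python sketch builds a set of proportional scatter matrices and checks numerically that they share eigenvectors and that their eigenvalues differ only by the constants. The function name `proportional_scatter` and all numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def proportional_scatter(beta, lam1, rhos):
    """Return the list Sigma_i = rho_i * beta @ diag(lam1) @ beta.T."""
    sigma1 = beta @ np.diag(lam1) @ beta.T
    return [rho * sigma1 for rho in rhos]

# Hypothetical example values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
beta, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
lam1 = np.array([1.0, 2.0, 5.0])   # distinct eigenvalues of Sigma_1, increasing
rhos = [1.0, 0.5, 3.0]             # proportionality constants, rho_1 = 1

sigmas = proportional_scatter(beta, lam1, rhos)

for rho, sigma in zip(rhos, sigmas):
    eigvals = np.linalg.eigvalsh(sigma)              # returned in ascending order
    rotated = beta.T @ sigma @ beta                  # diagonal if beta is common
    off_diag = rotated - np.diag(np.diag(rotated))
    print(rho, np.round(eigvals, 6), np.abs(off_diag).max() < 1e-10)
```

Because every matrix is diagonalized by the same orthogonal matrix, the matrices also commute, which is the weaker CPC-type relationship mentioned in the Introduction.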

Proportional covariance matrix estimation, in the two-sample case, has been considered by Khatri (1967) and Pillai et al. (1969). These authors studied the distribution of the ratios of the characteristic roots of $S_1 S_2^{-1}$, where $S_i$ are the sample covariance matrices. However, these authors neither explicitly construct tests for proportionality, nor indicate how to estimate proportional covariance matrices. Guttman et al. (1985) and Rao (1983) derived the asymptotic distribution of the proportionality constant estimators.

For the case of k>2, Kim (1971) showed that, under normal sampling with proportional covariance matrices, there exists at least one solution of the likelihood equations, and derived the joint asymptotic distribution of the k−1 estimated constants of proportionality. Owen (1984) considered the case of several groups in the context of a classification problem and gave an algorithm to find the maximum likelihood estimators. Essentially the same algorithm was considered by Manly and Rayner (1987) and Eriksen (1987) who, in addition, proved the convergence of the algorithm and the uniqueness of maximum likelihood estimators. Flury (1986), using a different parametrization based on the CPC model, obtained a system of equations, which defines the maximum likelihood estimators and gave an algorithm to solve it.

The asymptotic distribution of the maximum likelihood estimators for the CPC model and the proportionality model is given in Flury (1988) under normal sampling. A robustified version of these estimators, under a CPC model, was given by Novi Inverardi and Flury (1992). These authors considered robust and independent estimators of the scatter matrices of the k populations using the affine-equivariant M-estimators studied by Maronna (1976) and plugged them into the equations defining the maximum likelihood estimators of the parameters. Boente and Orellana (2001) established some asymptotic properties of these plug-in estimators for the CPC model, when using any robust, asymptotically normally distributed scatter matrix estimator, and also considered an approach based on projection pursuit principles.

This paper focuses on robust estimation using a plug-in approach, under proportionality of the scatter matrices. In Section 2, we describe the proposal to be considered for the proportionality model. In Section 3, some asymptotic results are established, while an application to test the hypothesis of equality against proportionality is given in Section 4. All proofs are given in the appendix.

Section snippets

Proportionality model

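Let $x_{i1},\dots,x_{in_i}$, $1 \le i \le k$, be independent observations from $k$ independent samples in $\mathbb{R}^p$ with location parameter $\mu_i$ and scatter matrix $\Sigma_i$. We are interested in robustly estimating the common eigenvectors $\beta = (\beta_1,\dots,\beta_p)$ of $\Sigma_i$, the eigenvalues of $\Sigma_1$ and the proportionality constants $\rho_i > 0$ under the proportionality model

$$\Sigma_i = \rho_i \Sigma_1 = \rho_i \beta \Lambda_1 \beta', \quad 1 \le i \le k, \quad \rho_1 = 1,$$

with $\Lambda_1 = \mathrm{diag}(\lambda_1,\dots,\lambda_p)$ and $\lambda_1 < \dots < \lambda_p$ the eigenvalues of $\Sigma_1$.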

Let $N = \sum_{i=1}^{k} n_i$. Flury (1986) obtains the maximum likelihood estimators, for normally distributed
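The plug-in idea for this model can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: it uses one common form of the normal-theory likelihood equations for proportional covariance matrices (a pooled update of the first scatter matrix alternating with trace-based updates of the constants), which is assumed here rather than taken from the snippet; the equations used in the paper follow Flury's (1986) CPC-based parametrization and may differ in form. The function name `plug_in_proportional` and the example inputs are hypothetical. In a robust plug-in approach, the inputs would be robust scatter estimates rather than sample covariance matrices.

```python
import numpy as np

def plug_in_proportional(scatters, sizes, max_iter=1000, tol=1e-10):
    """Alternate between Sigma_1 and the constants rho_i (rho_1 fixed at 1).

    scatters: list of (p, p) symmetric positive definite scatter estimates
    sizes:    list of sample sizes n_1, ..., n_k
    Returns (beta, lam, rho): common eigenvectors, eigenvalues of Sigma_1
    in increasing order, and the proportionality constants.
    """
    V = [np.asarray(v, dtype=float) for v in scatters]
    n = np.asarray(sizes, dtype=float)
    p = V[0].shape[0]
    rho = np.ones(len(V))
    for _ in range(max_iter):
        # Pooled estimate of Sigma_1 given the current constants.
        sigma1 = sum(ni * vi / ri for ni, vi, ri in zip(n, V, rho)) / n.sum()
        # Updated constants given Sigma_1: rho_i = tr(Sigma_1^{-1} V_i) / p.
        new_rho = np.array([np.trace(np.linalg.solve(sigma1, vi)) / p for vi in V])
        new_rho[0] = 1.0
        converged = np.max(np.abs(new_rho - rho)) < tol
        rho = new_rho
        if converged:
            break
    sigma1 = sum(ni * vi / ri for ni, vi, ri in zip(n, V, rho)) / n.sum()
    lam, beta = np.linalg.eigh(sigma1)   # eigenvalues in ascending order
    return beta, lam, rho

# Hypothetical check on exactly proportional inputs (assumed values).
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
sigma1_true = q @ np.diag([1.0, 2.0, 5.0]) @ q.T
scatters = [r * sigma1_true for r in (1.0, 0.5, 3.0)]
beta_hat, lam_hat, rho_hat = plug_in_proportional(scatters, sizes=[50, 60, 70])
print(np.round(rho_hat, 6), np.round(lam_hat, 6))
```

Fixing the first constant at 1 mirrors the identifiability convention of the model, and the ascending order returned by `np.linalg.eigh` matches the ordering of the eigenvalues of the first scatter matrix assumed above.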

Asymptotic distribution

A standard framework to derive the asymptotic behavior in robust principal component analysis is to assume that the estimators of the scatter matrix are asymptotically normally distributed and spherically invariant. For that reason, and since the samples of the $k$ populations are independent, we will assume, throughout this section, that for $1 \le i \le k$, the estimators $V_i$ of the scatter matrix $\Sigma_i$ are independent and satisfy the following assumptions

  • A1.

    $\sqrt{n_i}\,(V_i - \Sigma_i) \xrightarrow{D} Z_i$, where $Z_i$ has a multivariate normal

A test of equality against proportionality

The results and proposals of the previous sections can be used to test the hypothesis of equality of several scatter matrices against proportionality. This corresponds to the first two levels of similarity among the covariance matrices of $k$ populations considered in Flury (1988). Indeed, assume that we want to test

$$H_0: \Sigma_1 = \Sigma_2 = \dots = \Sigma_k, \qquad H_1: \Sigma_i = \rho_i \Sigma_1,\ 2 \le i \le k,\ \text{and } \exists\, i: \rho_i \ne 1.$$

The robust estimators defined in Section 2.2 provide statistics more resistant to outlying observations than the classical ones and thus,

References (34)

  • G. Boente, Asymptotic theory for robust principal components, J. Multivariate Anal. (1987)
  • B. Flury, Proportionality of k covariance matrices, Statist. Probab. Lett. (1986)
  • C.R. Rao, Likelihood tests for relationships between covariance matrices
  • T.W. Anderson, An Introduction to Multivariate Statistical Analysis (1984)
  • Berrendero, J.R., 1996. Contribuciones a la teoría de la robustez respecto al sesgo. Tesis de Doctorado, Universidad...
  • G. Boente et al., A robust approach to common principal components
  • G. Boente et al., Influence functions and outlier detection under the common principal components model: a robust approach, Biometrika (2002)
  • C. Croux et al., Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies, Biometrika (2000)
  • C. Croux et al., A fast algorithm for robust principal components based on projection pursuit
  • Croux, C., Ruiz-Gazen, A., 2000. High breakdown estimators for principal components: The projection-pursuit approach...
  • S.J. Devlin et al., Robust estimation of dispersion matrices and principal components, J. Amer. Statist. Assoc. (1981)
  • Donoho, D.L., 1982. Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, Harvard...
  • P.S. Eriksen, Proportionality of covariance matrices, Ann. Statist. (1987)
  • B. Flury, Common principal components in groups, J. Amer. Statist. Assoc. (1984)
  • B. Flury, Common Principal Components and Related Multivariate Models (1988)
  • Guttman, I., Kim, D.Y., Olkin, I., 1985. Statistical inference for constants of proportionality. In: Krishnaiah, P.R....
  • P. Huber, Robust Statistics (1981)

This research was partially supported by Grant PICT # 03-00000-006277 from ANPCYT at Buenos Aires, Argentina. The research of Graciela Boente was also partially supported by a Guggenheim fellowship.