Stochastics and Statistics
Nonsmooth nonconvex optimization approach to clusterwise linear regression problems

https://doi.org/10.1016/j.ejor.2013.02.059

Highlights

  • We develop an incremental algorithm to solve clusterwise linear regression problems.

  • The algorithm gradually computes clusters and linear regression functions.

  • Two special procedures to construct initial solutions are proposed.

  • The algorithm finds global or near-global minimizers of the overall fit function.

Abstract

Clusterwise regression consists of finding a number of regression functions, each approximating a subset of the data. In this paper, a new approach to solving clusterwise linear regression problems is proposed based on a nonsmooth nonconvex formulation, and an algorithm for minimizing the resulting nonsmooth nonconvex function is presented. The algorithm incrementally divides the whole data set into groups, each of which can be well approximated by one linear regression function. A special procedure is introduced to generate a good starting point for the global optimization problem solved at each iteration of the incremental algorithm. Such an approach allows one to find a global or near-global solution to the problem when the data set is sufficiently dense. The algorithm is compared with the multi-start Späth algorithm on several publicly available data sets for regression analysis.

Introduction

Unsupervised classification, or clustering, is an important task in data mining, which consists of finding subsets of similar points in a data set in order to discover patterns in the data. Regression analysis consists of fitting a function (often linear) to the data to discover how one or more variables vary as a function of another.

The aim of clusterwise regression is to combine both of these techniques to discover trends within data when more than one trend is likely to exist. Clusterwise regression has applications, for instance, in market segmentation, where it allows one to gather information on customer behaviors for several unknown groups of customers [1], [7]. It has also been applied to investigate stock-exchange data [20] and to perform so-called benefit segmentation [28]. The presence of nonlinear relationships, heterogeneous subjects, or time series in these applications necessitates the use of two or more regression functions to best summarize the underlying structure of the data.

The simplest case of clusterwise regression uses two or more linear regression functions to investigate the structure of the data. This approach is called clusterwise linear regression, and it is more widely used and better studied than the alternatives. The problem can be formulated as an optimization problem; mixed integer nonlinear programming formulations can be found in [9], [12]. Such formulations may have a very large number of variables even for moderately large data sets, so their exact global optimization is very challenging and out of reach of existing algorithms [9]. The most popular approaches to clusterwise linear regression are generalizations of classical clustering algorithms such as k-means [23], [24] or EM [14]. In [8] the authors base their approach on variable neighborhood search.

The paper [20] studies clusterwise linear regression in the case where the set of predictor variables forms an L2-continuous stochastic process. For each cluster, the estimators of the regression coefficients are given by partial least squares regression, and the number of clusters is treated as unknown. The paper [15] extends the so-called TCLUST methodology to perform robust clusterwise linear regression, and also proposes a feasible algorithm for its practical implementation.

The paper [12] presents a conditional mixture, maximum likelihood methodology for performing clusterwise linear regression. This methodology simultaneously estimates separate regression functions and membership in K > 0 clusters or groups. The methodology is introduced together with the EM algorithm used for parameter estimation.

Existing clusterwise linear regression algorithms suffer from the same drawbacks as their clustering counterparts: they are very sensitive to the choice of an initial solution and may converge to sub-optimal solutions [31]. Furthermore, most of these algorithms assume the number of clusters to be known a priori. Most algorithms try to separate the data into subsets of observations and use one regression function for each subset.

There have been several attempts to simultaneously find all regression functions approximating a data set and to estimate the number of subsets. The paper [13] presents a methodology which simultaneously clusters observations into a preset number of groups and estimates the corresponding regression functions' coefficients, all to optimize a common objective function; a simulated annealing-based methodology is then described to accommodate overlapping or nonoverlapping clustering. In the paper [16], the authors show that estimating the clusterwise regression model is equivalent to solving a nonlinear mixed integer programming model.

An information-based criterion for determining the number of clusters in the clusterwise regression problem is proposed in [22]. It is shown that, under a probabilistically structured population, the proposed criterion selects the true number of regression hyperplanes with probability one among all class-growing sequences of classifications as the number of observations from the population increases to infinity. The paper [21] studies the problem of estimating the number of clusters in the context of logistic regression clustering, employing the classification likelihood approach. A model-selection based criterion for selecting the number of logistic curves is proposed and its asymptotic properties are studied.

In this paper, a new approach to solving clusterwise linear regression problems is proposed based on a nonsmooth nonconvex formulation. The approach starts with one regression function and summarizes the underlying structure of the data by dynamically adding one hyperplane at each iteration. A special procedure is introduced to generate good starting points for the global optimization problems solved at each iteration of the incremental algorithm. Such an approach allows one to find a global or near-global solution to the problem when a data set is sufficiently dense.

Several incremental algorithms have been proposed to solve the sum-of-squares clustering problem. The global k-means algorithm and its variations [2], [17] construct clusters incrementally, starting from the center of the whole data set and then adding one cluster at a time, refining the new set of clusters by applying k-means. In this paper we propose to apply a similar scheme to the clusterwise linear regression problem: instead of classical cluster centers, we use affine functions as representatives of clusters.

The proposed algorithm is numerically tested on twenty small and seven medium-size and large publicly available data sets for regression analysis, and compared with the multi-start Späth algorithm for clusterwise linear regression. Additionally, we study the efficiency of the proposed algorithm depending on the number of points, features, and clusters using randomly generated data sets.

The structure of the paper is as follows. In Section 2 the clusterwise linear regression problem is introduced. We briefly explain the Späth algorithm for solving the clusterwise linear regression problem in Section 3 and describe the incremental algorithm in Section 4. Computation of initial solutions is discussed in Section 5. Section 6 contains computational results and Section 7 concludes the paper.

Section snippets

Clusterwise linear regression problem

In this section we present the nonsmooth nonconvex optimization formulation of the clusterwise linear regression problem. Given a data set $A = \{(a_i, b_i) \in \mathbb{R}^n \times \mathbb{R} : i = 1, \ldots, m\}$, the aim of clusterwise linear regression is to find simultaneously an optimal partition of the data into $k$ clusters and regression coefficients $\{x_j, y_j\}$, $j = 1, \ldots, k$, within clusters in order to minimize the overall fit. Let $A_j$, $j = 1, \ldots, k$, be clusters such that

    A_j \neq \emptyset, \quad A_j \cap A_t = \emptyset \;\; \text{for } j, t = 1, \ldots, k, \; t \neq j, \quad \text{and} \quad A = \bigcup_{j=1}^{k} A_j.

Let $\{x_j, y_j\}$ be linear regression …
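Written out, the snippet above suggests the following formulation; this is our reconstruction from the surrounding definitions (the error function $h$ and the exponent $p$ follow the notation recalled in Section 3), not a verbatim copy of the paper's Problem (2):

    % Error of the j-th regression function on a data point (a, b):
    h(x_j, y_j, a, b) = \bigl| \langle x_j, a \rangle + y_j - b \bigr|^{p}, \qquad p \ge 1,
    % Overall fit: each point is charged the error of its best-fitting hyperplane.
    \min_{(x_1, y_1), \ldots, (x_k, y_k)} \; f_k(x, y) = \sum_{i=1}^{m} \min_{j = 1, \ldots, k} h(x_j, y_j, a_i, b_i).

Minimizing $f_k$ simultaneously determines the partition (each point belongs to the hyperplane attaining the inner minimum) and the regression coefficients; the inner minimum over $j$ is what makes the objective nonsmooth and nonconvex.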

An algorithm for clusterwise linear regression

In this section we recall the algorithm from [24] for solving Problem (2), which is based on the well-known k-means algorithm. In [24] the algorithm was described for p = 2; in our description below we present it for general $p \ge 1$.

Algorithm 1

Späth algorithm

Step 1: (Initialization) Select mutually disjoint clusters $A_1, \ldots, A_k$ such that $\bigcup_{j=1}^{k} A_j = A$.
Step 2: For $j = 1, \ldots, k$, solve the following linear regression problem:

    \text{minimize} \;\; \varphi(x_j, y_j) = \sum_{(a, b) \in A_j} h(x_j, y_j, a, b) \quad \text{subject to} \quad x_j \in \mathbb{R}^n, \; y_j \in \mathbb{R},

and obtain regression …
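To make the alternating scheme concrete, here is a minimal Python sketch of the Späth iteration for the least-squares case p = 2. The function names (fit_cluster, spath), the stopping rule, and the tie handling are our illustrative assumptions, not the paper's code; in particular, we assume no cluster becomes empty during reassignment.

    import numpy as np

    def fit_cluster(A, b):
        """Least-squares fit (x, y) minimizing the sum of |<x, a> + y - b|^2."""
        X = np.hstack([A, np.ones((A.shape[0], 1))])   # append intercept column
        coef, *_ = np.linalg.lstsq(X, b, rcond=None)
        return coef[:-1], coef[-1]                      # (x_j, y_j)

    def spath(A, b, labels, k, max_iter=100):
        """Alternate per-cluster regression fits with point reassignment."""
        for _ in range(max_iter):
            models = [fit_cluster(A[labels == j], b[labels == j]) for j in range(k)]
            # Reassign each point to the hyperplane with the smallest error.
            errors = np.stack([(A @ x + y - b) ** 2 for x, y in models], axis=1)
            new_labels = errors.argmin(axis=1)
            if np.array_equal(new_labels, labels):      # partition is stable
                break
            labels = new_labels
        return models, labels

As with k-means, neither step can increase the objective, so the procedure terminates at a locally optimal partition whose quality depends entirely on the initial clusters chosen in Step 1.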

An incremental algorithm

The global optimization Problem (2) may have a large number of solutions, among which only global or near-global ones are of interest. However, conventional global optimization techniques cannot be directly applied to solve this problem due to its size, while efficient local methods can only reach local solutions whose quality depends on the starting points. Therefore it is crucial to develop a procedure for finding good starting points. In this section we propose to incorporate local methods …
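A minimal sketch of the incremental scheme, reusing fit_cluster and spath from the previous section; the helper starting_hyperplane stands for the starting-point procedure of Section 5 and is hypothetical, not implemented here:

    def incremental_clr(A, b, k_max):
        """Grow the set of regression hyperplanes from 1 to k_max."""
        models = [fit_cluster(A, b)]        # one hyperplane for the whole data set
        for k in range(2, k_max + 1):
            # Good starting point for the new hyperplane (auxiliary problem,
            # Section 5); starting_hyperplane is an assumed helper.
            u, v = starting_hyperplane(A, b, models)
            models.append((u, v))
            # Assign points to their best current hyperplane, then refine all
            # k hyperplanes together with the Späth iteration.
            errors = np.stack([(A @ x + y - b) ** 2 for x, y in models], axis=1)
            models, _ = spath(A, b, errors.argmin(axis=1), k)
        return models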

Computation of initial solutions

In this section we design an algorithm for solving Problem (6).

We denote a hyperplane by a pair $(u, v)$, where $u \in \mathbb{R}^n$ and $v \in \mathbb{R}$. Consider the following set of hyperplanes:

    C_k = \bigl\{ (u, v) \in \mathbb{R}^{n+1} : h(u, v, a, b) > r_{k-1}^{ab} \;\; \forall (a, b) \in A \bigr\}.

The set $C_k$ contains all hyperplanes which do not attract any point from the set $A$. It is clear that over this set the function $\bar{f}_k$ is constant and reaches its global maximum value (5). Therefore any hyperplane from the set $C_k$ is a stationary point of the function $\bar{f}_k$. This means that if we …
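The snippet's notation is consistent with the following auxiliary function, where $r_{k-1}^{ab}$ denotes the error of the point $(a, b)$ with respect to the $(k-1)$-hyperplane solution found at the previous iteration; this is our reconstruction of the objective of Problem (6), not a verbatim quote:

    \bar{f}_k(u, v) = \sum_{(a, b) \in A} \min \bigl\{ r_{k-1}^{ab}, \; h(u, v, a, b) \bigr\}, \qquad (u, v) \in \mathbb{R}^{n} \times \mathbb{R}.

Over $C_k$ every inner minimum is attained at $r_{k-1}^{ab}$, so $\bar{f}_k$ is constant there; this explains why hyperplanes from $C_k$, although stationary, are useless as starting points and must be excluded.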

Numerical results and discussions

In this section we present the results of numerical experiments obtained by applying the proposed algorithm to real and randomly generated regression data sets. First we give illustrative examples using three small data sets; then we report results on data sets with known solutions; finally, we present results on large data sets. We also compare the proposed algorithm with the multi-start Späth algorithm.

Conclusions

In this paper we developed an incremental algorithm for solving the clusterwise linear regression problem. The algorithm gradually finds clusters and the linear regression functions within these clusters while minimizing the overall fit function. We also proposed an algorithm that constructs initial solutions at each iteration of the incremental algorithm using results obtained at the previous iteration. This allows one to find significantly more accurate solutions considerably faster than the …

Acknowledgement

The authors are grateful to two anonymous referees for their valuable comments which significantly improved the presentation of the paper.

References

  • C. Preda et al., Clusterwise PLS regression on a stochastic process, Computational Statistics & Data Analysis (2005).

  • Q. Shao et al., A consistent procedure for determining the number of clusters in regression clustering, Journal of Statistical Planning and Inference (2005).

  • M. Wedel et al., Consumer benefit segmentation using clusterwise linear regression, International Journal of Research in Marketing (1989).

  • I-Cheng Yeh, Modeling of strength of high performance concrete using artificial neural networks, Cement and Concrete Research (1998).

  • I-Cheng Yeh, Modeling slump flow of concrete using second-order regressions and artificial neural networks, Cement and Concrete Composites (2007).