Computing, Artificial Intelligence and Information Technology
A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems

https://doi.org/10.1016/j.ejor.2004.06.014

Abstract

The minimum sum-of-squares clustering problem is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm based on nonsmooth optimization techniques is developed for its solution. The issue of applying this algorithm to large data sets is discussed. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed algorithm.

Introduction

Clustering is the unsupervised classification of patterns. Cluster analysis deals with the problem of organizing a collection of patterns into clusters based on similarity. It has found many applications, including information retrieval, document extraction and image segmentation.

In cluster analysis we assume that we are given a set $X$ of a finite number of points in the $d$-dimensional space $\mathbb{R}^d$, that is

$$X=\{x_1,\ldots,x_n\},\quad\text{where } x_i\in\mathbb{R}^d,\ i=1,\ldots,n.$$

The subject of cluster analysis is the partition of the set $X$ into a given number $q$ of overlapping or disjoint subsets $C_i$, $i=1,\ldots,q$, with respect to predefined criteria, such that

$$X=\bigcup_{i=1}^{q}C_i.$$

The sets $C_i$, $i=1,\ldots,q$, are called clusters. The clustering problem is called hard clustering if every data point belongs to one and only one cluster. Unlike hard clustering, in the fuzzy clustering problem the clusters are allowed to overlap and instances have degrees of membership in each cluster. In this paper we consider exclusively the hard unconstrained clustering problem; that is, we additionally assume that

$$C_i\cap C_k=\emptyset,\quad i,k=1,\ldots,q,\ i\neq k,$$

and no constraints are imposed on the clusters $C_i$, $i=1,\ldots,q$. Thus every point $x\in X$ is contained in exactly one set $C_i$.

Each cluster $C_i$ can be identified by its center (or centroid). Then the clustering problem can be reduced to the following optimization problem (see [7], [8], [38]):

$$\text{minimize}\ \ \phi(C,a)=\frac{1}{n}\sum_{i=1}^{q}\sum_{x\in C_i}\|a_i-x\|^2\quad\text{subject to}\ \ C\in\bar{C},\ a=(a_1,\ldots,a_q)\in\mathbb{R}^{d\times q},\qquad(1)$$

where $\|\cdot\|$ denotes the Euclidean norm, $C=\{C_1,\ldots,C_q\}$ is a set of clusters, $\bar{C}$ is the set of all possible $q$-partitions of the set $X$, $a_i$ is the center of the cluster $C_i$, $i=1,\ldots,q$,

$$a_i=\frac{1}{|C_i|}\sum_{x\in C_i}x,$$

and $|C_i|$ is the cardinality of the set $C_i$, $i=1,\ldots,q$. Problem (1) is also known as minimum sum-of-squares clustering. The combinatorial formulation (1) is not suitable for the direct application of mathematical programming techniques. Problem (1) can be rewritten as the following mathematical programming problem:

$$\text{minimize}\ \ \psi(a,w)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{q}w_{ij}\|a_j-x_i\|^2\qquad(2)$$

subject to

$$\sum_{j=1}^{q}w_{ij}=1,\quad i=1,\ldots,n,\qquad w_{ij}\in\{0,1\},\quad i=1,\ldots,n,\ j=1,\ldots,q.$$

Here

$$a_j=\frac{\sum_{i=1}^{n}w_{ij}x_i}{\sum_{i=1}^{n}w_{ij}},\quad j=1,\ldots,q,$$

and $w_{ij}$ is the association weight of pattern $x_i$ with cluster $j$ (to be found), given by

$$w_{ij}=\begin{cases}1 & \text{if pattern } x_i \text{ is allocated to cluster } j,\\ 0 & \text{otherwise,}\end{cases}$$

and $w$ is an $n\times q$ matrix.
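To make formulation (1) concrete, the sketch below (illustrative code with made-up names and toy data, not the authors' implementation) evaluates the sum-of-squares objective $\phi$ for a given hard partition, computing each center $a_i$ as the mean of its cluster:

```python
import numpy as np

def mssc_objective(X, labels, q):
    """Evaluate the objective of problem (1) for a hard partition of X
    encoded by integer labels in {0, ..., q-1}."""
    n = X.shape[0]
    total = 0.0
    for j in range(q):
        members = X[labels == j]
        if members.size == 0:
            continue                      # an empty cluster contributes nothing
        centroid = members.mean(axis=0)   # a_j = (1/|C_j|) * sum of its members
        total += ((members - centroid) ** 2).sum()
    return total / n

# Toy usage (data values are made up): two well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(mssc_objective(X, labels=np.array([0, 0, 1, 1]), q=2))
```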

There exist different approaches to clustering, including agglomerative and divisive hierarchical clustering algorithms as well as algorithms based on mathematical programming techniques. Descriptions of many of these algorithms can be found, for example, in [14], [23], [38]. An excellent up-to-date survey of existing approaches, together with a comprehensive list of references on clustering algorithms, is provided in [24].

Problem (2) is a global optimization problem, so different algorithms of mathematical programming can be applied to it. A review of such algorithms, among them dynamic programming, branch and bound, cutting plane and k-means methods, can be found in [18]. The dynamic programming approach can be applied effectively only when the number of instances $n\leq 20$, which means that it is not effective for real-world problems (see [25]). However, in the one-dimensional case ($d=1$) the clustering problem can be solved exactly by dynamic programming, in polynomial time [38].
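The one-dimensional result rests on the fact that, for sorted 1-d data, the clusters of an optimal partition are contiguous intervals, so the optimal cost satisfies a simple recursion. A minimal sketch of this textbook dynamic programme (our illustration, not code from [38]) runs in $O(qn^2)$ time using prefix sums:

```python
import numpy as np

def mssc_1d_dp(x, q):
    """Exact minimum sum-of-squares clustering of 1-d data: D[k][j] is the
    optimal cost of grouping the first j sorted points into k clusters."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))  # prefix sums of squares

    def sse(i, j):
        # sum of squared deviations of x[i..j] from its mean, in O(1)
        cnt, tot = j - i + 1, s[j + 1] - s[i]
        return (s2[j + 1] - s2[i]) - tot * tot / cnt

    D = np.full((q + 1, n + 1), np.inf)
    D[0][0] = 0.0
    for k in range(1, q + 1):
        for j in range(k, n + 1):
            # the k-th cluster is the interval x[m..j-1] for some split m
            D[k][j] = min(D[k - 1][m] + sse(m, j - 1) for m in range(k - 1, j))
    return D[q][n]    # unscaled optimal sum of squares
```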

Branch and bound algorithms are effective only when the database contains just a few hundred records and the number of clusters is small (less than 5) (see [13], [17], [18], [27]). For these methods the solution of clustering problems with $n\geq 1000$ and $q\geq 10$ is out of reach. Different heuristics can be used for solving large clustering problems, and k-means is one such algorithm. Different versions of this algorithm have been studied by many authors (see [38]). It is a very fast algorithm and is suitable for clustering large data sets. k-means gives good results when there are few clusters but deteriorates when there are many [18]. The algorithm achieves a local minimum of problem (1) (see [36]); however, results of numerical experiments presented, for example, in [21] show that the best clustering found with k-means may be more than 50% worse than the best known one.
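For reference, the classical Lloyd-style k-means iteration discussed above can be sketched in a few lines (a minimal illustration, not any of the cited implementations); its speed comes from alternating a cheap assignment step with a centroid step, and its local-minimum behaviour from the dependence on the random start:

```python
import numpy as np

def lloyd_kmeans(X, q, iters=100, seed=0):
    """Minimal k-means: alternate nearest-center assignment and centroid
    updates until the centers stop moving (a local minimum of (1))."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=q, replace=False)]   # random start
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                           # assignment step
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(q)])  # centroid step
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```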

Much better results have been obtained with metaheuristics such as simulated annealing, tabu search and genetic algorithms [34]. Simulated annealing approaches to clustering have been studied, for example, in [9], [37], [39]. The application of tabu search methods to the clustering problem is studied in [1]. Genetic algorithms for clustering are described in [34]. The results of numerical experiments presented in [2] show that even for small cluster analysis problems, where the number of entities $n\leq 100$ and the number of clusters $q\leq 5$, these algorithms take 500–700 (sometimes several thousand) times more CPU time than the k-means algorithm. For relatively large databases one can expect this difference to increase, which makes metaheuristic global optimization algorithms ineffective for many clustering problems. However, these algorithms can be applied to large clustering problems if combined with decomposition (see [20]).

An approach to cluster analysis problems based on bilinear programming techniques is described in [29]. The paper [5] describes a global optimization approach to clustering and demonstrates how the supervised data classification problem can be solved via clustering. The objective function in this problem is both nonsmooth and nonconvex, and it has a large number of local minimizers. Problems of this type are quite challenging for general-purpose global optimization techniques: due to the large number of variables and the complexity of the objective function, such techniques, as a rule, fail to solve them.

In [15] an interior point method for the minimum sum-of-squares clustering problem is developed. The papers [20], [30] develop variable neighborhood search algorithms, and the paper [19] presents the j-means algorithm, which extends k-means by adding a jump move. The global k-means heuristic, an incremental approach to the minimum sum-of-squares clustering problem, is developed in [28]. The incremental approach is also studied in [21]. The reported numerical experiments show the high effectiveness of these algorithms on many clustering problems.

As mentioned above, problem (2) is a global optimization problem and its objective function has many local minima. However, global optimization techniques are highly time-consuming when applied to many clustering problems. It is therefore very important to develop clustering algorithms based on optimization techniques that compute "deep" local minimizers of the objective function. The clustering algorithm proposed and studied in this paper is of this type and is based on nonsmooth optimization techniques. The algorithm computes clusters step by step, gradually increasing the number of clusters until a termination condition is met; that is, it allows one to calculate as many clusters as a data set contains with respect to some tolerance.

The paper is organized as follows: the nonsmooth optimization approach to clustering is presented in Section 2. Section 3 describes an algorithm for solving clustering problems. An algorithm for solving the underlying optimization problems is discussed in Section 4. Issues of complexity reduction for clustering in large data sets are discussed in Section 5, while Section 6 presents and discusses the results of the numerical experiments. Section 7 concludes the paper.

Section snippets

The nonsmooth optimization approach to minimum sum-of-squares clustering

In this section we present a formulation of the clustering problem in terms of nonsmooth, nonconvex optimization.

The problems (1), (2) can be reformulated as the following mathematical programming problem (see [5], [6], [7], [8]):

$$\text{minimize}\ \ f(a_1,\ldots,a_q)\quad\text{subject to}\ \ a=(a_1,\ldots,a_q)\in\mathbb{R}^{d\times q},\qquad(3)$$

where

$$f(a_1,\ldots,a_q)=\frac{1}{n}\sum_{i=1}^{n}\min_{j=1,\ldots,q}\|a_j-x_i\|^2.$$

It is shown in [7] that problems (1), (2), (3) are equivalent. The number of variables in problem (2) is $(n+d)\times q$, whereas in problem (3) this number is only $d\times q$ and the number of
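The objective of problem (3) is cheap to evaluate, and a short sketch (illustrative, not the authors' code) makes the reduction in problem size concrete: only the $d\times q$ center coordinates are variables, while the pointwise minimum over $j$ is exactly what makes $f$ nonsmooth and nonconvex:

```python
import numpy as np

def f(centers, X):
    """Objective of problem (3): (1/n) * sum_i min_j ||a_j - x_i||^2.
    centers: (q, d) array of cluster centers; X: (n, d) data array."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, q)
    return d2.min(axis=1).mean()   # the min over j is the nonsmooth part
```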

An optimization clustering algorithm

In this section we will describe a clustering algorithm.

Algorithm 1

An algorithm for solving a cluster analysis problem.

  • Step 1.

    (Initialization). Select a tolerance $\epsilon>0$. Select a starting point $a^0=(a^0_1,\ldots,a^0_d)\in\mathbb{R}^d$ and solve the minimization problem (3) with $q=1$. Let $a^1\in\mathbb{R}^d$ be a solution to this problem and $f_1^*$ the corresponding objective function value. Set $k=1$.

  • Step 2.

    (Computation of the next cluster center). Select a point $y^0\in\mathbb{R}^d$ and solve the following minimization problem:
$$\text{minimize}\ \ \bar f_k(y)\quad\text{subject to}\ \ y\in\mathbb{R}^d,$$
where
$$\bar f_k(y)=\sum_{i=1}^{n}\cdots$$
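Since the snippet of Algorithm 1 above is truncated, the following sketch reconstructs the overall step-by-step scheme under explicit assumptions: the auxiliary objective $\bar f_k$ is taken in the form standard for incremental minimum sum-of-squares clustering (for each point, the minimum of its current squared distance to its nearest center and its squared distance to the candidate new center $y$); the starting point $y^0$ and the relative-decrease stopping test are our illustrative choices; and SciPy's derivative-free Nelder–Mead solver stands in for the authors' nonsmooth optimization method of Section 4:

```python
import numpy as np
from scipy.optimize import minimize

def incremental_clustering(X, eps=1e-2, q_max=10):
    """Hedged sketch of the incremental scheme: grow the set of centers
    one at a time, refining all of them after each addition."""
    n, d = X.shape

    def f(a):  # objective of problem (3) for a (q, d) array of centers
        d2 = ((X[:, None, :] - a[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()

    centers = X.mean(axis=0)[None, :]   # Step 1: for q = 1 the centroid is optimal
    f_prev = f(centers)
    while centers.shape[0] < q_max:
        # squared distance of every point to its nearest current center
        d2_near = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Step 2: assumed form of the auxiliary objective f_bar_k
        f_bar = lambda y: np.minimum(d2_near, ((X - y) ** 2).sum(axis=1)).mean()
        y0 = X[np.argmax(d2_near)]      # illustrative start: the worst-served point
        y = minimize(f_bar, y0, method="Nelder-Mead").x
        # refine all k+1 centers together on the full objective (3)
        a0 = np.vstack([centers, y])
        res = minimize(lambda a: f(a.reshape(-1, d)), a0.ravel(), method="Nelder-Mead")
        # illustrative stopping test on the relative decrease of the objective
        if (f_prev - res.fun) / max(f_prev, 1e-12) < eps:
            break
        centers = res.x.reshape(-1, d)
        f_prev = res.fun
    return centers, f_prev
```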

Solving optimization problems

In this section we discuss an algorithm for solving problems (5), (6) arising in the clustering algorithm. Both are nonsmooth optimization problems; therefore we first recall some definitions from nonsmooth analysis.

Let $\Phi$ be a function defined on $\mathbb{R}^p$. This function is said to be locally Lipschitz continuous on $\mathbb{R}^p$ if for any bounded subset $S\subset\mathbb{R}^p$ there exists a constant $L>0$ such that
$$|\Phi(y)-\Phi(u)|\leq L\|y-u\|\quad\forall y,u\in S.$$
Such a function is differentiable almost everywhere and one can define for it
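To make these notions concrete for the clustering function of problem (3): at any point where every $x_i$ has a unique nearest center, the function is differentiable and its gradient with respect to center $a_j$ is $(2/n)\sum_{i\in C_j}(a_j-x_i)$; where ties occur the function is nonsmooth and this vector is only one element of the generalized (Clarke) subdifferential. The sketch below (our illustration, not the authors' nonsmooth solver) computes this piecewise gradient:

```python
import numpy as np

def clustering_gradient(centers, X):
    """One (sub)gradient of f from problem (3): the exact gradient wherever
    each point's nearest center is unique, one Clarke subgradient at ties."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)            # breaks ties by lowest index
    g = np.zeros_like(centers)
    for j in range(centers.shape[0]):
        g[j] = 2.0 * (centers[j] - X[nearest == j]).sum(axis=0) / n
    return g
```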

Complexity reduction for large-scale data sets

Due to the highly combinatorial nature of clustering problems, two characteristics of a given data set can severely affect the performance of a clustering tool: the number of data records (instances) and the number of data attributes (features). In many cases the development of effective tools requires reducing both the number of features and the number of instances without loss of knowledge-generating ability. In this section we will consider one scheme for reducing the number of

Results and discussion

To verify the effectiveness of the clustering algorithm, a number of numerical experiments with small and medium-sized data sets were carried out on a Pentium 4 1.7 GHz PC.

First we consider three standard test problems to compare our algorithm with the following heuristics and metaheuristics: the k-means algorithm (K-M), the tabu search (TS) method, a genetic algorithm (GA) and the simulated annealing (SA) method. Then we use four other test data sets to compare Algorithm 1 with the

Conclusions

In this paper a nonsmooth, nonconvex optimization-based algorithm for solving cluster analysis problems has been proposed. As this algorithm calculates clusters step by step, it allows the decision maker to vary the number of clusters easily, according to criteria suggested by the nature of the decision-making situation, without incurring the obvious costs of an increasingly complex solution procedure. The suggested approach utilizes a stopping criterion that prevents the appearance of

Acknowledgements

The authors would like to thank the three anonymous referees whose very detailed comments have considerably improved this paper.

This research was supported by the Australian Research Council.

References (39)

  • A.M. Bagirov, Minimization methods for one class of nonsmooth functions and calculation of semi-equilibrium prices.
  • A.M. Bagirov, A method for minimization of quasidifferentiable functions, Optimization Methods and Software (2002).
  • A.M. Bagirov et al., Using global optimization to improve classification for medical diagnosis and prognosis, Topics in Health Information Management (2001).
  • A.M. Bagirov et al., Global optimization approach to classification, Optimization and Engineering (2001).
  • H.H. Bock, Automatische Klassifikation (1974).
  • H.H. Bock, Clustering and neural networks.
  • F.H. Clarke, Optimization and Nonsmooth Analysis (1983).
  • V.F. Demyanov et al., Constructive Nonsmooth Analysis (1995).
  • I.S. Dhillon et al., Efficient clustering of very large document collections.