An efficient line symmetry-based K-means algorithm

https://doi.org/10.1016/j.patrec.2005.11.006

Abstract

Recently, Su and Chou presented an efficient point symmetry-based K-means algorithm. Extending their algorithm, this paper presents a novel line symmetry-based K-means algorithm for clustering data sets with the line symmetry property. Experimental results on several real data sets show that the proposed line symmetry-based K-means algorithm is rather encouraging.

Introduction

Partitioning a set of data points into non-overlapping clusters is an important topic in data analysis and pattern classification, with applications such as codebook design (Gersho and Gray, 1992), data mining (Ng and Han, 2002), image segmentation (Jain and Dubes, 1988), and data compression (Sayood, 1996). Over the past several decades, many efficient clustering algorithms (Fischer and Buhmann, 2003, Bajcsy and Ahuja, 1998, Hartigan, 1975, Zhu and Po, 1998, Fred and Leitao, 2003, Su and Chou, 2001) have been developed for data sets with different distributions. Most existing clustering algorithms adopt the 2-norm (Euclidean) distance measure in the clustering process.

Among these clustering algorithms, Su and Chou (2001) first took the point symmetry issue (Zabrodsky et al., 1995, Kanatani, 1997) into account. Based on their point symmetry distance (PSD) measure, they presented a novel and efficient clustering algorithm that is well suited to symmetrical intra-clusters; for convenience, it is called the PSK algorithm. Experimental results show that the PSK algorithm outperforms the traditional K-means algorithm. In essence, the PSK algorithm not only inherits the simplicity of the K-means algorithm but can also handle symmetrical intra-clusters quite well. Recently, Chung and Lin (in press) improved the PSK algorithm and extended it to handle both symmetrical intra-clusters and symmetrical inter-clusters; for convenience, their algorithm is called the IPSK algorithm. From the viewpoint of geometrical symmetry, point symmetry and line symmetry are the two widely discussed kinds of symmetry. The motivation of our research is to develop a new clustering algorithm for data sets with the line symmetry property while preserving the advantages of the PSK and IPSK algorithms.

In this paper, we propose a line symmetry-based K-means (LSK) algorithm for clustering data sets with the line symmetry property while preserving the advantages of the PSK and IPSK algorithms. Consequently, the proposed algorithm can handle data sets with the point symmetry property, the line symmetry property, or both. Given a data set, the K-means algorithm is first used to obtain k temporary clusters. Second, the concept of central moments (Hu, 1962) is applied to determine the symmetrical line of each cluster obtained by the K-means algorithm. Finally, the symmetry similarity level (SSL) operator is modified and extended to measure the line symmetry level between two data points; for convenience, the modified operator is called the MSSL operator. Using the symmetrical line of each cluster and the MSSL operator, the LSK algorithm can determine the most line-symmetrical data points within a given set of data points. Experimental results on several real data sets demonstrate the feasibility of the proposed line symmetry-based K-means algorithm and are rather encouraging.
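The first step of this pipeline, obtaining k temporary clusters with K-means, can be sketched in plain Python. This is a generic Lloyd-style K-means with the 2-norm and random initial centers, assumed here for illustration; the paper's exact initialization and stopping rule are not given in this excerpt, and the function name `kmeans` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Generic Lloyd-style K-means with the 2-norm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k data points drawn at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are already the means
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, 2)
    print(centers, labels)
```

The resulting labels define the temporary clusters whose symmetrical lines are estimated in the next step.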

The remainder of this paper is organized as follows. Section 2 surveys the PSK algorithm by Su and Chou. Section 3 presents the proposed MSSL operator, which measures the level of symmetry and is used in the proposed LSK algorithm. Section 4 describes the LSK algorithm. Section 5 presents experimental results that demonstrate the effectiveness of the LSK algorithm. Section 6 gives concluding remarks.

The past work by Su and Chou

Different natural scenes usually have different features, and symmetry is one of the most popular of them. Building on the K-means algorithm, Su and Chou (2001) recently presented an efficient PSD measure to help partition a data set into clusters that each have the point symmetry property. In this section, the PSK algorithm by Su and Chou is surveyed.
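For reference, the PSD measure is usually quoted from Su and Chou (2001) as $d_s(x_j, c) = \min_{i \ne j} \frac{\|(x_j - c) + (x_i - c)\|}{\|x_j - c\| + \|x_i - c\|}$: it is small when some other point $x_i$ lies close to the mirror image of $x_j$ through the cluster center $c$. The sketch below evaluates it directly for one point. It is written from that commonly quoted formula rather than from this excerpt, the name `point_symmetry_distance` is hypothetical, and the full PSK assignment rule (including its threshold on $d_s$) is omitted.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_symmetry_distance(j: int, points: Sequence[Point], center: Point) -> float:
    """d_s(x_j, c) as commonly quoted from Su and Chou (2001).

    Returns the smallest normalized distance between x_j - c and the point
    reflection of some other x_i - c through c; 0 means a perfect mirror pair.
    """
    vj = (points[j][0] - center[0], points[j][1] - center[1])
    best = math.inf  # stays inf if there is no other point to compare with
    for i, p in enumerate(points):
        if i == j:
            continue
        vi = (p[0] - center[0], p[1] - center[1])
        # ||(x_j - c) + (x_i - c)|| is zero when x_i mirrors x_j through c.
        num = math.hypot(vj[0] + vi[0], vj[1] + vi[1])
        den = math.hypot(*vj) + math.hypot(*vi)
        best = min(best, num / den if den > 0 else 0.0)
    return best


if __name__ == "__main__":
    pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -1.5)]
    print(point_symmetry_distance(0, pts, (0.0, 0.0)))
    print(point_symmetry_distance(2, pts, (0.0, 0.0)))
```

The normalization by the two radii keeps $d_s$ scale-free, so a pair mirrored far from the center is judged by the same standard as a pair close to it.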

Given $N$ data points $\{p_i,\ 1 \le i \le N\}$, after running the K-means algorithm, let the obtained $k$ temporary

The proposed modified symmetry similarity level operator

Given a set of data points, the traditional K-means algorithm is first used to obtain k temporary clusters. Next, the symmetrical line of each cluster is found using the central moment technique (Gonzalez and Woods, 2002). This symmetrical line is then used to measure the symmetry similarity level between two data points relative to it.
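As a sketch of this step, the centroid and the major-axis orientation of one cluster can be computed from its low-order moments. Here $f(x, y)$ is assumed to be 1 at each data point of the cluster and 0 elsewhere, so the sums run over the cluster's points, and the orientation uses the standard relation $\theta = \tfrac{1}{2}\operatorname{atan2}(2\mu_{11}, \mu_{20} - \mu_{02})$ for the axis of least second moment. Whether the paper uses exactly this formula is not visible in this excerpt; `raw_moment` and `major_axis` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def raw_moment(points: Sequence[Point], p: int, q: int) -> float:
    """m_pq with f(x, y) = 1 at each data point and 0 elsewhere (assumption)."""
    return sum((x ** p) * (y ** q) for x, y in points)


def major_axis(points: Sequence[Point]) -> Tuple[Point, float]:
    """Return the centroid and the major-axis angle (radians) of one cluster."""
    m00 = raw_moment(points, 0, 0)  # number of points in the cluster
    xc = raw_moment(points, 1, 0) / m00
    yc = raw_moment(points, 0, 1) / m00
    # Second-order central moments mu_pq, taken about the centroid.
    mu20 = sum((x - xc) ** 2 for x, _ in points)
    mu02 = sum((y - yc) ** 2 for _, y in points)
    mu11 = sum((x - xc) * (y - yc) for x, y in points)
    # Orientation of the axis of least second moment, in [-pi/2, pi/2].
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (xc, yc), theta


if __name__ == "__main__":
    center, angle = major_axis([(0, 0), (1, 1), (2, 2), (3, 3)])
    print(center, math.degrees(angle))
```

Using `atan2` instead of a plain arctangent of the ratio avoids division by zero when $\mu_{20} = \mu_{02}$ and picks the correct quadrant for the axis.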

Suppose the given data set is covered by an $h \times w$ integer domain; the $(p, q)$th-order moment is then defined as

$$m_{pq} = \sum_{x=1}^{h} \sum_{y=1}^{w} x^{p} y^{q} f(x, y),$$

The proposed line symmetry-based K-means algorithm

In this section, we present the proposed line symmetry-based K-means (LSK) algorithm, which extends the PSK algorithm by Su and Chou from handling point-symmetrical data sets to handling point-symmetrical data sets, line-symmetrical data sets, or both.
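To make the point-to-line extension concrete, note that the PSD numerator $\|(x_j - c) + (x_i - c)\|$ equals $\|(x_i - c) - (-(x_j - c))\|$, i.e. it compares $x_i$ with the point reflection of $x_j$. The sketch below swaps that point reflection for a mirror reflection about a line through $c$ at angle $\theta$. This is a hypothetical illustration only, not the paper's MSSL operator, whose definition is not included in this excerpt; `reflect` and `line_symmetry_distance` are illustrative names.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reflect(v: Point, theta: float) -> Point:
    """Mirror vector v about the line through the origin with direction angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    dot = v[0] * ux + v[1] * uy
    # Keep the component along u and negate the component perpendicular to it.
    return (2.0 * dot * ux - v[0], 2.0 * dot * uy - v[1])


def line_symmetry_distance(j: int, points: Sequence[Point], center: Point,
                           theta: float) -> float:
    """Hypothetical PSD analogue: compare each x_i with the mirror of x_j about the line."""
    vj = (points[j][0] - center[0], points[j][1] - center[1])
    target = reflect(vj, theta)  # where a perfect line-symmetric partner would sit
    best = math.inf
    for i, p in enumerate(points):
        if i == j:
            continue
        vi = (p[0] - center[0], p[1] - center[1])
        num = math.hypot(vi[0] - target[0], vi[1] - target[1])
        den = math.hypot(*vj) + math.hypot(*vi)
        best = min(best, num / den if den > 0 else 0.0)
    return best


if __name__ == "__main__":
    pts = [(1.0, 2.0), (-1.0, 2.0), (3.0, -1.0), (-3.0, -1.0)]
    print(line_symmetry_distance(0, pts, (0.0, 0.0), math.pi / 2))  # y-axis
    print(line_symmetry_distance(0, pts, (0.0, 0.0), 0.0))          # x-axis
```

In such a scheme, $c$ and $\theta$ could be taken as the centroid and major-axis angle of a temporary K-means cluster. With $\theta = \pi/2$ (the y-axis), the point $(1, 2)$ is matched by its mirror image $(-1, 2)$ and the distance is essentially zero, whereas about the x-axis it has no close partner.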

The proposed LSK algorithm adopts the conventional K-means algorithm as a preprocessing step, then utilizes the concept of a major axis and the proposed MSSL operator to measure the symmetry level of the concerning two

Experimental results

In this section, several artificial and real data sets are used to demonstrate the feasibility and the extension capability of the proposed LSK algorithm, and the results are encouraging. Throughout the following experiments, the parameter ρ is set to five, and the thresholds for MDSL and MOSL are set to 0.60 and 0.97, respectively.

Using the same experimental data set as in the PSK algorithm (Su and Chou, 2001), the

Conclusions

In this paper, we have presented the line symmetry-based K-means algorithm. The proposed clustering algorithm not only clusters data sets with the line symmetry property successfully but also preserves the clustering advantages of the PSK and IPSK algorithms. Experimental results on several real data sets demonstrate the feasibility of the proposed line symmetry-based K-means algorithm and are rather encouraging.

References (16)

  • P. Bajcsy et al. Location and density based hierarchical clustering using similarity analysis. IEEE Trans. Pattern Anal. Machine Intell. (1998).
  • K.L. Chung et al. Faster and more robust point symmetry-based K-means algorithm. Pattern Recognit.,... (in press).
  • B. Fischer et al. Bagging for path based clustering. IEEE Trans. Pattern Anal. Machine Intell. (2003).
  • L.N. Fred et al. A new cluster isolation criterion based on dissimilarity increments. IEEE Trans. Pattern Anal. Machine Intell. (2003).
  • A. Gersho et al. Vector Quantization and Signal Compression (1992).
  • R.C. Gonzalez et al. Digital Image Processing (2002).
  • J. Hartigan. Clustering Algorithms (1975).
  • K. Hoffman et al. Linear Algebra (1961).
There are more references available in the full text version of this article.

Supported in part by the National Science Council of ROC under contract NSC92-2213-E-011-079.