An efficient line symmetry-based K-means algorithm
Introduction
Partitioning a set of data points into nonoverlapping clusters is an important topic in data analysis and pattern classification. It has many applications, such as codebook design (Gersho and Gray, 1992), data mining (Ng and Han, 2002), image segmentation (Jain and Dubes, 1988), and data compression (Sayood, 1996). Over the past several decades, many efficient clustering algorithms (Fischer and Buhmann, 2003, Bajcsy and Ahuja, 1998, Hartigan, 1975, Zhu and Po, 1998, Fred and Leitao, 2003, Su and Chou, 2001) have been developed for data sets of different distributions. Most existing clustering algorithms adopt the 2-norm distance measure in the clustering process.
Among these clustering algorithms, that of Su and Chou (2001) was the first to take the point symmetry issue (Zabrodsky et al., 1995, Kanatani, 1997) into account. Based on their proposed point symmetry distance (PSD) measure, they presented a novel and efficient clustering algorithm that is well suited to symmetrical intra-clusters; for convenience, we call it the PSK algorithm. Their experimental results demonstrate that the PSK algorithm outperforms the traditional K-means algorithm. In essence, the PSK algorithm inherits the simplicity of the K-means algorithm and also handles symmetrical intra-clusters quite well. Recently, the PSK algorithm was improved by Chung and Lin (in press) and extended to handle both symmetrical intra-clusters and symmetrical inter-clusters; for convenience, we call their algorithm the IPSK algorithm. From the viewpoint of geometrical symmetry, point symmetry and line symmetry are two widely discussed issues. The motivation of our research is to develop a new clustering algorithm that handles data sets with the line symmetry property while preserving the advantages of the PSK and IPSK algorithms.
In this paper, we propose a line symmetry-based K-means (LSK) algorithm for clustering data sets with the line symmetry property while preserving the advantages of the PSK and IPSK algorithms. Consequently, the proposed clustering algorithm can handle data sets with the point symmetry property, the line symmetry property, or both. Given a data set, the K-means algorithm is first used to obtain k temporary clusters. Second, the concept of centroid moment (Hu, 1962) is applied to determine the symmetrical line of each cluster obtained by the K-means algorithm. Finally, the symmetry similarity level (SSL) operator is modified and extended to measure the line symmetry level between two data points. For convenience, the modified SSL operator is called the MSSL operator. Using the symmetrical line of each cluster and the proposed MSSL operator, the LSK algorithm can determine the most line-symmetrical data points in a given set of data points. Experimental results on some real data sets demonstrate the feasibility of the proposed LSK algorithm and are rather encouraging.
The remainder of this paper is organized as follows. In Section 2, the previous PSK algorithm by Su and Chou is surveyed. In Section 3, the proposed MSSL operator, which measures the level of symmetry and is used in our proposed LSK algorithm, is presented. In Section 4, our proposed LSK algorithm is described. In Section 5, some experimental results are given to show the effectiveness of the proposed LSK algorithm. In Section 6, some concluding remarks are given.
The past work by Su and Chou
Different natural scenes usually have different features. Among these features, the symmetry property is one of the most common. Building on the K-means algorithm, Su and Chou (2001) presented an efficient PSD measure to help partition a data set into clusters, each of which has the point symmetry property. In this section, the previous PSK algorithm by Su and Chou is surveyed.
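To make the idea of a point symmetry distance concrete, the following is a minimal Python sketch. It assumes the form of the PSD commonly attributed to Su and Chou (2001): for a point x and a candidate centre c, the minimum over the other data points p of ‖(x − c) + (p − c)‖ / (‖x − c‖ + ‖p − c‖). The function name, the 2-D tuple representation, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def point_symmetry_distance(x, c, points):
    """Assumed PSD form: min over other points p of
    ||(x - c) + (p - c)|| / (||x - c|| + ||p - c||)."""
    best = math.inf
    for p in points:
        if p == x:
            continue  # skip x itself (also skips exact duplicates of x)
        num = math.hypot(x[0] - c[0] + p[0] - c[0], x[1] - c[1] + p[1] - c[1])
        den = math.hypot(x[0] - c[0], x[1] - c[1]) + math.hypot(p[0] - c[0], p[1] - c[1])
        if den > 0:
            best = min(best, num / den)
    return best

# Toy data (hypothetical): the first two points mirror each other through the origin.
pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0)]
psd_mirrored = point_symmetry_distance(pts[0], (0.0, 0.0), pts)
psd_unmatched = point_symmetry_distance(pts[2], (0.0, 0.0), pts)
```

Under this assumed form, a point with an exact mirror partner through the centre gets a distance of zero, while a point without such a partner gets a strictly positive value.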
Given N data points, {pi ∣ for 1 ⩽ i ⩽ N}, after running the K-means algorithm, let the obtained k temporary
The proposed modified symmetry similarity level operator
Given a set of data points, the traditional K-means algorithm is first used to obtain k temporary clusters. Next, we find the symmetrical line of each cluster by using the central moment technique (Gonzalez and Woods, 2002). This symmetrical line is then used to measure the symmetry similarity level between two data points relative to it.
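As a hedged illustration of how central moments can yield a candidate symmetrical line, the sketch below computes the second-order central moments of a 2-D cluster and uses the standard principal-axis orientation θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). Treating the major axis through the centroid as the symmetrical line is an assumption made for illustration; the function name and the toy cluster are hypothetical.

```python
import math

def principal_axis(points):
    """Return the centroid and the major-axis angle (radians) of a 2-D point set."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments of the point set.
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    # Standard orientation of the axis of least second moment (major axis).
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta

# Toy cluster (hypothetical) lying on the line y = x.
cluster = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
centroid, theta = principal_axis(cluster)
```

For points lying along y = x, the sketch places the centroid at the middle of the segment and returns an angle of 45 degrees, as one would expect for the major axis.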
Suppose the given data set is covered by an h × w integer domain; the (p, q)th order moment is defined as
The proposed line symmetry-based K-means algorithm
In this section, we present the proposed line symmetry-based K-means (LSK) algorithm, which extends the previous PSK algorithm by Su and Chou from handling only point-symmetrical data sets to handling point-symmetrical data sets, line-symmetrical data sets, or both.
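The geometric core of line symmetry can be sketched as follows. This is not the MSSL operator, whose definition is not reproduced here; it is only an assumed, simplified illustration in which a point is mirrored across a line through a given origin with direction angle θ, and the distance from the mirror image to the nearest data point is reported. Both function names and the toy data are hypothetical.

```python
import math

def reflect_across_line(p, origin, theta):
    """Mirror point p across the line through `origin` with direction angle theta."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    c2, s2 = math.cos(2.0 * theta), math.sin(2.0 * theta)
    # Reflection matrix for a line at angle theta: [[c2, s2], [s2, -c2]].
    return (origin[0] + c2 * dx + s2 * dy, origin[1] + s2 * dx - c2 * dy)

def nearest_mirror_gap(p, origin, theta, points):
    """Distance from the mirror image of p to the closest data point.
    A value of 0 means an exact mirror partner exists in `points`."""
    m = reflect_across_line(p, origin, theta)
    return min(math.hypot(m[0] - q[0], m[1] - q[1]) for q in points)

# Toy data (hypothetical): the first two points mirror each other across the x-axis.
pts = [(1.0, 2.0), (1.0, -2.0), (3.0, 0.5)]
mirror = reflect_across_line(pts[0], (0.0, 0.0), 0.0)
gap_paired = nearest_mirror_gap(pts[0], (0.0, 0.0), 0.0, pts)
gap_unpaired = nearest_mirror_gap(pts[2], (0.0, 0.0), 0.0, pts)
```

A small gap suggests that the point has a line-symmetrical partner relative to the chosen line. A practical measure would also need normalization and thresholds, which this sketch deliberately leaves out.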
The proposed LSK algorithm adopts the conventional K-means algorithm as a preprocessing step, then utilizes the concept of a major axis and the proposed MSSL operator to measure the symmetry level of the concerning two
Experimental results
In this section, several artificial and real data sets are used to demonstrate the feasibility and the extension capability of our proposed LSK algorithm. The experimental results obtained with the LSK algorithm are encouraging. Throughout the following experiments, the parameter ρ is set to five. In addition, the thresholds for MDSL and MOSL are set to 0.60 and 0.97, respectively.
Using the same experimental data set as in the PSK algorithm (Su and Chou, 2001), the
Conclusions
In this paper, we have presented the line symmetry-based K-means algorithm. The proposed clustering algorithm not only clusters data sets with the line symmetry property successfully, but also preserves the clustering advantages of the previous PSK and IPSK algorithms. Experimental results on some real data sets demonstrate the feasibility of the proposed line symmetry-based K-means algorithm and are rather encouraging.
References (16)
- et al. Location and density based hierarchical clustering using similarity analysis. IEEE Trans. Pattern Anal. Machine Intel. (1998)
- Chung, K.L., Lin, J.S., in press. Faster and more robust point symmetry-based K-means algorithm. Pattern Recognit.,...
- et al. Bagging for path based clustering. IEEE Trans. Pattern Anal. Machine Intel. (2003)
- et al. A new cluster isolation criterion based on dissimilarity increments. IEEE Trans. Pattern Anal. Machine Intel. (2003)
- et al. Vector Quantization and Signal Compression (1992)
- et al. Digital Image Processing (2002)
- Clustering Algorithms (1975)
- et al. Linear Algebra (1961)
Supported in part by the National Science Council of ROC under contract NSC92-2213-E-011-079.