Elsevier

Applied Soft Computing

Volume 81, August 2019, 105489

On Improving the accuracy with Auto-Encoder on Conjunctivitis

https://doi.org/10.1016/j.asoc.2019.105489

Abstract

Applying machine-learning classification to the medical field is a promising direction, as it could save a large amount of medical resources and reduce the impact of error-prone subjective diagnosis. However, low accuracy is currently the biggest challenge for classification. Many approaches have been developed to improve classification performance, and most of them focus on extending the layers or nodes of a Neural Network (NN), or on combining a classifier with domain knowledge of the medical field. These extensions may improve classification performance; however, a classifier trained on one dataset may not adapt to another, and the layers and nodes of a neural network cannot be extended infinitely in practice. To overcome these problems, in this paper we propose an innovative approach that employs the Auto-Encoder (AE) model to improve classification performance. Specifically, we exploit the compression capability of the Encoder to generate latent compressed vectors that represent the original samples. We then use a regular classifier on those compressed vectors instead of the original data. In addition, we explore the classification performance of different extracted features by enumerating the number of hidden nodes used to store them. Comprehensive experiments on a medical conjunctivitis dataset and the STL-10 dataset validate the proposed approach. The results show that the proposed AE-based model not only improves classification accuracy but also helps reduce the False Positive Rate.

Introduction

Becoming a qualified physician requires a great deal of practice, yet diagnosing disease remains an extremely time-consuming and error-prone process even for an experienced physician. Thus, many researchers attempt to employ Artificial Intelligence models (e.g., classifiers) to diagnose disease [1], [2], [3], [4]. Classification is a technique in which a model is trained on a set of labeled samples to identify which category a new sample belongs to. Currently, one of the most popular classifiers is the Neural Network (NN) [5], which classifies samples by capturing different features of the data, since samples belonging to the same category usually share the same features. The NN model belongs to the nature-inspired community [6], and it has been widely applied to a variety of areas such as biology [7], eco-hydrological monitoring [8], web spam detection [9], traffic [10], [11], and ecology [12]. However, some fundamental issues still need to be addressed. One of the biggest challenges is low accuracy when the NN model is applied to certain domains such as bio-mechanics [13], time series [14] and disease diagnosis [4]. Taking conjunctivitis as an example, traditional classifiers (e.g., Decision Tree [15], Random Forest [16], Naive Bayes [17] and K-Nearest Neighbors (KNN) [18]) classify image samples using a pixel-by-pixel distance measure. However, these models are linear classifiers, and they can incorrectly accept a fake sample as a genuine one [19] when a healthy sample contains a variety of pixel-level artifacts. This is often the case because healthy eyes can also show congestion, caused for instance by insufficient sleep or influenza. Under a pixel-distance measure, such eyes can be close to the diseased ones but far from other healthy eyes.
If the classifier incorrectly recognizes a person without conjunctivitis (suppose the person has hypertension) as infected, the treatment would worsen the hypertension, as the medicine for treating conjunctivitis is usually adrenaline, which raises blood pressure.
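The pixel-distance failure mode described above can be illustrated with a toy example. The "pixel" vectors below are invented purely for illustration: a congested but healthy eye ends up closer, in Euclidean pixel distance, to the diseased sample than to the clean healthy one.

```python
import numpy as np

# Toy 4-"pixel" intensity vectors (illustrative values only):
healthy = np.array([0.2, 0.2, 0.2, 0.2])            # clean healthy eye
diseased = np.array([0.9, 0.9, 0.9, 0.9])           # conjunctivitis sample
congested_healthy = np.array([0.8, 0.8, 0.7, 0.2])  # healthy eye with congestion

# Pixel-by-pixel (Euclidean) distances, as a linear classifier would use.
d_h = np.linalg.norm(congested_healthy - healthy)
d_c = np.linalg.norm(congested_healthy - diseased)
print(f"to healthy: {d_h:.3f}, to diseased: {d_c:.3f}")
```

Since `d_c < d_h` here, a nearest-neighbor rule on raw pixels would label the congested healthy eye as diseased, which is exactly the error discussed above.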

In general, most nature-inspired algorithms (e.g., the genetic algorithm [20], particle swarm optimization [21] and the ant colony algorithm [22]) focus mainly on path-planning or dispatch problems; only the Neural Network (NN) has been widely applied to classification tasks. The NN model mimics the brain to analyze and process information. One of its interesting characteristics is its ability to learn about unknown objects: it automatically finds out which features are important and directly transforms the input into a prediction. Although the NN can also extract features, as the Auto-Encoder does, the two have different extraction strategies. The Auto-Encoder is an unsupervised learning model, while the NN is a supervised learning model. The latter usually utilizes label information to learn the corresponding features, which may or may not be useful for classification [23]. For example, the classes ‘car’ and ‘ship’ have different backgrounds (e.g., highway for ‘car’ and sea for ‘ship’) in the CIFAR10 dataset. Since they belong to different classes, the classifier may extract the background as the salient feature, so that the label effectively matches the background rather than the real object. In this way, we may get low accuracy. Detailed results are demonstrated in the experiments section.

To address this challenge, this paper explores the idea of applying an unsupervised learning model, the Auto-Encoder (AE), to improve classification performance. The unsupervised model retains the outline information of an object while removing redundant noise. An Auto-Encoder is a type of artificial neural network that usually consists of two components, viz. an Encoder and a Decoder. The Encoder encodes the input into a latent compressed vector, and the Decoder can reconstruct the input from that latent vector. The aim of the AE is to extract information from the data or to reduce its dimensionality. Recently, it has also been used to generate simulation data, especially through its extended versions such as the Variational Auto-Encoder (VAE) [24] and the Adversarial Auto-Encoder (AAE) [25]. A key design decision for the Auto-Encoder is the number of nodes within the latent vector. In general, more hidden nodes mean more extracted features, while fewer hidden nodes mean less information. However, if the number of hidden nodes is very large, noise may be included, which can decrease classification performance; if it is too small, the latent vector cannot store all the available information, and classification performance again suffers. Thus, it is essential to investigate how to determine the ideal number of hidden nodes. In this study, we adopt the Auto-Encoder to extract features: we vary the extracted features by tuning the number of hidden nodes in the last layer of the Encoder, and then feed the extracted features into a classifier to observe the classification performance. More details are shown in the experiments section.
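As an illustrative sketch of enumerating hidden-node counts (not the paper's implementation), the following uses synthetic two-class data, a closed-form linear Encoder obtained from the SVD (the optimum that a purely linear Auto-Encoder can reach), and a nearest-centroid classifier on the latent vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for image data: two well-separated classes in 20-D.
n, dim = 200, 20
X = np.vstack([rng.normal(-2.0, 0.5, (n, dim)),
               rng.normal(+2.0, 0.5, (n, dim))])
y = np.array([0] * n + [1] * n)

# Closed-form linear "Encoder": top singular directions of centered data.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)

accs = {}
for hid in (1, 2, 4, 8):                 # enumerate hidden-node counts
    Z = (X - mu) @ Vt[:hid].T            # latent compressed vectors
    cents = np.array([Z[y == c].mean(0) for c in (0, 1)])
    pred = np.argmin(((Z[:, None, :] - cents) ** 2).sum(-1), axis=1)
    accs[hid] = (pred == y).mean()
print(accs)
```

With a real nonlinear Encoder the sweep is the same: only the `Z = ...` line changes, and the accuracy-versus-node-count curve is what the paper's experiments examine.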

The compression performed by the AE model is in fact a form of dimensionality reduction. There are many other dimensionality reduction methods, such as the Low Variance Filter (LVF) [26], Principal Component Analysis (PCA) [27], [28], Non-negative Matrix Factorization (NMF) [29], Backward Feature Elimination (BFE) [30] and Forward Feature Construction (FFC) [31]. However, when these methods are applied to classification, their performance is not satisfactory (more details are given in Section 5). Specifically, LVF [26] sets a threshold as a filter to keep the columns whose data carry rich information (i.e., whose variance is large) and removes the other columns. However, it is difficult to ensure that the columns with small variance play no important role in the original dataset. PCA [27], [28] maps the n-dimensional characteristics into k dimensions (k<n) instead of simply removing columns; it uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables. The open question here is whether decomposing the covariance matrix preserves the largest entropy of information. Moreover, the principal components with small contributions may still contain important information about the differences among samples. NMF [29] factorizes the original matrix V (m × n) into two matrices W (weight matrix, m × k) and H (characteristic matrix, k × n), with k<m and k<n. The challenges of NMF are: (1) the fitted results are inconsistent and can be unsatisfactory when many topics are set, in which case it is hard to achieve the global minimum; (2) the reduced representation is linearly related to the input, which weakens the representation of non-linear data; (3) the number of dimensions is fixed, which makes the reduction process inflexible.
As for BFE and FFC, they are very time-consuming and hence less used in practice. The Auto-Encoder, by contrast, can change the output dimensionality simply by tuning the number of hidden nodes, and both the Encoder and the Decoder adopt the NN framework, which can handle non-linear data. Thus, in this paper, we choose the Auto-Encoder to extract the features. In addition, some researchers [32] have shown that the loss function used within a neural network can affect classification performance. Therefore, in this study, we also take different loss functions from [33], [34], [35] and apply them to a regular CNN to explore their impact on classification performance.
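For concreteness, the V ≈ WH factorization with a fixed k that NMF performs can be sketched with the classic Lee-Seung multiplicative updates [29]; the matrix sizes below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)

# NMF factorizes a non-negative matrix V (m x n) into W (m x k) and H (k x n).
m, n, k = 30, 20, 5
V = rng.random((m, n))            # non-negative data matrix
W = rng.random((m, k)) + 0.1      # weight matrix (random init)
H = rng.random((k, n)) + 0.1      # characteristic matrix (random init)

def frob_err(V, W, H):
    return np.linalg.norm(V - W @ H)

err0 = frob_err(V, W, H)
eps = 1e-9
for _ in range(200):
    # Multiplicative updates: keep W, H non-negative while monotonically
    # decreasing the Frobenius reconstruction error.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err1 = frob_err(V, W, H)
print(f"reconstruction error: {err0:.3f} -> {err1:.3f}")
```

Note the issues listed above are visible here: k is fixed before fitting, the map V → H is linear, and different random initializations of W and H converge to different local minima.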

The major contributions in this paper are as follows:

  • This paper investigates the idea of improving classification performance using the Auto-Encoder model. In particular, we apply the Auto-Encoder model to improve the classification performance on conjunctivitis.

  • To the best of our knowledge, this paper is the first to focus on changing the number of hidden nodes to improve classification performance.

  • We investigate the impact of different loss functions on the classification performance of a neural network.

  • Extensive experiments have been conducted to demonstrate the effectiveness of our proposed idea.

The remainder of this paper is organized as follows. Section 2 discusses related work. Preliminaries on the Auto-Encoder model are introduced in Section 3, and our idea of applying the Auto-Encoder to classification is presented in Section 4. Section 5 demonstrates the experimental results. Finally, Section 6 concludes this paper.

Section snippets

Related work

In machine learning, classifiers are usually divided into two categories: linear classifiers and non-linear classifiers. Linear classifiers include the Perceptron [36], Linear Discriminant Analysis (LDA) [37], Quadratic Discriminant Analysis (QDA) [38], the SVM with a linear kernel [39], etc. While the Perceptron, LDA and SVM are widely used in practice, QDA is less popular. In most cases, a linear classifier separates samples with a hyperplane; QDA, however, is used to separate objects or events by a

The auto-encoder model

The Auto-Encoder (AE) [52] is a feed-forward neural network model that consists of two main components, an Encoder and a Decoder. The Encoder encodes or compresses the input into a latent compressed vector; this vector is then passed to the Decoder, which decodes or decompresses it to reconstruct the original input. In other words, the same input should generate the same output. The Encoder and Decoder can be defined as transitions ϕ and ψ: ϕ, ψ = argmin_{ϕ,ψ} ‖X − (ψ∘ϕ)(X)‖², where ϕ: χ → F
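A minimal numeric sketch of this reconstruction objective follows: a one-hidden-layer Encoder ϕ and linear Decoder ψ trained by plain gradient descent on synthetic data. All sizes, activations and learning rates here are illustrative choices, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim, hid = 100, 8, 3
X = rng.normal(size=(n, dim))

# Encoder phi: X -> tanh(X W1 + b1); Decoder psi: Z -> Z W2 + b2.
W1 = rng.normal(0, 0.1, (dim, hid)); b1 = np.zeros(hid)
W2 = rng.normal(0, 0.1, (hid, dim)); b2 = np.zeros(dim)

def forward(X):
    Z = np.tanh(X @ W1 + b1)      # latent compressed vector, phi(X)
    return Z, Z @ W2 + b2         # reconstruction, psi(phi(X))

def loss(X):                      # mean of ||X - psi(phi(X))||^2 terms
    return ((forward(X)[1] - X) ** 2).mean()

loss0 = loss(X)
lr = 0.05
for _ in range(500):
    Z, R = forward(X)
    G = 2 * (R - X) / X.size           # dL/dR
    gW2, gb2 = Z.T @ G, G.sum(0)       # Decoder gradients
    GZ = (G @ W2.T) * (1 - Z ** 2)     # backprop through tanh
    gW1, gb1 = X.T @ GZ, GZ.sum(0)     # Encoder gradients
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
loss1 = loss(X)
print(f"reconstruction MSE: {loss0:.3f} -> {loss1:.3f}")
```

Minimizing this loss is exactly the argmin over (ϕ, ψ) above; `hid` is the number of hidden nodes whose choice the paper investigates.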

The proposed classification method

In this section, we present our classification method, which uses the Auto-Encoder (AE) model to improve classification performance. Specifically, we first train the AE model; then we feed the latent compressed vectors produced by the Encoder into a classifier for training. After that, we encode the test data and put it into this classifier to calculate the accuracy (the process of classification is shown in Fig. 1). Meanwhile, since our idea involves replacing the
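This train-encode-classify-score pipeline can be outlined as below. Everything here is a hedged sketch on synthetic data: a closed-form linear Encoder stands in for the trained AE Encoder, and a simple logistic-regression classifier is trained on the latent vectors.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic two-class data and a train/test split.
n, dim, latent = 300, 16, 3
X = np.vstack([rng.normal(-1.5, 1.0, (n, dim)),
               rng.normal(+1.5, 1.0, (n, dim))])
y = np.array([0] * n + [1] * n)
idx = rng.permutation(2 * n)
tr, te = idx[:450], idx[450:]

# Step 1: fit the Encoder on the training split only (linear stand-in).
mu = X[tr].mean(axis=0)
_, _, Vt = np.linalg.svd(X[tr] - mu, full_matrices=False)
encode = lambda A: (A - mu) @ Vt[:latent].T

# Step 2: encode the training data and train a classifier on the latents.
Ztr, Zte = encode(X[tr]), encode(X[te])
w, b = np.zeros(latent), 0.0
for _ in range(300):                       # logistic regression, gradient descent
    p = 1 / (1 + np.exp(-(Ztr @ w + b)))
    g = p - y[tr]
    w -= 0.1 * Ztr.T @ g / len(tr)
    b -= 0.1 * g.mean()

# Step 3: encode the test data and score the classifier.
acc = (((1 / (1 + np.exp(-(Zte @ w + b)))) > 0.5) == y[te]).mean()
print(f"test accuracy on encoded data: {acc:.2f}")
```

Swapping the SVD stand-in for a trained nonlinear Encoder leaves steps 2 and 3 unchanged, which is the structure shown in Fig. 1.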

Experiments

We validate our approach on a medical dataset, the conjunctivitis dataset. The conjunctivitis dataset is a supervised dataset consisting of three types of images: completely healthy (H), healthy with a more or less red conjunctiva (HR), and conjunctivitis (C). Since our goal is to diagnose which samples are healthy and which are not, we group the conjunctivitis dataset into two classes: the healthy group (the first two types of images) and the patients. We manually label the

Conclusion

In this paper, to improve classification accuracy, we have proposed an AE-based classification method and investigated the effect of different loss functions on the classification accuracy of a Convolutional Neural Network. As only a few studies have discussed how to determine the number of hidden nodes within the latent compressed vector, we have explored the robustness of classification performance across different numbers of hidden nodes. Moreover, our classification model can effectively

Acknowledgments

The authors would like to acknowledge the support provided by the National Key R&D Program of China (No. 2018YFC1604000), the Fundamental Research Funds for the Central Universities of China (2042017gf0035), the grants of the National Natural Science Foundation of China (61572374, U163620068, U1135005, 61572371), the Open Fund of the Key Laboratory of Network Assessment Technology from CAS, the Guangxi Key Laboratory of Trusted Software, China (No. kx201607), and the Academic Team Building Plan for Young

Wei Li is a Ph.D. candidate at Wuhan University. He has visited the University of Massachusetts Boston and The Hong Kong Polytechnic University as a visiting scholar.

References (62)

  • Mercan, C. et al.

    Multi-instance multi-label learning for multi-class classification of whole slide breast histopathology images

    IEEE Trans. Med. Imaging

    (2017)
  • Falco-Walter, J.J. et al.

    The new definition and classification of seizures and epilepsy

    Epilepsy Res.

    (2017)
  • Dayhoff, Judith E. et al.

    Artificial neural networks

    Cancer

    (2001)
  • Yang, Xin-She

    Nature-inspired Metaheuristic Algorithms

    (2010)
  • Sinha, R. Oset et al.

    New techniques for classification of multigerms

    Topol. Appl.

    (2018)
  • McManamay, Ryan A. et al.

    Updating the US hydrologic classification: an approach to clustering and stratifying ecohydrologic data

    Ecohydrology

    (2014)
  • Li, Yuancheng et al.

    Web spam classification method based on deep belief networks

    Expert Syst. Appl.

    (2017)
  • Aceto, Giuseppe et al.

    Multi-classification approaches for classifying mobile app traffic

    J. Netw. Comput. Appl.

    (2017)
  • Wang, Wen-chuan et al.

    Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition

    Water Resour. Manage.

    (2015)
  • Ho, Tin Kam

    Random decision forests

  • Hand, David J. et al.

    Idiot’s Bayes: not so stupid after all?

    Internat. Statist. Rev.

    (2001)
  • Altman, N.S.

    An introduction to kernel and nearest-neighbor nonparametric regression

    Amer. Stat.

    (1992)
  • Su, Jiawei et al.

    One Pixel Attack for Fooling Deep Neural Networks

    (2017)
  • Whitley, Darrell

    A genetic algorithm tutorial

    Stat. Comput.

    (1994)
  • Kennedy, James

    Particle swarm optimization

  • Wen, Yandong et al.

    A discriminative feature learning approach for deep face recognition

  • Kingma, Diederik P. et al.

    Auto-Encoding Variational Bayes

    (2013)
  • Makhzani, Alireza et al.

    Adversarial autoencoders

    Comput. Sci.

    (2015)
  • Wood, L.B. et al.

    Low variance adaptive filter for cancelling motion artifact in wearable photoplethysmogram sensor signals

  • Jolliffe, I.T.

    Principal component analysis

    J. Mark. Res.

    (2002)
  • Lee, Daniel D. et al.

    Algorithms for non-negative matrix factorization

    Xiao Liu received his Ph.D. degree in Computer Science and Software Engineering from the Faculty of Information and Communication Technologies at Swinburne University of Technology, Melbourne, Australia in 2011. He is currently a Senior Lecturer at School of Information Technology, Deakin University, Melbourne, Australia.

    Jin Liu is a professor in the State Key Laboratory of Software Engineering, Computer School, Wuhan University. His research interests include software engineering and interactive collaboration on the Web. His work has been published in several international journals including CCPE and IEEE TSE.

    Ping Chen received his Ph.D. degree in Information Technology from the George Mason University, and he is an Associate Professor of Department of Engineering in the University of Massachusetts Boston.

    Shaohua Wan (SM’19) received his Ph.D. degree from the School of Computer, Wuhan University, in 2010. Since 2015, he held a post-doctoral position with the State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology. From 2016 to 2017, he was a visiting professor at the Department of Electrical and Computer Engineering, Technical University of Munich, Germany. He is currently an Associate Professor at the School of Information and Safety Engineering, Zhongnan University of Economics and Law. His research interests include massive data computing for Internet of Things and edge computing.

    Xiaohui Cui received his Ph.D. degree in Computer Science from the University of Louisville in 2004, and he is a Professor of Computer Science in Wuhan University.
