Food image classification using local appearance and global structural information

doi:10.1016/j.neucom.2014.03.017

Neurocomputing

Volume 140, 22 September 2014, Pages 242-251

https://doi.org/10.1016/j.neucom.2014.03.017

Abstract

This paper proposes food image classification methods exploiting both local appearance and global structural information of food objects. The contribution of the paper is threefold. First, non-redundant local binary pattern (NRLBP) is used to describe the local appearance information of food objects. Second, the structural information of food objects is represented by the spatial relationship between interest points and encoded using a shape context descriptor formed from those interest points. Third, we propose two methods of integrating appearance and structural information for the description and classification of food images. We evaluated the proposed methods on two datasets. Experimental results verified that the combination of local appearance and structural features can improve classification performance.

Introduction

The high incidence of obesity has been linked to an imbalanced food intake [1]. It is believed that a better understanding of its aetiology, and more effective health management programs, can be developed through better food-intake reporting systems. Conventionally, food intake has been reported manually, through self-reporting or recording from observation. However, numerous studies have revealed that data obtained by these means seriously underestimate food intake and thus do not accurately reflect habitual human eating behaviour in real life [2], [3], [4].

Recently, image processing and pattern recognition techniques have been applied to improve the accuracy and efficiency of food intake reporting through automatic image-based food recognition systems [5]. In these systems, a comprehensive nutrition database is used to generate a daily food intake report for individuals based on computerised recognition of food images. Motivated by the importance of the health-related issues associated with food intake and the progress made to date in the application of pattern recognition-based methods, this paper focusses on developing a food image recognition and classification method. Such a recognition and classification tool forms the core of a computerised food intake reporting system. We note that the problem of food recognition is not a simple test case of object recognition, as has been observed in a number of food recognition and classification research publications [6], [7], [8], [4]. Largely, this is because of the possible variations in appearance (colour, texture, and shape) and viewpoints of food images. The problem is also exacerbated by the complexity of the recording environment, e.g. uncontrolled photographing and illumination conditions.

Generally speaking, the state-of-the-art methods for food image recognition and classification have used descriptors that mainly exploit appearance-based features including colour [6], texture [7] and shape [8], [4] in describing food objects. While several of these appearance-based descriptors have been successful in existing food image recognition and classification methods, structural information of food objects has been ignored. It would seem that structural information is as important as appearance information. Moreover, a combination of appearance-based features and structural features would enhance the recognition performance. In this paper, we propose to combine both local appearance and global structure in the description and classification of food images. The contributions of this paper are summarised as follows:

• To take advantage of texture as a discriminative feature in describing the appearance information of food objects, we propose the use of non-redundant local binary pattern (NRLBP) to encode the local textures of food images.
• In order to describe the structural information of food objects, we use the scale-invariant interest points [9] and employ a shape context descriptor [10] to encode the spatial relationship between interest points.
• We propose two different methods to integrate both the local appearance and global structural information in describing and classifying food images.

The proposed methods were evaluated on two different datasets: the Pittsburgh Fast-Food Image (PFI) dataset [6] and a new dataset we collected with other food categories. Experimental results showed that combining both the local appearance and global structural information could enhance the classification accuracy and outperform the baselines [6] provided with the PFI dataset.

The rest of this paper is organised as follows. In Section 2 we provide a brief review of existing work on food image classification. Section 3 presents basic elements such as appearance-based features and structural features as well as how to combine those features in describing and classifying food images. Experimental results along with comparative analysis are presented in Section 4. Section 5 concludes the paper and discusses future work.

Section snippets

Related work

In food image classification, colour has been considered one of the important features. For example, Chen et al. [6] employed a 4×4×4-bin RGB colour histogram (four bins for each of the Red, Green, and Blue components) to describe food images. Each pixel in the food image was then mapped to its closest bin in the histogram to generate a 64-dimensional feature vector representing that food image. The 64-dimensional feature vectors of all training food images were used to train a
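As a minimal sketch of the colour descriptor described above, the following Python function builds a 4×4×4-bin RGB histogram and returns it as a 64-dimensional vector. The uniform quantisation of each 8-bit channel into four equal ranges, the normalisation by pixel count, and the name `rgb_histogram` are assumptions made for illustration; they are not taken from Chen et al. [6].

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Return a normalised colour histogram with bins_per_channel**3 entries.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit values (0..255).
    Assumption: each channel is split into equal-width ranges.
    """
    width = 256 // bins_per_channel  # 64 intensity levels per bin for 4 bins
    hist = [0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Quantise each channel, then flatten the 3-D bin index to 1-D.
        idx = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[idx] += 1
        count += 1
    # Normalise so that images of different sizes are comparable (assumption).
    return [h / count for h in hist] if count else hist


pixels = [(0, 0, 0), (255, 255, 255), (255, 255, 255), (70, 130, 200)]
feature = rgb_histogram(pixels)
print(len(feature))  # 64
```

In this sketch a pixel with value (70, 130, 200) falls into per-channel bins (1, 2, 3), which flatten to index 27 of the feature vector.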

Proposed food image classification

In this paper, we explore combining both local appearance and global structural information for enhancing the description and classification of food images. In particular, the SIFT detector [9] is used to detect interest points. Non-redundant local binary pattern (NRLBP) [12] is employed as the local textural descriptor and extracted at interest points to describe the appearance information of food objects. The topology of interest points represents the structural information of food objects
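The NRLBP code at a single pixel can be sketched as follows. The sketch assumes the common definition in which an LBP code and its bitwise complement are merged by keeping the smaller of the two, uses the 8 immediate neighbours of the pixel, and takes the image as a list of rows of grey values. The function names and the neighbour ordering are illustrative choices, not the authors' implementation, and the sketch covers only the per-pixel code, not its extraction at SIFT interest points.

```python
def lbp_code(image, y, x):
    """8-neighbour, radius-1 LBP code of the pixel at row y, column x."""
    centre = image[y][x]
    # Neighbours visited clockwise, starting at the top-left corner (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def nrlbp_code(image, y, x, num_neighbours=8):
    """Assumed NRLBP rule: keep the smaller of the LBP code and its complement."""
    code = lbp_code(image, y, x)
    return min(code, (1 << num_neighbours) - 1 - code)


bright_top = [[9, 9, 9], [1, 5, 1], [1, 1, 1]]
dark_top = [[1, 1, 1], [9, 5, 9], [9, 9, 9]]  # same structure, inverted contrast
print(lbp_code(bright_top, 1, 1), lbp_code(dark_top, 1, 1))      # 7 248
print(nrlbp_code(bright_top, 1, 1), nrlbp_code(dark_top, 1, 1))  # 7 7
```

The two patches differ only in the relative contrast between the top row and the rest, so their LBP codes are complements while their NRLBP codes coincide.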

Experimental setup

In our experiments, LBP and NRLBP with M=8 and L=1 were employed to encode the appearance of food objects. In addition, uniform LBPs and NRLBPs were used to reduce the number of histogram bins, with all non-uniform LBPs/NRLBPs cast into a single bin. The dimension of the histogram was 59 for LBP and 30 for NRLBP. For the shape context descriptor, 5 and 12 bins were used for r and θ respectively, i.e. $|R| = 5$ and $|\Theta| = 12$.
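The histogram sizes quoted above can be checked with a short sketch, and the same block illustrates a 5×12 shape context histogram. It assumes that a pattern is uniform when its circular bit string has at most two 0/1 transitions, and that an NRLBP code is the smaller of an LBP code and its complement. The radial binning in `shape_context` (distances normalised by the largest distance, with bin edges halving towards the centre) is one hypothetical choice; the text specifies only the numbers of bins.

```python
import math


def is_uniform(code, bits=8):
    """True if the circular bit string has at most two 0/1 transitions."""
    transitions = sum(
        ((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1)
        for i in range(bits)
    )
    return transitions <= 2


def histogram_sizes(bits=8):
    """Return (LBP bins, NRLBP bins), each including one non-uniform bin."""
    full = (1 << bits) - 1
    uniform = [c for c in range(1 << bits) if is_uniform(c, bits)]
    # Assumed NRLBP rule: a code and its complement share one bin.
    merged = {min(c, full - c) for c in uniform}
    return len(uniform) + 1, len(merged) + 1


def shape_context(points, index, r_bins=5, theta_bins=12):
    """Flattened log-polar histogram of all other points around points[index]."""
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    hist = [0] * (r_bins * theta_bins)
    max_dist = max((math.hypot(dx, dy) for dx, dy in offsets), default=0.0)
    if max_dist == 0.0:
        return hist  # no other points, or all coincide with the reference point
    for dx, dy in offsets:
        r = math.hypot(dx, dy) / max_dist  # normalised radius in [0, 1]
        # Outer bin edges are 1, 1/2, 1/4, ...; the innermost bin takes the rest.
        r_bin = 0 if r == 0 else max(0, r_bins - 1 + math.ceil(math.log2(r)))
        angle = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
        t_bin = int(angle / (2 * math.pi) * theta_bins) % theta_bins
        hist[r_bin * theta_bins + t_bin] += 1
    return hist


print(histogram_sizes())  # (59, 30)
descriptor = shape_context([(0, 0), (10, 10), (3, 3), (-2, -2), (1, -1)], 0)
print(len(descriptor))    # 60
```

Under these assumptions there are 58 uniform 8-bit patterns, which pair up into 29 complement classes, so adding the single non-uniform bin gives the 59 and 30 dimensions stated in the text. The shape context of one interest point has 5 × 12 = 60 entries.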

The proposed methods were evaluated on two datasets: the

Conclusion

This paper proposes two methods for automatic classification of food images that can be used in a nutrition intake self-reporting system. Our proposed methods combine the appearance and structural information in the description and classification of food images. In particular, non-redundant local binary pattern (NRLBP) is employed as an appearance descriptor. The structural information of food objects is represented by the location of interest points and encoded using shape context. The

Duc Thanh Nguyen received his Bachelor in Information Technology from the University of Natural Sciences of Ho Chi Minh City, Vietnam, in 2002, Master Degree in Computer Science from the Asian Institute of Technology (AIT), Thailand, in 2005, and Ph.D. Degree in Computer Science from the University of Wollongong, Australia, in 2012. Currently, he is a researcher at the Information and Communication Technology (ICT) Research Institute, University of Wollongong, Australia. His research interests

References (22)

A. Kazaks, Obesity: food intake, Prim. Care: Clin. Office Pract. (2003)
T. Ojala et al., A comparative study of texture measures with classification based on featured distributions, Pattern Recognit. (1996)
L. Lissner, Measuring food intake in studies of obesity, Public Health Nutr. (2002)
N. Yao, R.J. Sclabassi, Q. Liu, J. Yang, J.D. Fernstrom, M.H. Fernstrom, M. Sun, A video processing approach to the...
W. Wu, J. Yang, Fast food recognition from videos of eating for calorie estimation, in: Proceedings of IEEE...
L. Yang, J. Yang, N. Zheng, H. Cheng, Layered object categorization, in: Proceedings of IEEE International Conference...
M. Chen, K. Dhingra, W. Wu, L. Yang, R. Sukthankar, J. Yang, PFID: Pittsburgh fast-food image dataset, in: Proceedings...
T. Joutou, K. Yanai, A food image recognition system with multiple kernel learning, in: Proceedings of IEEE...
D. Pishva, A. Kawai, T. Shiino, Shape based segmentation and colour distribution analysis with application to bread...
D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
S. Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. (2002)

Philip Ogunbona received the Bachelor of Electronic and Electrical Engineering, Honours Class I, from the University of Ife, Nigeria, and Ph.D., DIC in Electrical Engineering from Imperial College, London. He was a Senior Lecturer in the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia before joining Motorola Research Lab, Sydney in 1998. He returned to the School of Computer Science and Software Engineering, University of Wollongong in 2004. He is currently a Professor and the Dean of the Faculty of Informatics, University of Wollongong. His research interests include computer vision, pattern recognition, signal and image processing. He has published more than 100 journal and conference papers. He is a Senior Member of the IEEE and a Fellow of the Australian Computer Society.

Yasmine Probst is an NHMRC Senior Research Fellow with the Smart Foods Centre, School of Medicine at the University of Wollongong and an Advanced Accredited Practising Dietitian. She is a present CI on an ARC LEIF grant and AI of a 5-year flagship project funded by the Illawarra Health and Medical Research Institute. In her research Yasmine works within the clinical trials research team to manage food-based intervention trials with a specific focus on dietary methodology, dietary modelling and food composition. Her research specifically focuses on nutrition informatics and food composition and its application to the field of dietetics. She completed her Ph.D. in 2006 researching the development, testing and implementation of an automated dietary assessment website for use in the primary healthcare setting and was awarded the DAA Joan Woodhill Award for Doctoral Research Excellence in 2007. Since completion of her Ph.D. Yasmine has been the recipient of two consecutive NHMRC research fellowships and CIA of an NHMRC development grant. She has also held the role of AI on a further NHMRC project grant. In 2009 Yasmine was competitively selected to complete the International Graduate Certificate Production and Use of Food Composition Data in Nutrition. Yasmine also teaches research methodology to undergraduate and postgraduate nutrition and dietetics students at the University of Wollongong, coordinates the visiting researcher program for Nutrition and Dietetics and supervises a number of higher degree research students. Dr. Probst has published over 30 peer reviewed articles, co-authored 7 book chapters, 2 books and 1 e-book. 
She is an active member of the Dietitians Association of Australia (DAA), within which she currently maintains the role of chairperson on the Health Informatics Advisory Committee, is a member of the Practice and Evidence Based Nutrition Advisory Committee and the Social Media Advisory Committee, and was on the DAA NSW Branch Executive Committee from 2005 to 2012, leading it from 2007 to 2010 as the Executive Chairperson. Yasmine was a member of the Scientific Organising Committee for ICD 2012 and the chairperson of its Technology Satellite meeting. Yasmine is the present book review editor for the Journal of Nutrition and Dietetics and has presented her research at a number of national and international conferences.

Wanqing Li received his B.Sc. in physics and electronics in 1983, M.Sc. in computer science in 1987, both from Zhejiang University, China, and Ph.D. in electronic engineering in 1997 from The University of Western Australia. He was a Lecturer (1987–1990) and an Associate Professor (1991–1992) at the Department of Computer Science, Zhejiang University of China. He joined Motorola Lab in Sydney (1998–2003) as a Senior Researcher and later a Principal Researcher. From December 2007 to February 2008 and from January to February 2010, he was a visiting researcher at Microsoft Research, Redmond, WA. He is currently an Associate Professor and the Deputy Director of the Information and Communication Technology (ICT) Research Institute, University of Wollongong. His research interests include medical image processing and analysis, human motion analysis, audio and visual event detection and object recognition. Dr. Li has served as a Co-organiser of the International workshop on HAU3D'11–12, publication chair of MMSP'08, General Co-Chair of ASIACCS'09 and DRMTICS'05, and technical committee member for many international conferences including ICIP'03-11. He is a regular reviewer for International Journal of Computer Vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, IEEE Signal Processing Letters, and Computer Vision and Image Understanding.
