Abstract
Driven by recent growth in the fields of the internet of things (IoT) and deep neural networks (DNNs), DNN-powered IoT devices are expected to transform a variety of industrial applications. DNNs, however, require many parameters and operations to process the data generated by IoT devices, resulting in high data-processing latency and energy consumption. New approaches are thus being sought to tackle these issues and deploy real-time DNNs on resource-limited IoT devices. This paper presents a comprehensive review of hardware-software co-design approaches developed to implement DNNs on low-resource hardware platforms. These approaches explore the trade-offs among energy consumption, speed, classification accuracy, and model size. First, an overview of DNNs is given. Next, available tools for implementing DNNs on low-resource hardware platforms are described. Then, memory hierarchy designs together with dataflow mapping strategies are presented. Furthermore, various model optimization approaches, including pruning and quantization, are discussed. In addition, case studies demonstrate the feasibility of implementing DNNs for IoT applications. Finally, detailed discussions, research gaps, and future directions are provided. The presented review can guide the design and implementation of the next generation of hardware and software solutions for real-world IoT applications.
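The model optimization approaches mentioned above, pruning and quantization, can be illustrated with a minimal, framework-agnostic sketch. This is not code from the paper; the function names (`magnitude_prune`, `quantize_uint8`) and the specific schemes shown (magnitude-based unstructured pruning and affine 8-bit quantization) are illustrative assumptions chosen because they are common baselines in the surveyed literature.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction of weights.

    Ties at the threshold may zero slightly more than `sparsity` of them;
    a sketch, not a production pruner.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uint8(weights):
    """Affine (asymmetric) 8-bit quantization: w ~= scale * (q - zero_point)."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float weights."""
    return scale * (q.astype(np.float32) - zero_point)
```

Pruning trades accuracy for a smaller, sparser model, while 8-bit quantization shrinks storage 4x relative to float32 and bounds the round-trip error by about half the quantization step (`scale / 2`), which is the trade-off space the surveyed co-design approaches navigate.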
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, Z., Kouzani, A.Z. Implementation of DNNs on IoT devices. Neural Comput & Applic 32, 1327–1356 (2020). https://doi.org/10.1007/s00521-019-04550-w