
Implementation of DNNs on IoT devices

  • Review Article
  • Published in: Neural Computing and Applications

Abstract

Driven by the recent growth in the fields of the internet of things (IoT) and deep neural networks (DNNs), DNN-powered IoT devices are expected to transform a variety of industrial applications. DNNs, however, involve many parameters and operations to process the data generated by IoT devices, resulting in high data-processing latency and energy consumption. New approaches are thus being sought to tackle these issues and deploy real-time DNNs on resource-limited IoT devices. This paper presents a comprehensive review of hardware and software co-design approaches developed to implement DNNs on low-resource hardware platforms. These approaches explore the trade-offs between energy consumption, speed, classification accuracy, and model size. First, an overview of DNNs is given. Next, available tools for implementing DNNs on low-resource hardware platforms are described. Then, memory hierarchy designs together with dataflow mapping strategies are presented. Furthermore, various model optimization approaches, including pruning and quantization, are discussed. In addition, case studies are given to demonstrate the feasibility of implementing DNNs for IoT applications. Finally, detailed discussions, research gaps, and future directions are provided. The presented review can guide the design and implementation of the next generation of hardware and software solutions for real-world IoT applications.
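As a minimal illustration of the two model optimization approaches the abstract names, the sketch below applies magnitude-based weight pruning followed by uniform 8-bit quantization to a small NumPy weight matrix. The function names, the 50% sparsity target, and the affine quantization scheme are illustrative assumptions for this sketch, not methods taken from the paper itself.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_uint8(weights):
    """Uniform affine quantization of float weights to 8-bit unsigned integers."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min  # zero point stored as the float minimum

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized representation."""
    return q.astype(np.float32) * scale + zero_point

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)          # at least half the weights become zero
q, scale, zp = quantize_uint8(pruned)              # 8-bit storage: 4x smaller than float32
recovered = dequantize(q, scale, zp)               # reconstruction error bounded by ~scale/2
```

Both steps shrink the memory footprint and arithmetic cost of inference, which is why the co-design approaches reviewed in the paper trade a small amount of accuracy for them on resource-limited IoT hardware.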




Author information


Corresponding author

Correspondence to Abbas Z. Kouzani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, Z., Kouzani, A.Z. Implementation of DNNs on IoT devices. Neural Comput & Applic 32, 1327–1356 (2020). https://doi.org/10.1007/s00521-019-04550-w

