Challenges in Baseline Detection of Arabic Script Based Languages

  • Chapter
Intelligent Systems for Science and Information

Part of the book series: Studies in Computational Intelligence ((SCI,volume 542))

Abstract

In this chapter, we present the challenges of baseline detection for Arabic script based languages, targeting the Nastaliq and Naskh writing styles. Baseline detection is an important step in OCR because it directly affects the subsequent steps and improves the performance and efficiency of character segmentation and feature extraction. Character recognition of Arabic script is more difficult than that of Latin text because Arabic script is cursive, context sensitive, and written in several different styles. We provide a comprehensive review of baseline detection methods for the Urdu language. The aim of the chapter is to introduce the challenges that arise during baseline detection in cursive script languages written in the Nastaliq and Naskh styles.

Author information

Corresponding author

Correspondence to Saeeda Naz.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Naz, S., Razzak, M.I., Hayat, K., Anwar, M.W., Khan, S.Z. (2014). Challenges in Baseline Detection of Arabic Script Based Languages. In: Chen, L., Kapoor, S., Bhatia, R. (eds) Intelligent Systems for Science and Information. Studies in Computational Intelligence, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-04702-7_11

  • DOI: https://doi.org/10.1007/978-3-319-04702-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04701-0

  • Online ISBN: 978-3-319-04702-7

  • eBook Packages: Engineering, Engineering (R0)
