Challenges in Baseline Detection of Arabic Script Based Languages

Naz, Saeeda; Razzak, Muhammad Imran; Hayat, Khizar; Anwar, Muhammad Waqas; Khan, Sahib Zar

doi:10.1007/978-3-319-04702-7_11

Part of the book series: Studies in Computational Intelligence (SCI, volume 542)

Abstract

In this chapter, we present the challenges of baseline detection for Arabic-script-based languages, targeting the Nastaliq and Naskh writing styles. Baseline detection is an important step in OCR because it directly affects the subsequent steps and improves the performance and efficiency of character segmentation and feature extraction. Character recognition of Arabic script is more difficult than that of Latin text because Arabic script is cursive, is context-sensitive (the shape of a letter depends on its position in the word), and is written in several different styles. In this chapter, we provide a comprehensive review of baseline detection methods for the Urdu language. The aim of the chapter is to introduce the challenges that arise during baseline detection in cursive-script languages written in the Nastaliq and Naskh styles.
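A widely used starting point for baseline detection in Arabic script is the horizontal projection profile: count the ink pixels in each row of a binarized text line and take the densest row as the baseline, since the connecting strokes of cursive letters run along it. The sketch below is a minimal illustration of that general technique, not the method proposed in this chapter. It assumes a binarized single-line image stored as a list of rows with 1 for ink and 0 for background; the function names `horizontal_projection` and `detect_baseline` and the toy image are hypothetical.

```python
from typing import List

# Hypothetical helpers for illustration only. A binarized text-line image is
# assumed to be a list of rows, where 1 marks an ink pixel and 0 background.


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Return the number of ink pixels in each row (the horizontal profile)."""
    return [sum(row) for row in image]


def detect_baseline(image: List[List[int]]) -> int:
    """Estimate the baseline as the row with the highest ink count.

    Ties are broken toward the lowest row (largest index); this is an
    assumption made only so that the result is deterministic.
    """
    profile = horizontal_projection(image)
    if not profile:
        raise ValueError("image has no rows")
    peak = max(profile)
    return max(i for i, count in enumerate(profile) if count == peak)


# Toy text line: two vertical strokes resting on a long horizontal
# connecting stroke in row 3, with a small descender in row 4.
toy_line = [
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

print(horizontal_projection(toy_line))  # [2, 2, 2, 8, 1]
print(detect_baseline(toy_line))        # 3
```

The same sketch also hints at why Nastaliq is harder than Naskh. Nastaliq ligatures slope diagonally from top right to bottom left, so their ink is spread over many rows. For a purely diagonal stroke such as `[[1 if c == r else 0 for c in range(4)] for r in range(4)]`, the profile is flat (`[1, 1, 1, 1]`), and the "peak" row no longer identifies a meaningful baseline.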

Author information

Authors and Affiliations

COMSATS Institute of Information Technology, Abbottabad, Pakistan
Saeeda Naz, Muhammad Imran Razzak, Khizar Hayat, Muhammad Waqas Anwar & Sahib Zar Khan
King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Saeeda Naz, Muhammad Imran Razzak, Khizar Hayat, Muhammad Waqas Anwar & Sahib Zar Khan

Corresponding author

Correspondence to Saeeda Naz.

Editor information

Editors and Affiliations

School of Computer Science and Informatics, De Montfort University, The Gateway, Leicester, LE1 9BH, United Kingdom
Liming Chen
The Science and Information Organization, New York, USA
Supriya Kapoor
The Science and Information Organization, New York, USA
Rahul Bhatia

About this chapter

Cite this chapter

Naz, S., Razzak, M.I., Hayat, K., Anwar, M.W., Khan, S.Z. (2014). Challenges in Baseline Detection of Arabic Script Based Languages. In: Chen, L., Kapoor, S., Bhatia, R. (eds) Intelligent Systems for Science and Information. Studies in Computational Intelligence, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-04702-7_11

DOI: https://doi.org/10.1007/978-3-319-04702-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04701-0
Online ISBN: 978-3-319-04702-7
eBook Packages: Engineering; Engineering (R0)