
Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5714)

Abstract

With the significant growth in the number of electronic documents available on the Internet, intranets, and digital libraries, the need for effective methods and systems to index and organize E-documents is greater than ever. In this paper we introduce a new automatic text classification method that categorizes E-documents using the classification metadata of books, journals, and other library holdings that already exists in the online catalogues of libraries. The method identifies all references cited in a given document and, using the classification metadata of these references as catalogued in a physical library, derives an appropriate class for the document itself according to a standard library classification scheme, with the help of a weighting mechanism. We demonstrate the application of the proposed method and assess its performance by developing a prototype system that classifies electronic syllabus documents archived in the Irish National Syllabus Repository according to the well-known Dewey Decimal Classification (DDC) scheme.
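The abstract does not spell out the weighting mechanism, so the following is only a minimal sketch of the general idea, assuming each cited reference has already been looked up in a library catalogue and assigned a DDC number. The function names, the per-reference weights, and the choice to vote on the first three DDC digits are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ddc_prefix(ddc_number, digits):
    """Return the first `digits` digits of a DDC number, ignoring the decimal point."""
    return ddc_number.replace(".", "")[:digits]


def classify_document(reference_ddc, digits=3):
    """Pick a DDC class for a document by weighted vote over its cited references.

    reference_ddc: list of (ddc_number, weight) pairs. A ddc_number of None
    means the reference was not found in the catalogue. The weights are an
    illustrative stand-in for the paper's weighting mechanism (assumption).
    Returns the winning class prefix, or None if no reference was catalogued.
    """
    scores = Counter()
    for ddc_number, weight in reference_ddc:
        if ddc_number is None:
            continue  # uncatalogued references cast no vote
        scores[ddc_prefix(ddc_number, digits)] += weight
    if not scores:
        return None
    # Highest total weight wins; ties go to the smallest class prefix.
    return min(scores, key=lambda c: (-scores[c], c))


# Hypothetical references of one syllabus document.
references = [("006.31", 2.0), ("006.35", 1.0), ("025.04", 2.5), (None, 1.0)]
predicted_class = classify_document(references)
print(predicted_class)
```

In this sketch, truncating to a prefix lets references with slightly different DDC numbers pool their votes at a coarser level of the hierarchy; a real system would need to decide how deep in the scheme to classify.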




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joorabchi, A., Mahdi, A.E. (2009). Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_3


  • DOI: https://doi.org/10.1007/978-3-642-04346-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04345-1

  • Online ISBN: 978-3-642-04346-8

  • eBook Packages: Computer Science, Computer Science (R0)
