Digital Libraries: Google, the Naked Emperor

By Roy Tennant -- Library Journal, 8/15/2005

Google rules. Wherever you turn you hear about a new Google initiative. Clearly, Google has the money to do some interesting things. But with all the hype and hullabaloo, it can be all too easy to overlook some serious flaws in Google's services.

As librarians, we should not be giving Google a "pass" that we would not afford other vendors. By being clear about Google's strengths and weaknesses, we can make effective decisions about when and how to use Google's services and advise our users appropriately.

Google Search

Google's flagship service is, of course, its web index. Google became nearly everyone's favorite search engine by crawling more of the web than anyone else and making it searchable through a dead-easy interface that responded with amazing alacrity. But it should be acknowledged that it is really only good at some very specific things and is completely ineffective for other purposes.

For example, sometimes I want to find brand-new web pages. But because Google's ranking relies on the PageRank algorithm (see Link List), which rewards pages that other pages link to, new pages with few inbound links naturally fall to the bottom of the search results. Does Google provide any method to reverse-sort the results, to view results based on date added, or to sort results based on the last change date of the page itself? No. So what are we left with? Trying to get to the "end" of the search results, wherever that may be.

The problem is that you can't even get to the end. As a Google spokesperson put it, "Google provides only the 1000 most relevant search results for a query, even when there are more than 1000 matches. (Due to variations in our estimates of results, we may occasionally display slightly fewer than 1000)." There is no option to go beyond that wall.
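To make the effect concrete, here is a minimal Python sketch. It is not Google's implementation: the `Page` record, the `link_score` field, the example URLs, and the `RESULT_CAP` constant are illustrative assumptions standing in for a link-based ranking and the 1000-result limit quoted above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RESULT_CAP = 1000  # assumed to mirror the 1000-result limit quoted above


@dataclass
class Page:
    url: str
    link_score: float  # hypothetical stand-in for a link-based rank
    first_seen: date


def visible_results(matches, cap=RESULT_CAP):
    """Rank by link score, highest first, and keep only the top `cap`."""
    ranked = sorted(matches, key=lambda p: p.link_score, reverse=True)
    return ranked[:cap]


def newest_first(matches):
    """The sort option the column asks for: most recently seen pages first."""
    return sorted(matches, key=lambda p: p.first_seen, reverse=True)


# 1500 established pages with some links, plus one brand-new page with none.
start = date(2005, 1, 1)
matches = [
    Page(f"example.org/old/{i}", link_score=1.0 + i, first_seen=start)
    for i in range(1500)
]
new_page = Page("example.org/new", link_score=0.0,
                first_seen=start + timedelta(days=200))
matches.append(new_page)

shown = visible_results(matches)
print(len(shown))                    # 1000
print(new_page in shown)             # False
print(newest_first(matches)[0].url)  # example.org/new
```

With a cap applied after a link-based sort, the new page is simply unreachable; a date sort over the same matches would put it first.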

Google Scholar

The Scholar search service was announced at the end of last year to wide acclaim. What it attempts to do is to crawl (using the standard Google infrastructure) and index content from academic and scholarly publishers. Although Google has agreements with many publishers, it has no agreement with some significant ones, including Elsevier. Even where Google does have access, Scholar's crawl can lag months behind the content's appearance on the original site.

When users receive results, if the content is free, they can click through to it, but if it is not, they are taken to the publisher's web site, where they can often purchase access. Also, Google should be congratulated for working closely with libraries to enable OpenURL linking so that our clientele can click through to content under our licenses when they can be identified as valid users.

Scholar ranks the results based at least partly on the number of times an article was cited by another source. Given the lack of options on changing this display, for some disciplines this can be disastrous. For example, most scientific researchers want timely access to the latest content, and Scholar fails on both counts: its crawl can lag months behind publication, and ranking by citation count pushes recent, not-yet-cited articles toward the bottom.
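A small Python sketch shows why citation-based ordering works against new material. Scholar's actual ranking formula is not public, so this assumes the simplest possible case, a sort on citation count alone; the `Article` record and the three sample entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    year: int
    cited_by: int  # number of citing sources (invented sample values)


def rank_by_citations(articles):
    """Assumed simplification: order purely by citation count, highest first."""
    return sorted(articles, key=lambda a: a.cited_by, reverse=True)


def rank_by_year(articles):
    """A display option a researcher might want: newest first,
    with citation count only as a tie-breaker."""
    return sorted(articles, key=lambda a: (a.year, a.cited_by), reverse=True)


articles = [
    Article("Classic survey", 1995, 840),
    Article("Fresh result", 2005, 0),
    Article("Established method", 2001, 310),
]

print([a.title for a in rank_by_citations(articles)])
# ['Classic survey', 'Established method', 'Fresh result']
print([a.title for a in rank_by_year(articles)])
# ['Fresh result', 'Established method', 'Classic survey']
```

An article published this year has had no time to be cited, so under the first ordering it lands last no matter how relevant it is.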

If you are in the humanities, Scholar doesn't fare much better. In a search on "hamlet," the results are swamped with scientific papers written by various persons named "Hamlet." Limiting the search word to the title of articles is better, but not much. What you get is a jumbled mess of scientific articles (e.g., HAMLET as an acronym for a substance or procedure), books, journal articles, and cryptic "citations" parsed from full-text articles.

Search results that are marked as "[CITATION]" have been extracted from the full text of crawled sources and therefore are often very incomplete. Many individual results are, in fact, almost indecipherable. To find out more, the user must either click the supplied link to do a "Web Search," which usually fails to find the article online, or click on the "Cited by" link to go to the source that cited it to find enough information to locate the article.

Scholar is, of course, in its early days, and it is quite possible that these problems will be addressed. But when considering whether Scholar is a sufficient replacement for commercial indexing services, we should use the very same criteria for evaluation, such as the "Database Quality Criteria" from SCOUG. At the moment, such a comparison leaves Scholar wanting in some very significant ways.

Keeping our heads

Collaboration with Google will likely provide some clear wins but also some significant trade-offs and even dire pitfalls. "It's important to remember," says Gary Price of ResourceShelf.com, "that Google is not in the information business in the same way as companies such as Factiva or Dialog are." Our clientele deserve no less than the same clear-eyed appraisal that we would use with any library vendor. It should not require an innocent child to detect when the emperor is without clothes.


Link List
Database Quality Criteria
bubl.ac.uk/archive/lis/org/ciqm/databa1.htm
Google Scholar
scholar.google.com
PageRank
en.wikipedia.org/wiki/PageRank
Google Search
google.com


Author Information
Roy Tennant (roy.tennant@ucop.edu) is User Services Architect, California Digital Library. He is author of Managing the Digital Library (Reed Business Pr., 2004).
