Google Books vs. BISON

Is the BISON catalog going the way of its namesake?

By Mark J. Ludwig & Margaret R. Wells -- Library Journal, 6/15/2008

Just as the Internet is likely to be one of the most disruptive overall technologies of our lifetimes, Google Books may become one of the most disruptive technologies for academic libraries. The immediate challenge is that Google Books' deeper indexing and more advanced relevancy ranking usually work better than those of our local catalogs, and Google Books always returns results.

With library catalogs, bigger is better, and one all-inclusive silo is preferable to lots of little ones. Unfortunately, university library catalogs of fewer than five million books are bound to have much overlap with the 80–100 million books that will appear in Google Books in a few years. The total number of books scanned so far is a closely held secret, but Google has claimed to be scanning 3,000 items every day. The University of Michigan alone scanned its millionth book for the project back in February 2008.

Google Books also has the newly published material relevant to actual student queries, materials that are not always immediately available in academic libraries or through interlibrary loan (ILL). The close connection between Google Books and WorldCat could generate additional ILL traffic, but patrons will go elsewhere as academic libraries struggle to collect quickly the new monographs users look for in their searches.

Here at the University Libraries of the University at Buffalo (UB), we were curious about whether our new BISON (Buffalo Information System ONline) catalog has actually improved service to our users, or whether competitors like Google Books have usurped the former functions of our catalog and local collections (for a full history, see “BISON's Evolution”). The very existence of an alternative service like Google Books raises additional questions about our local services. Should Google Books appear on library web sites, and when is it appropriate to direct users to its product?

As Google Books grows to contain the best collections of the best libraries, it is unclear where middle-tier, flat-budget academic libraries will be left. Do we become a delivery mechanism for books found through Google instead of a search provider in our own right? Certainly, Google Books' “find in a library” link into WorldCat enables a user to check our holdings and allows us to perform the delivery function. But as the online holdings of Google Books increase, will delivery request traffic dwindle? With this background in mind, we decided to put Google Books to the test.

One of the difficulties librarians find in dealing with Google and Google Books is that Google reveals very little about how it works. It protects its proprietary search algorithms and keeps its scanning projects confidential, so librarians don't know what has been completed or how large the content base has become. Improving search results almost becomes voodoo born of experience. As librarians, we must analyze Google Books in detail to understand it as well as any other large database we offer our users. Only then can we consider ways to incorporate it and other ebooks effectively into our services, collections, and web sites.

Since our study, Google has delivered an API (application programming interface) that allows local catalogs to link into specific Google Books. We are planning to test this capability and will probably implement it in the next academic year. While this enriches the content about individual items for the user, it doesn't improve the search results of our local catalog, with its indexing based exclusively on MARC cataloging data. Does raising the visibility of Google Books, as this API might, further drain users from the local catalog and collections?

Mimicking user searches

The idea behind our study was quite simple, although the execution required some tools and lots of heavy lifting. We examined one typical day of searching from the logs of BISON and got every search and the count of results. Then we ran this same set of searches against Google Books and compared them.

Alas, Google Books is designed to resist spammers and folks like us who would dissect and study it. The only viable way to run thousands of searches against it is to simulate single users banging away, one search at a time. It had to look like a human user or Google Books would stop responding. To accomplish this, we used Macro Express, a software utility that executes scripts and records a set of user movements on the PC. We found a way to cut and paste searches from a file into the Google Books search box and then cut and paste the Google responses into another file. We had to run it at half-speed owing to network slowness and to enable one particular step to complete consistently. Over the course of several weeks in November and December 2007, we were able to feed Google Books a couple of thousand searches and capture the statistical line from each result set indicating the number of hits. Using SAS (Statistical Analysis System), we then match-merged the Google Books hit counts with the original records from the BISON transaction logs. The numbers, discussed below, were astounding.
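The match-merge step can be pictured with a short sketch. We did this work in SAS; the Python below is only an illustration of the same idea, joining two sets of records on the search string. The function name and the sample searches and hit counts are invented for the example and are not taken from our logs.

```python
def match_merge(bison_rows, google_rows):
    """Pair each BISON search with the Google Books hit count for the same query.

    Both inputs are lists of (query, hit_count) tuples. A query with no
    Google Books record gets None rather than a guessed value.
    """
    google_hits = dict(google_rows)
    merged = []
    for query, bison_hits in bison_rows:
        merged.append((query, bison_hits, google_hits.get(query)))
    return merged


# Made-up sample records, for illustration only.
bison_log = [("civil war diaries", 0), ("organic chemistry", 42)]
google_log = [("organic chemistry", 2950), ("civil war diaries", 412)]

merged = match_merge(bison_log, google_log)
for query, bison_hits, google_books_hits in merged:
    print(query, bison_hits, google_books_hits)
```

Keying on the exact search string means the two files must be normalized the same way before merging, which is why the syntax differences discussed next mattered.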

However, there were a few idiosyncrasies and anomalies that had to be taken into account. BISON requires the word “and” for the Boolean “and” operation, while Google Books uses the “+” sign. In a BISON keyword search, “and” is always a Boolean operator. In Google Books, it is a searchable keyword. Although Google Books assumes a Boolean “and” between words, a slightly larger number of hits results from using the “+”. (As an example, search “war and peace” and compare the results to “war + peace”.) This revealed a problem we had not expected: users were possibly getting zero results in our OPAC because they were using Google search syntax! This is easily reconfigured in Aleph. For purposes of match-merging search results, we changed all “and”s to “+” signs in our results data for statistical analysis. It was also necessary to sort and dedupe the incoming search logs from BISON. This usually clustered evolving searches so the user's thought process became more obvious.
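The cleanup described above, replacing “and” with “+” and then sorting and deduping the searches, might look something like the following sketch. It is an assumption about one reasonable way to do it, not a record of our actual SAS code; the function names are hypothetical.

```python
import re


def normalize_query(query):
    """Replace the standalone word 'and' with '+' and collapse extra spaces.

    The word-boundary pattern leaves words such as 'sand' or 'anderson' alone.
    """
    replaced = re.sub(r"\band\b", "+", query, flags=re.IGNORECASE)
    return " ".join(replaced.split())


def sort_and_dedupe(queries):
    """Normalize every query, drop exact duplicates, and sort alphabetically."""
    return sorted({normalize_query(q) for q in queries})


searches = ["war and peace", "war + peace", "anderson", "sand and  gravel"]
print(sort_and_dedupe(searches))
```

Sorting alphabetically is what places successive refinements of a search next to one another in the cleaned list.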

Size matters

On the day we tested, BISON returned zero hits for 295 of a total of 1,596 “discovery” searches. (We excluded searches that were unlikely to be user-generated, such as searches for ISBNs and OCLC numbers.) In Google Books, every one of those 295 searches returned results, and the results appeared relevant. Those with spelling problems generated “did you mean” responses. In fact, on average, each “no hit” search in BISON returned 351 results in Google Books.
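To put the zero-hit figure in proportion, the share of failed searches can be worked out directly from the two counts above. The sketch below uses a hypothetical helper and a placeholder list that reproduces only the split of 295 zero-hit searches among 1,596; the nonzero values are stand-ins, not real hit counts.

```python
def zero_hit_rate(hit_counts):
    """Return the fraction of searches whose hit count is zero."""
    if not hit_counts:
        raise ValueError("no searches to summarize")
    zero_hits = sum(1 for hits in hit_counts if hits == 0)
    return zero_hits / len(hit_counts)


# 295 zero-hit searches out of 1,596, as reported above.
# The value 1 is a placeholder for "some hits"; only zero vs. nonzero matters here.
sample = [0] * 295 + [1] * 1301
print(f"{zero_hit_rate(sample):.1%}")  # prints 18.5%
```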

We invest so much effort getting students to use our resources; it is absolutely excruciating to know we are frequently sending them off with nothing, especially when they don't ask for help from librarians. With regard to “one hit” searches, these are mostly “known item” searches, i.e., checks for particular works. We went to some effort to remove these from the results we tested. Our library staff is working intensely at transferring monographs to storage and there are also many searches from ILL, so we think many of the known item searches are staff—rather than user—generated.

If we look at searches where our catalog probably has too little for the users, the results are also eye-opening. For searches yielding between two and ten results in BISON, Google Books returned up to 10,800 results and averaged 1,111 hits. For searches returning a useful number of hits in BISON, i.e., 11–50 and 51–100, Google Books also returned more material, averaging 2,202 and 2,809 hits, respectively. As the searches became too general for meaningful results, both systems returned high numbers of hits. In all cases, Google Books' relevancy ranking somehow brought interesting material to the top, while BISON's results appeared in no obvious order.
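The comparison by result-count range amounts to grouping each search by its BISON hit count and averaging the Google Books hits within each group. The following is a minimal sketch of that grouping, assuming the ranges named in this article; the function names and the sample pairs are invented for illustration.

```python
from collections import defaultdict

# (low, high, label) ranges of BISON hit counts, following the article's groupings.
BUCKETS = [
    (0, 0, "0"),
    (1, 1, "1"),
    (2, 10, "2-10"),
    (11, 50, "11-50"),
    (51, 100, "51-100"),
]


def bucket_label(bison_hits):
    """Return the label of the range that a BISON hit count falls into."""
    for low, high, label in BUCKETS:
        if low <= bison_hits <= high:
            return label
    return "over 100"


def average_google_hits(pairs):
    """Average the Google Books hits for each BISON range.

    pairs is an iterable of (bison_hits, google_hits) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of hits, number of searches]
    for bison_hits, google_hits in pairs:
        entry = totals[bucket_label(bison_hits)]
        entry[0] += google_hits
        entry[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


# Made-up pairs, for illustration only.
sample_pairs = [(0, 300), (0, 400), (5, 1000), (75, 2800)]
print(average_google_hits(sample_pairs))
```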

Differences big and small

In addition to the problem of users entering Google operators into BISON searches, the results revealed shortcomings related to the small number of words and phrases used to describe books in BISON. Although BISON includes some tables of contents, Google Books uses full-text indexing. Items may be full, partial, or searchable text, but the full indexing leads to chapters and citations that would never appear in a traditional catalog. As a result, Google Books still produced results for 100 percent of the searches, where BISON returned zero results for about 15–20 percent (with or without the known-item searches).

BISON has simple, advanced, and expert search modes, a journal search, and jumping off points for other catalogs. The interface was designed by librarians with academic clientele in mind, but it reflects library sensibilities. There is no single search box that will search each and every keyword of every record. In contrast, the Google Books interface, like most Google products, has a single box search with the option for an advanced search. Books are represented by published book covers or generated book covers if none are available. Users can select the book cover view or a list view similar to Google and most library OPACs.

With the results of our study and these differences in mind, we plan to implement a project similar to Google's “did you mean” tool to overcome spelling problems. We are also planning a regular monitoring program for “no hit” searches, with the goal of increasing overall results.
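As a rough idea of what a “did you mean” helper could involve, the sketch below suggests the closest term from a list of known catalog words using Python's standard `difflib` module. This is an assumption made for illustration: it is not how Google's tool works, it is not a feature of Aleph, and the vocabulary list and cutoff value are hypothetical.

```python
import difflib


def did_you_mean(term, vocabulary, cutoff=0.8):
    """Suggest the closest known term for a search word, or None if nothing is close.

    vocabulary is a list of lowercase words, for example terms drawn from
    catalog titles and subject headings.
    """
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical vocabulary, for illustration only.
known_terms = ["shakespeare", "chemistry", "buffalo"]
print(did_you_mean("shakespere", known_terms))
print(did_you_mean("zzzz", known_terms))
```

A production tool would also need a sensible source for the vocabulary and some handling of multiword searches, both of which are outside this sketch.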

Playing catch-up

Some would respond that the BISON results, although fewer, were better than those from Google Books because they were more precise, and the chance of relevance to users was much greater because the retrieved results were based on subject headings. In actuality, many of the Google Books results were relevant and useful. Although users don't always see complete full text, the detail is usually sufficient for them to vet results and determine what is useful. And while users do need to watch out for the Google Books “doughnut hole,” i.e., the gap between scanned material out of copyright and new born-digital books fresh from publishers, materials in Google Books are far more visible and accessible than those in the local catalog and our collections. Google Books often allows a “search inside the book” and provides a cover image, table of contents, etc. For old canonical works, full text is generally available, often in multiple formats.

Tools like Google Books and other ebook platforms will soon be serious alternatives to idiosyncratic, local search interfaces. Ensuring that the latter are searchable, accessible, and visible is a role only libraries can fulfill. This study illustrates the need for libraries to be proactive in digitizing unique materials that contribute to scholarship. As it stands today at our university and many others, rich special collections have only begun to be digitized and put online. Unfortunately, this material never seems to come to the top in Google search results, and statistics show users are unlikely to find it scattered in multiple silos on our web sites. Libraries must think strategically and implement systems that are open and visible. These systems must be indexed by search engine crawlers so that our content gets beyond our own little specialized indexes.

Value beyond access

We are excited about the continuing opportunities for librarians to use tools like Google Books to serve users better and contribute to scholarship, but our limited study raised more questions than it answered. Similar studies should be made of other resources to see whether local users prefer products like Google Books and Google Scholar to our subscription-based e-resources. Is the average undergraduate student better off beginning his or her research with Google Books? If Google Books is scanning old materials and also getting new content from publishers, this leaves relatively little for small to medium-sized academic libraries to contribute. Libraries will need to find a way to add value beyond access and delivery once millions of items from research collections are added to Google Books. It seems certain that libraries' e-resources budget allocations will continue to change as more and more monographs become available online. But what will happen to the library's role in preservation, cataloging, and circulation? Will Google and Google Books lead to the extinction of academic research collections as we know them?

Our study also points out the necessity for librarians to investigate aggressively and stay abreast of disruptive technology and build it into new services wherever possible. Libraries and librarians must constantly be attuned to patrons' behavior; we need to consider how we can use our unique qualities and collections to everyone's advantage. The bar has been raised. The maturing Internet and evolving array of Web 2.0 services have turned our customer base into what many have called a “Google Generation.” We can debate that moniker, but, clearly, no one is calling this the “Academic Library Generation.” Our BISON catalog may not be extinct, but it is being hunted down by the competition. As in nature, libraries had best adapt, change quickly, and build on past successes.


Author Information
Mark J. Ludwig is Systems Manager and Margaret R. Wells is Director, Public Services, University Libraries of the State University of New York at Buffalo

BISON's Evolution

The University at Buffalo (UB) Libraries' new BISON (Buffalo Information System ONline) Catalog was released to the public in summer 2006. The debut of the OPAC followed years of customization and planning by librarians and computing professionals. Unlike the original BISON introduced in 1990, the new BISON came into a world transformed by the Internet: an expanding community of users who never enter library buildings, a library collection influenced by declining book budgets, a wide range of full-text databases, and an environment where users determine their own search choices, often without any intervention from librarians. The printed book as the center of the library was gone, and users were increasingly choosing options beyond the local catalog for their search needs, expecting to find everything online.

BISON in the wild

The release of the new BISON was part of the long-awaited switch from our old mainframe-based system to the client-server-based Ex Libris Aleph Library Management System. Although UB Libraries were closely involved in the State University of New York's (SUNY) decision to purchase and move centrally all SUNY library catalogs to Aleph (including the UB implementation of SUNY's first Aleph catalog for SUNY-Fredonia), our local implementation was delayed for years by data conversion and indexing scalability issues. By 2005, Ex Libris had significantly refined its ILS with the releases of Aleph 16 and 17; we implemented Aleph 17 and avoided the conversion issues associated with the move from 14 to 16. We mention all of this to show what a relief it was for us to complete the Aleph switchover at last. In summer 2006, we were finally “off the mainframe,” after a journey that had begun in 1997.

The release of the new BISON to the public was deliberately kept low-key in deference to the completion of another long-term project, the new offsite, high-density library storage facility. This project was also 20 years in the making and was designed to free up much valuable floor space that could be repurposed for library and nonlibrary uses. The original OPAC replaced the physical card catalogs. The new OPAC, storage facility, and electronic resources are gradually replacing the physical stacks. While users appreciate some of the new features of BISON, many notice very little about the change because of the competition from our other databases, with their relatively dynamic interfaces.

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.