Digital Libraries: The Murky Bucket Syndrome

By Roy Tennant -- Library Journal, 12/15/2004

Two recent, unrelated events put into stark focus the major challenges we have ahead of us if we want to serve our users as they expect and deserve.

The first event arose when I innocently reported that two records for the same journal in our union catalog failed to merge. These were not old, battle-scarred records that had survived numerous title changes or the other disasters that plague serial records. These were records created for Respiratory Research, founded in 2000 by BioMed Central. As it turned out, these records represent an issue that is going to hound us.

One per format

The problem is this: the journal has both a print and an online version. One of our libraries cataloged it as an electronic journal (creating separate records for different versions), and the others cataloged it with one joint print and online record. Our merging algorithm failed since the print and e-forms have different ISSNs.
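The failure can be sketched in a few lines. This is only an illustration under stated assumptions, not our union catalog's actual algorithm: the record layout, the function names, the `issn_family` lookup table, and the ISSN values (made-up placeholders, not the journal's real ISSNs) are all hypothetical.

```python
from collections import defaultdict

def naive_merge(records):
    """Group records by their single ISSN (the approach that fails here)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["issn"]].append(rec["id"])
    return sorted(sorted(ids) for ids in groups.values())

def linked_merge(records, issn_family):
    """Group records after mapping each ISSN to a shared family key, if known."""
    groups = defaultdict(list)
    for rec in records:
        # Fall back to the ISSN itself when no print/online link is known.
        key = issn_family.get(rec["issn"], rec["issn"])
        groups[key].append(rec["id"])
    return sorted(sorted(ids) for ids in groups.values())

# Placeholder data: one joint print-and-online record, one online-only record.
records = [
    {"id": "joint-record", "issn": "0000-0001"},   # carries the print ISSN
    {"id": "online-record", "issn": "0000-0002"},  # carries the e-ISSN
]

# Hypothetical table saying both ISSNs belong to the same journal.
issn_family = {"0000-0001": "family-1", "0000-0002": "family-1"}

unmerged = naive_merge(records)              # two separate groups
merged = linked_merge(records, issn_family)  # one combined group
```

Keyed on ISSN alone, the two records never meet. They merge only when something outside the records says the two ISSNs identify the same title, and that linking data is exactly what the sketch assumes and what our catalog lacked.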

Both methods of cataloging have proponents and opponents, benefits and drawbacks. In any case, as I was informed, it is a problem seemingly without solution. Even within one university we cannot agree about how to handle issues like this consistently.

But the main points are: 1) our cataloging infrastructure is rife with these problems; 2) such problems have unfortunate consequences for users; and 3) our existing infrastructure appears to be inadequate to solve this problem and others like it. Certainly this is not unique to my institution. Unfortunately, it is systemic.

New needs for old data

The other incident that helped frame this problem was an email exchange with Lorcan Dempsey, vice president for research, OCLC, Inc. He responded to my column "The Trouble with Online" (LJ 9/15/04, p. 26), which chronicled the inability of library users to limit their catalog search to only online materials (another indication that our systems are hopelessly inadequate).

Referring to this difficulty, Dempsey described the " 'murky bucket syndrome' that affects any large bibliographic database—we cannot entirely, unambiguously slice and dice the database because of historic data entry and cataloging practices that…were not oriented toward our new needs."

In going forward, Dempsey believes we "need to think about not just sharing data but extracting as much value as we can from it through processing." A prime example of extracting value from existing data is OCLC's FictionFinder service, which mines the MARC record for fiction genre information and uses it to provide a readers' advisory service.

But as I was soon to discover with the Respiratory Research example, the effects of the murky bucket syndrome are not limited to the "slicing" problem. As Dempsey put it, "this issue is now cropping up all over the place. As we try to do things programmatically, the structure and content practices really matter in ways they might not have before (FRBRization, data mining, etc.)…. Increasingly, I think we need to look at cataloging practices in light of the new world of programmatic uses." For more from Dempsey, visit his web log, available through the online version of this article.

Some signs of hope

Recent research suggests that the situation may not be quite as daunting as we think. In a paper at a recent Dublin Core conference ("Assessing Metadata Utilization"), Bill Moen at the University of North Texas School of Library and Information Sciences revealed that only a small number of elements accounts for the vast majority of occurrences in a test dataset of 400,000 WorldCat MARC records, and fewer than half of the nearly 2,000 fields/subfields currently defined in MARC 21 occurred in even one of the records in the test set. In other words, despite the size and complexity of the MARC record, much of it is little used.
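A utilization count of this kind is easy to picture in code. The following is a minimal sketch, not Moen's actual method: it assumes each record has already been reduced to a list of "tag$subfield" strings, and the toy records, the `defined` set, and the function names are invented for illustration.

```python
from collections import Counter

def element_usage(records):
    """Count how often each field/subfield element occurs across all records."""
    counts = Counter()
    for rec in records:
        counts.update(rec)
    return counts

def elements_covering(counts, share):
    """Return how many of the most-used elements reach `share` of all occurrences."""
    total = sum(counts.values())
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running >= share * total:
            return n
    return len(counts)

# Toy records; real MARC records would need to be parsed into this shape first.
records = [
    ["245$a", "100$a", "260$b"],
    ["245$a", "100$a", "650$a"],
    ["245$a", "260$b", "022$a"],
    ["245$a", "100$a"],
]

# Hypothetical stand-in for the full list of defined elements.
defined = {"245$a", "100$a", "260$b", "650$a", "022$a", "037$a", "654$a"}

counts = element_usage(records)
top_needed = elements_covering(counts, 0.6)  # elements needed for 60% of use
never_used = defined - set(counts)           # defined but absent from the set
```

In this toy set, two elements cover more than 60 percent of all occurrences and two of the seven defined elements never appear, which is the same shape of result, in miniature, that the paper reports.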

Also, recent experiments with merging record displays based on principles outlined in the Functional Requirements for Bibliographic Records (FRBR, which leads to the verb "FRBRization") may point a way out of at least some of the mess. To see a system that has applied the FRBR principles, check out the Research Libraries Group's redlightgreen.com project.
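To give a rough sense of what "FRBRization" involves, here is a simplified sketch that clusters records into works by a normalized author/title key. It is an assumption-laden illustration, not the method used by RedLightGreen or any other system: the record layout, the `work_key` and `group_by_work` names, and the normalization rules are all hypothetical, and real work-set algorithms handle far more variation.

```python
import re
from collections import defaultdict

def work_key(author, title):
    """Build a crude key: lowercase, punctuation to spaces, collapse whitespace."""
    def norm(text):
        text = re.sub(r"[^\w\s]", " ", text.lower())
        return " ".join(text.split())
    return (norm(author), norm(title))

def group_by_work(records):
    """Cluster record ids that share the same normalized author/title key."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(works)

# Toy records with small differences in punctuation between catalogers.
records = [
    {"id": "r1", "author": "Melville, Herman", "title": "Moby Dick"},
    {"id": "r2", "author": "Melville, Herman.", "title": "Moby-Dick"},
    {"id": "r3", "author": "Austen, Jane", "title": "Emma"},
]

works = group_by_work(records)
```

Here the two Melville records fall into one work despite their punctuation differences, so a display could show the work once with its versions listed beneath it. The quality of the grouping depends entirely on how consistent the underlying author and title data are, which brings us back to the murky bucket.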

Metadata for tomorrow

We need more large-scale experiments with existing catalog records to see what can be done with legacy data. But we must also think about how to reengineer our infrastructure to enable robust machine processing, support for multiple record formats, and flexibility in user interfaces and screen display. For more on where we need to go, see my article "A Bibliographic Metadata Infrastructure for the Twenty-First Century."

I've been hitting on metadata issues hard in this column, especially in recent months. I am increasingly disturbed by our inability to get this right, at least given today's needs. The library profession seems fond of assuming that its bibliographic infrastructure is the best ever devised, worthy of respect and admiration. There is some truth to that but also some self-delusion. If this is the best bibliographic infrastructure ever devised, then we (and, more importantly, our users) are in trouble. We must fix it, and soon.

Links List
Assessing Metadata Utilization: An Analysis of MARC Content Designation Use
www.unt.edu/wmoen/publications/MARCPaper_Final2003.pdf
A Bibliographic Metadata Infrastructure for the Twenty-First Century
roytennant.com/metadata.pdf
Examining Present Practices to Inform Future Metadata Use: An Empirical Analysis of MARC Content Designation Utilization
www.unt.edu/mcdu
FictionFinder
fictionfinder.oclc.org
Functional Requirements for Bibliographic Records
www.ifla.org/VII/s13/frbr/frbr.pdf
Lorcan Dempsey's Web Log
orweblog.oclc.org
RedLightGreen
redlightgreen.com


Author Information
Roy Tennant (roy.tennant@ucop.edu) is User Services Architect, California Digital Library. He is author of Managing the Digital Library (Reed Business Pr., 2004).
