Library Journal

Tennant: Digital Libraries   





The Unused Complexity of MARC

October 14, 2009

MAchine Readable Cataloging (MARC), a carrier standard for bibliographic information used alongside the rules set out in the Anglo-American Cataloguing Rules, Second Edition (AACR2), has not (yet) died, despite my plea that it do so nearly seven years ago to this very day in the pages of Library Journal. Although that screed was not completely on the mark (do with that pun what you will), it helped spark a conversation about our bibliographic standards and where we need to take them. For what it's worth, I corrected and expanded on the ideas in that less-than-800-word column in an award-winning journal article not long afterward.

I was reminded of this in recent days by two independent yet related events. One was the NISO Webinar on Bibliographic Control. Among the presenters, Bill Moen of the University of North Texas reported on research that he, his colleague Shawne Miksa, and their research assistants conducted on usage patterns of MARC fields and subfields. It is excellent research that will help inform our way forward to new standards for bibliographic description.

The other was OCLC's release of data extracted from the WorldCat database, which now holds 145 million records. As my OCLC Research colleague Thom Hickey reports, the data show several shifts over the years in how we collectively use the MARC format, probably due mainly to the massive influx of records from international sources. But one thing remains true, as both Bill Moen and OCLC have now reported: a number of MARC fields and subfields are either completely unused or virtually so.

At this point I want to remind everyone that MARC was developed by well-meaning people who have worked hard over many years to create and maintain this standard in an attempt to foresee our future needs. They deserve our praise and thanks.

But needless complexity has made our systems needlessly complex, and therefore needlessly expensive and difficult. The issues begin with the scripts that import data and don't end until we've figured out what to do with all 2,000 or so individual elements in a full record display. The costs are not insubstantial, and they multiply across every one of our systems that comes into contact with this complexity.

Now we must decide where we need to go, and how. I think a big part of what we must do is to decide what must remain and what can go. We must be highly selective in what we choose to carry forward into the future. The good news is that we don't need to imagine which data elements might be used or not -- we have actual data. Now begins the work to figure out what to do about it.
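The usage figures Moen and OCLC reported come down to counting, across many records, how often each field and subfield actually appears. A minimal sketch of that kind of tally in Python might look like the following. It assumes each record has already been parsed into a hypothetical simplified form (a list of tag and subfield-code pairs, not any real MARC library's data model); the `tag$code` naming, the function names `usage_counts` and `rarely_used`, and the default threshold are all illustrative assumptions, and the list of defined elements would have to be compiled separately from the MARC 21 documentation.

```python
from collections import Counter
from typing import Iterable

# Hypothetical simplified record: a list of (tag, subfield_codes) pairs.
# Control fields such as 001 or 008 have no subfields, so their list is empty.
Record = list[tuple[str, list[str]]]


def usage_counts(records: Iterable[Record]) -> tuple[Counter, int]:
    """Count how many records contain each tag and each tag$subfield element."""
    counts: Counter = Counter()
    total = 0
    for record in records:
        total += 1
        seen: set[str] = set()  # each element counts at most once per record
        for tag, codes in record:
            seen.add(tag)
            seen.update(f"{tag}${code}" for code in codes)
        counts.update(seen)
    return counts, total


def rarely_used(defined: Iterable[str], counts: Counter, total: int,
                threshold: float = 0.001) -> list[str]:
    """Return defined elements found in less than `threshold` of all records.

    The 0.001 default (one record in a thousand) is an arbitrary illustration.
    """
    if total == 0:
        return sorted(defined)
    return sorted(e for e in defined if counts[e] / total < threshold)


if __name__ == "__main__":
    sample = [
        [("245", ["a", "c"]), ("260", ["a", "b", "c"])],
        [("245", ["a"]), ("300", ["a"])],
    ]
    counts, total = usage_counts(sample)
    print(counts["245$a"], total)  # 2 2
    print(rarely_used(["245$a", "245$c", "245$h"], counts, total, 0.5))  # ['245$h']
```

The sketch counts the number of records in which an element appears rather than raw occurrences, so a heavily repeated field such as 650 in one record does not inflate its apparent use across the database.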

Posted by Roy Tennant on October 14, 2009




October 14, 2009
In response to: The Unused Complexity of MARC
LibraryThingTim commented:

"But needless complexity has made our systems needlessly complex, and therefore needlessly expensive and difficult."

When you say "systems," do you mean mostly back-end processing systems, or the catalog itself? At the catalog level, whatever the complexity, so many of these fields aren't shown at all: pure waste. Take the 008 boolean for Festschrift. I think it's kind of cool, but is there a single OPAC that displays it? At other times, several fields bear on a question, such as whether something is a DVD, without ever settling it, so nothing useful can be displayed.

I question, however, the need to think parsimoniously. Libraries look down on Amazon and such for bad data, but the quality is improving, and the complexity of the data now, I think, exceeds that of libraries. Anyway, data qua data doesn't make things expensive. WorldCat fits on an iPod these days. And if vendors say the quantity or complexity of MARC makes their systems expensive, I think they're either bad programmers or lying.

What costs money is paying expensive people to enter the data, often multiple times and in a very tightly defined workflow that prevents true collaborative work. There are other cures for that: drawing data from underused sources (e.g., ONIX; I gather OCLC does that now?) and from new types of sources, like web search engines, access and use statistics, Freebase, LibraryThing and so forth, and changing the workflow so that edits are easier and can be done in a more relaxed, distributed manner. Every time someone writes to AUTOCAT asking someone with higher access to change a WorldCat record, something very wrong has happened, and it's not about the complexity of the metadata. I gather OCLC recognizes this too, and is working to open up access.





©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.