Digital Libraries: The Importance of Being Granular

By Roy Tennant -- Library Journal, 5/15/2002

Our libraries are increasingly dependent on metadata. Besides the obvious (our catalogs), other uses are becoming more commonplace. Virtually any content we digitize and make available to our clientele requires metadata for discovery and access. Every interlibrary loan transaction is a slug of metadata that helps libraries get a book or journal article to a user. Libraries now license so many databases and collections of online content that they increasingly offer a way for users to search for a resource based on their topic. Such a service requires metadata.

Last month, in "Metadata as if Libraries Depended on It" (LJ 4/15/02, p. 32ff.), I discussed metadata and its various components: a standard container, qualification, usage guidelines, and the information being captured. In that overview I set aside one topic as being worthy of its own column: metadata granularity.

How you chop it

Granularity refers to how finely you chop your metadata. For example, in the standard for encoding the full text of books using the Text Encoding Initiative (TEI) schema, a book author may be recorded as: <docAuthor>William Shakespeare</docAuthor>. That's all well and good, if you never need to know which string of text comprises the author's last name and which the first. If you do (and most library catalogs should have this capability), you're not going to get very far with information extracted from a book encoded using the TEI tag set.
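
To make the problem concrete, here is a minimal Python sketch. It is not part of the TEI guidelines; the function name guess_sort_name and the sample names are assumptions chosen for illustration. It pulls the author string out of a docAuthor element and guesses at the surname by taking the last word, which is about the best software can do with an undivided string.

```python
import xml.etree.ElementTree as ET


def guess_sort_name(doc_author_xml: str) -> str:
    """Guess a 'Last, First' sort form from an undivided docAuthor string.

    The heuristic (last word = surname) is an assumption made for
    illustration; the markup itself carries no such information.
    """
    name = ET.fromstring(doc_author_xml).text.strip()
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


# The guess happens to be right for this name...
shakespeare = guess_sort_name("<docAuthor>William Shakespeare</docAuthor>")
# ...and wrong for this one, where the family name comes first.
mao = guess_sort_name("<docAuthor>Mao Zedong</docAuthor>")

print(shakespeare)  # Shakespeare, William
print(mao)          # Zedong, Mao
```

The second result files the author under the given name. No amount of cleverness in the heuristic fixes this reliably, because the information was never recorded.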

Although TEI has been around in one form or another for 15 years, its focus is mainly on the recording of aspects of a work for humanities scholars. As such, it is not particularly well suited for library-style bibliographic description. Nonetheless, as more texts are digitized in their entirety, libraries will increasingly be using either some form of TEI or another similar schema (e.g., ISO 12083). Therefore, it behooves us to know how well or how poorly standards such as TEI and MARC can interoperate.

How granularity helps

Granularity is good. It makes it possible to distinguish one bit of metadata from another and can lead to all kinds of additional user services. For example, you can't sort records on author names if you can't tell the last name. Wait a minute, you're thinking, we do it all the time since the MARC record is sufficiently granular. And you would be correct.

Generally speaking, most of the information in a MARC record is sufficiently granular for the purposes for which it was designed. But it becomes less than adequately granular should you wish to start loading up the MARC record with such things as book reviews. Then you are reduced to such questionable tactics as smashing them into a note field. As time goes on, in other words, we may begin to find that MARC isn't quite as extensible or granular as it will need to be.

External compliance

The issue of granularity becomes critical in the apparent slavish devotion we tend to have toward standards. Don't get me wrong. Standards are vital to sharing data with others. They are important to any situation in which you must interoperate with other systems. They are important to providing a method to layer services easily on top of a collection of metadata. But we sometimes confuse internal compliance with external compliance.

External compliance with standards means that you can export your data into whatever metadata standard applies to a given situation. For example, some libraries are involved with the Open Archives Initiative (OAI), which aims to share metadata among working paper archives. Although internally a given archive may have a richer and more granular collection of metadata, OAI specifies that at minimum the archive should be able to make its metadata available for "harvesting" (collecting via software) using the Dublin Core metadata specification. Therefore, an OAI-compliant archive will likely "dumb down" its metadata (in some cases making granular metadata more homogeneous) to meet this minimum specification.
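
As a rough sketch of what "dumbing down" looks like in practice, the Python below flattens a granular internal record into a few simple Dublin Core elements. The internal field names (given, family, dates) and the function to_dublin_core are hypothetical; only the element names creator, title, and date come from Dublin Core itself.

```python
def to_dublin_core(record: dict) -> dict:
    """Flatten a granular internal record into simple Dublin Core elements.

    The internal layout is an assumption for illustration. Three separate
    author fields collapse into one undivided 'creator' string.
    """
    author = record["author"]
    creator = f"{author['family']}, {author['given']}, {author['dates']}"
    return {
        "creator": creator,
        "title": record["title"],
        "date": record["date"],
    }


# Hypothetical internal record, richer than what is exported.
internal = {
    "author": {"given": "William", "family": "Shakespeare", "dates": "1564-1616"},
    "title": "Hamlet",
    "date": "1603",
}

dc = to_dublin_core(internal)
print(dc["creator"])  # Shakespeare, William, 1564-1616
```

The export is easy in this direction. Going the other way, a harvester sees only the single creator string and has to guess where the name ends and the dates begin.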

Not all metadata are equal

Internal compliance means storing the metadata in a particular standard even when it makes little sense to do so. Some standards are meant to provide interoperability among systems (such as the Dublin Core), while others are designed to provide a base level of standardization upon which software systems can be built (such as MARC).

Therefore, not all metadata standards are created equal. They are sometimes inadequate for your internal needs or would prevent you from complying with a different standard. In the case of the OAI-compliant archive above, for example, being internally compliant with the Dublin Core would make no sense in and of itself. So long as it could "speak" Dublin Core when required, a richer set of internal metadata may allow many additional uses of the same information (such as MARC records for a library catalog).

Granularity questions

Nearly all metadata standards raise granularity issues. In the TEI example, for greater flexibility the author's name should be chopped up at least into the part of the name upon which sorting can take place (usually the last name). Therefore, should I decide to encode a digitized book using the TEI set of XML tags, I will have metadata that is only adequate for TEI compliance.

On the other hand, should I create my own set of tags—perhaps the TEI tag set plus additional tags, for example, to identify the author's first and last name—to provide more granularity, then a standard such as TEI can be covered like a blanket. And a number of other metadata standards that may be important (such as MARC) can be supported as well. Once you have your metadata stored in a standard, machine-parsable container, whether in a database or an XML data stream, it's easy to spit out the information in various configurations and formats.
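
A small Python sketch of that idea follows, assuming a hypothetical internal record with separate given, family, and dates fields. The two output functions are illustrative only: one writes the TEI docAuthor element, the other a human-readable rendering of a MARC 100 (personal name) field with $a and $d subfields, not a real MARC transmission record.

```python
import xml.etree.ElementTree as ET

# Hypothetical granular internal record; the field names are assumptions.
author = {"given": "William", "family": "Shakespeare", "dates": "1564-1616"}


def as_tei_doc_author(a: dict) -> str:
    """Render the author as a TEI docAuthor element (undivided string)."""
    el = ET.Element("docAuthor")
    el.text = f"{a['given']} {a['family']}"
    return ET.tostring(el, encoding="unicode")


def as_marc_100_display(a: dict) -> str:
    """Render a display form of a MARC 100 field (surname entry, $a and $d)."""
    return f"100 1# $a {a['family']}, {a['given']}, $d {a['dates']}"


def sort_key(a: dict) -> tuple:
    """Sort on family name first, which only granular data makes possible."""
    return (a["family"].lower(), a["given"].lower())


print(as_tei_doc_author(author))    # <docAuthor>William Shakespeare</docAuthor>
print(as_marc_100_display(author))  # 100 1# $a Shakespeare, William, $d 1564-1616
```

Each output format is a few lines of code because the hard part, knowing which piece of the name is which, was done once when the record was created.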

Remember: select (or create) and use metadata containers that are granular enough for any purpose to which you can imagine putting them. If you do this, not only can you serve your own purposes, but you can also share your metadata with anyone you wish. If this is not practical, then you must decide which needs will remain unfulfilled.

Highly granular metadata doesn't come cheap. There is a trade-off between all possible uses that you may wish to support and the staff time required to capture the metadata required to do so. In some cases, the benefit will not warrant the cost; in others, it will be worth it.

Another path to granularity

Good granularity doesn't necessarily mean that any single metadata standard or container must chop up every field into the smallest reducible part. For example, the emerging standard for digital object description, METS, is designed to take advantage of other, more granular metadata containers.

As a wrapper, it is meant to enclose some things and link to others. It can refer to a metadata record for the item being described. Therefore, a digital object described using the METS schema may, in fact, refer to a MARC record for descriptive metadata.
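
The fragment below sketches that linking in Python. It builds a descriptive metadata section (dmdSec) whose mdRef element points at an external MARC record rather than embedding it. The element and attribute names follow the METS schema as published by the Library of Congress; the function name, the ID value, and the example.org URL are assumptions, and the result is a fragment rather than a complete, schema-valid METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_with_marc_ref(marc_url: str) -> ET.Element:
    """Build a METS fragment that refers to an external MARC record."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    # Descriptive metadata section; "DMD1" is an arbitrary identifier.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    # mdRef links to metadata held elsewhere instead of wrapping it inline.
    ET.SubElement(
        dmd,
        f"{{{METS_NS}}}mdRef",
        {
            "LOCTYPE": "URL",
            "MDTYPE": "MARC",
            f"{{{XLINK_NS}}}href": marc_url,
        },
    )
    return mets


# Hypothetical location of the catalog record.
doc = mets_with_marc_ref("https://example.org/catalog/record123.mrc")
print(ET.tostring(doc, encoding="unicode"))
```

The wrapper itself stays coarse; the granularity lives in the MARC record it points to.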

Granularity of metadata is hard-won and easily lost. Identifying and appropriately encoding metadata elements usually requires a person—and one with training. Once granularity has been achieved, it should not be permanently surrendered through internal compliance with an external standard, unless the benefits clearly outweigh the drawbacks and no alternatives are possible.

The time of cataloging staff is valuable, and once granularity is lost it may not be practical to recover it. Our libraries depend on metadata. They are becoming even more dependent as we move into the realm of creating, managing, and preserving collections in digital form. Doing so well requires us to understand thoroughly what is at stake and the consequences of our actions.


Author Information
Roy Tennant (roy.tennant@ucop.edu) is Manager, eScholarship Web & Services Design, California Digital Library. He is founder and manager of the electronic discussion lists Web4Lib and Current Cites.

 
Link List

Dublin Core
dublincore.org

ISO 12083
www.xmlxperts.com/12083.htm

MARC
www.loc.gov/marc

METS
www.loc.gov/standards/mets

Open Archives Initiative
www.openarchives.org

Text Encoding Initiative
www.tei-c.org

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.