Digital Libraries: Doing Data Differently

By Roy Tennant -- Library Journal, 6/15/2005

Metadata is often created in a time-consuming process by catalogers, digital library technicians, and others. It is then underused in our systems and the systems of those with whom we share the information. But recent experiments by some library organizations indicate that we are only scratching the surface of what our data can do for us.

The American West Project at the California Digital Library (CDL) is aggregating metadata that describes digital objects related to the American West from a couple of dozen libraries, museums, and archives. When we used the Open Archives Initiative Protocol for Metadata Harvesting to gather these records, we discovered metadata problems so significant that I was prompted to write a paper ("Bitter Harvest") as well as a similarly titled column (LJ 7/04, p. 32).

We've normalized

Now, a year later, we have made progress toward developing a set of procedures and software routines to normalize, transform, and enrich these records in new and interesting ways. For example, we discovered that the institutions from which we were obtaining records encoded dates using a wide variety of methods (for examples, see "CDL's OAI Harvesting Infrastructure").

We also discovered that useful dates were not always encoded in date fields within the record. Sometimes dates were embedded within a title field. Dates also appeared in descriptions, subjects, and other fields. We are writing software routines to normalize the dates in these records and match the normalized dates against a set of time periods that relate to the history of the American West. Because of all this processing, the users should have a much richer interaction with the site.
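
To make the idea concrete, here is a minimal sketch in Python of pulling years out of several fields and matching them against time periods. It is not the software CDL is writing. The field names, the `PERIODS` table with its names and boundaries, and the functions `extract_years` and `match_periods` are all assumptions made for illustration, and real records would need far more date formats than a bare four-digit year.

```python
import re

# Hypothetical period table: names and year ranges are invented for this sketch.
PERIODS = [
    ("Gold Rush era", 1848, 1855),
    ("Railroad expansion", 1862, 1893),
    ("Early twentieth century", 1900, 1929),
]

# A four-digit year from 1600 to 2099 that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(1[6-9]\d{2}|20\d{2})(?!\d)")

# Assumed field names; dates may hide in any of them, not only "date".
FIELDS = ("date", "title", "description", "subject")


def extract_years(record):
    """Return the sorted, de-duplicated years found in any of the fields."""
    years = set()
    for field in FIELDS:
        years.update(int(y) for y in YEAR_RE.findall(record.get(field, "")))
    return sorted(years)


def match_periods(years, periods=PERIODS):
    """Return the names of the periods that contain at least one of the years."""
    return [
        name
        for name, start, end in periods
        if any(start <= year <= end for year in years)
    ]


record = {"title": "Miners at Sutter's Mill, 1849", "date": "ca. 1850"}
years = extract_years(record)
print(years, match_periods(years))
```

Scanning every field rather than only the date field is the point of the sketch: the year in the title is found even though it was never encoded as a date.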

Enriched data

Normalizing is rather straightforward compared with trying to assign broad topic headings to a mass of heterogeneous records with subject headings from different vocabularies or with no headings at all. For this we turned to clustering software. Clustering software attempts to find associations among records that can then be exploited to assign one or more subject headings to appropriate clusters. We do not yet have a production capability for doing this, but early experiments have been encouraging. Meanwhile, the institutions from which we obtained these records are interested in the possibility of getting the enriched records back.
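
The general shape of the approach can be sketched with a toy example. The Python below is not the clustering software we experimented with. It is a deliberately simple greedy grouping by word overlap (Jaccard similarity), and the sample titles, the `threshold` value, and the function names are assumptions made for illustration. Once records fall into clusters, a person or a rule can assign a broad heading to each cluster rather than to each record.

```python
import re


def tokens(text):
    """Lowercase words longer than three letters, as a crude stop-word filter."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.2):
    """Greedy single pass: a text joins the first cluster whose seed text is
    similar enough, otherwise it starts a new cluster. Returns lists of indices."""
    clusters = []  # each entry is (seed_tokens, member_indices)
    for index, text in enumerate(texts):
        current = tokens(text)
        for seed, members in clusters:
            if jaccard(seed, current) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((current, [index]))
    return [members for _, members in clusters]


titles = [
    "Gold mining camp in the Sierra Nevada",
    "Miners at a gold mining claim",
    "Railroad depot and locomotive",
    "Locomotive crossing railroad bridge",
]
print(cluster(titles))
```

Production clustering tools use much richer statistics than this, but the division of labor is the same: software proposes the groupings and a heading is then attached to each group.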

This desire of institutions to receive their enhanced records back is a familiar one to Diane Hillmann and her team at the National Science Digital Library (NSDL). They have long worked to improve the metadata they received from institutions participating in the NSDL and have recently taken that experience further by suggesting that an appropriate and useful role for metadata aggregators is to provide enhanced metadata for others to use (see "Improving Metadata Quality").

A basic finding of both the NSDL and CDL is that metadata enrichment should rely on neither human-only nor software-only procedures but on a mix that uses the strengths of each to its full potential.

Work that metadata

OCLC provides yet another example of how our metadata can do more for us. Lorcan Dempsey, OCLC's VP of research (see "Making Data Work Harder"), pointed out that a significant characteristic of both Google and Amazon is that they squeeze as much work out of their data as they can, all to create more useful and compelling services. He suggests we could learn a lot from them, as well as from others that are also making data work harder. OCLC is itself "mining" the rich metadata store of WorldCat in interesting ways that may soon show up as useful new services (see "Works 4 You?").

Another example used frequently in this column is RedLightGreen.com, the bibliographic database tailored for the needs of undergraduate students by the Research Libraries Group. In RedLightGreen, subject headings from the records retrieved by a search are pulled out and put in a prominent location for users to discover easily. Why should this very useful information lie hidden in the metadata and only surface when a user requests to see the full record?
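
Surfacing headings this way takes very little code. The Python below is a minimal sketch of the idea, not RedLightGreen's implementation. The record layout (a `subjects` list on each result), the sample data, and the function name `top_subjects` are assumptions made for illustration.

```python
from collections import Counter


def top_subjects(results, limit=5):
    """Count the subject headings across a result set and return the most
    frequent ones as (heading, count) pairs, most frequent first."""
    counts = Counter(
        heading
        for record in results
        for heading in record.get("subjects", [])
    )
    return counts.most_common(limit)


# Hypothetical search results; only the "subjects" field matters here.
results = [
    {"title": "A", "subjects": ["Gold mines and mining", "California"]},
    {"title": "B", "subjects": ["California"]},
    {"title": "C", "subjects": ["California", "Gold mines and mining", "Railroads"]},
]
print(top_subjects(results, limit=2))
```

The headings were already in the records. The only new work is counting them across the result set and showing the totals beside the search results.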

Our rich collections of metadata are underused. Meanwhile, we can employ automated techniques to make our metadata more uniform when it needs to be uniform and richer when it needs to be richer. We must get smarter about metadata in so many ways. Never have the skills of software-savvy catalogers and metadata-savvy software engineers been so essential to the future of our libraries and the users we serve.


Link List
Bitter Harvest
www.cdlib.org/inside/projects/harvesting/bitter_harvest.html
CDL's OAI Harvesting Infrastructure
www.cdlib.org/inside/projects/harvesting
Improving Metadata Quality
metadata-wg.mannlib.cornell.edu/forum/?date=2005-04-29
Making Data Work Harder
orweblog.oclc.org/archives/000535.html
RedLightGreen
Redlightgreen.com
Works 4 You?
orweblog.oclc.org/archives/000579.html


Author Information
Roy Tennant (roy.tennant@ucop.edu) is User Services Architect, California Digital Library. He is the author of Managing the Digital Library (Reed Business Pr., 2004).

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.