Digital Libraries: Different Paths to Interoperability
by Roy Tennant -- Library Journal, 2/15/2001
In a previous column, I discussed the importance of interoperability among digital library projects ('Interoperability: The Holy Grail,' LJ 7/98). Users should be able to discover through one search what digital objects are freely available from a variety of collections, rather than having to search each collection individually.
| American Memory | http://memory.loc.gov |
| Blue Angel Technologies | http://www.blueangeltech.com |
| Dublin Core | http://purl.org/dc |
| LC/Ameritech Collections Online | http://memory.loc.gov/ammem/award/online.html |
| LC Core Metadata Elements | http://lcweb.loc.gov/standards/metadata.html |
| Picture Australia | http://www.pictureaustralia.org |
| Picture Australia Metadata Guidelines | http://www.pictureaustralia.org/metadata.html |
In a more recent column, I highlighted a project that is achieving interoperability among preprint (or, as they are now commonly referred to, e-print) servers ('Open Archives: A Key Convergence,' LJ 2/15/00). For digitized library materials, there are at least two good examples of projects that are achieving the same goal through similar but intriguingly different means.
The LC model
For three years (1996-99) the Library of Congress (LC) and Ameritech teamed up to offer digitization grants (of up to $75,000 each) to libraries in the United States. LC required successful grantees to provide suitable access aids for the items digitized with award money. These access aids could be in one or more formats: 1) U.S. MARC records, 2) Dublin Core records (following LC guidelines for usage), 3) structured headers (encoded in Text Encoding Initiative format) for searchable text reproductions, and/or 4) Encoded Archival Description finding aids.
Awardees were required to supply LC with the records for the items digitized. These records were added to the LC American Memory collection, thereby providing one place to search the digital collections of LC as well as those of all libraries receiving LC/Ameritech awards. The digitized items themselves remain at the individual institutions, as do copies of the item records.
This highly centralized model for creating a union catalog was possible because LC and Ameritech controlled the funding and could establish record creation guidelines before digitization occurred, thereby ensuring a high level of interoperability among records from different institutions.
This model also requires a high level of commitment and precoordination among participating institutions and a willingness from all participants to follow set guidelines. These collections have been incorporated so seamlessly into the existing American Memory collections that users can easily be unaware that they are searching non-LC collections.
Taking this work a step further, LC is developing a 'core set of metadata elements to be used in the development, testing, and implementation of multiple repositories.' This work should be particularly helpful for digital library projects that are looking to contribute records to a union catalog -- either now or in the future.
The Picture Australia model
In contrast to the LC model, the Picture Australia project came about after a good deal of library content -- nearly 500,000 items -- had been digitized and cataloged. Picture Australia aims to bring together access to digitized images relating to Australia from several institutions (currently seven, including libraries, the National Archives, and the Australian War Memorial). The particular challenges dictated a more flexible solution than that chosen by LC.
Since records had already been created for digitized materials, Picture Australia needed a method to collect the records, massage them into a common record format, index them, and make them available for web searching. Rather than requiring participating institutions to ship data periodically to a central location (the National Library of Australia serves as the lead institution), project developers decided to collect the records monthly by using a software spider. This allows institutions simply to put their records in a specific location on their servers, to be collected automatically.
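The collection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Picture Australia's actual spider: the institution names, the example.org URLs, and the `harvest` and `fetch` helpers are all hypothetical. The idea is simply that each contributor publishes its records at one agreed location, and a scheduled job visits each location in turn, noting any site that could not be reached.

```python
from urllib.request import urlopen

# Hypothetical drop locations; each contributor publishes its records
# at one agreed URL on its own server.
SOURCES = {
    "library_a": "http://example.org/library_a/records.xml",
    "museum_b": "http://example.org/museum_b/records.xml",
}

def fetch(url):
    """Download the raw bytes found at one contributor's drop location."""
    with urlopen(url, timeout=30) as response:
        return response.read()

def harvest(sources, fetch_fn=fetch):
    """Visit every contributor once and return (collected, failed).

    collected maps institution name -> raw record bytes.
    failed maps institution name -> error message, so one unreachable
    server does not stop the rest of the monthly run.
    """
    collected, failed = {}, {}
    for name, url in sources.items():
        try:
            collected[name] = fetch_fn(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            failed[name] = str(exc)
    return collected, failed

# A scheduler (for example, a monthly cron job) would then call:
# collected, failed = harvest(SOURCES)
```

Passing the download function in as a parameter is a design choice made here so the loop can be exercised without touching the network.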
The collected records must then be translated into a common record format (fields are based on the Dublin Core and the storage format is XML) and indexed (using Blue Angel's Metastar Enterprise). Most of the issues remaining for Picture Australia relate to this translation of heterogeneous metadata into a common set of elements.
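To make the translation step concrete, here is a minimal Python sketch of a field crosswalk into Dublin Core elements stored as XML. The local field names, the two institutions, the `record` wrapper element, and the `to_dublin_core` function are assumptions made for illustration; the article does not describe Picture Australia's actual mappings. Only the Dublin Core element namespace is a published value.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical per-institution maps: local field name -> Dublin Core element.
FIELD_MAPS = {
    "library_a": {"caption": "title", "photographer": "creator", "taken": "date"},
    "museum_b": {"object_name": "title", "maker": "creator", "keywords": "subject"},
}

def to_dublin_core(source, record):
    """Translate one local record (a dict of strings) into a DC XML element."""
    mapping = FIELD_MAPS[source]
    root = ET.Element("record")
    for local_field, value in record.items():
        dc_name = mapping.get(local_field)
        if dc_name is None or not value:
            continue  # unmapped or empty fields are dropped in this sketch
        child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
        child.text = value
    return root

sample = to_dublin_core(
    "museum_b",
    {"object_name": "On a horse", "maker": "Unknown", "shelf": "12"},
)
xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
```

Note that the unmapped `shelf` field is silently discarded. Deciding which local fields to drop, merge, or keep is exactly the kind of judgment that makes this translation the hard part of the project.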
One problem is the loss of context. As Debbie Campbell, the Picture Australia project manager, puts it, 'A collection of images may have a collective title such as "Images of Paul Revere." But the image title may be reduced to "On a horse." So the loss of context becomes a discovery issue.'
Mounting challenges
There is also the problem of differing subject vocabularies, particularly between libraries and museums. Geographic names used without a qualifier (such as the name of the state in which the place is found) can also be a problem for those not familiar with Australian geography.
The cataloging problems can go deeper, depending on how the participating institutions have cataloged their materials. A key issue is granularity. Whereas one institution may keep track of first and last names, for example, another may not. Differing formats can be another issue. One library may keep track of dates as MM/DD/YY, while another spells out the month and year. These are issues that must be rectified when translating contributed records into a common format. For examples that illustrate some of these record variations, see the Picture Australia Metadata Guidelines.
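The date problem lends itself to a short example. The Python sketch below tries a list of known input patterns and rewrites whichever one matches into a single year-first form. The pattern list, the output form, and the `normalize_date` name are assumptions for illustration; the article does not say which date format Picture Australia settled on.

```python
from datetime import datetime

# Hypothetical (input pattern, output pattern) pairs, tried in order.
# Four-digit years are tried first so they are never read as two-digit ones.
DATE_FORMATS = [
    ("%m/%d/%Y", "%Y-%m-%d"),  # 02/15/1901
    ("%m/%d/%y", "%Y-%m-%d"),  # 02/15/01
    ("%B %Y", "%Y-%m"),        # March 1901 (month and year spelled out)
]

def normalize_date(raw):
    """Return the date in year-first form, or None if no pattern matches."""
    text = raw.strip()
    for in_fmt, out_fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, in_fmt).strftime(out_fmt)
        except ValueError:
            continue  # this pattern did not fit; try the next one
    return None  # left for a cataloger to review by hand

for value in ["02/15/1901", "March 1901", "circa 1900"]:
    print(value, "->", normalize_date(value))
```

One caution: Python expands two-digit years into 1969-2068, which is rarely right for historical photographs. Records in MM/DD/YY form would likely need a collection-specific rule for the century, or human review.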
Despite these challenges, Picture Australia is clearly successful in its effort to bring together access to a wide range of pictorial material in one easy-to-use location. This success rests on several factors. According to project manager Campbell, one factor was a forgiving timeframe. Although each project task was estimated and delivered according to a schedule, there was no overall deadline for release. This allowed some flexibility in reacting to unforeseen problems.
Another factor was the low threshold for participation. Institutions contributing records were required to do very little to make their records available to Picture Australia. 'Picture Australia is quickly able to repurpose the investment already made in digitization and description,' Campbell said.
The Picture Australia model has another advantage. It has its own brand identity, independent of any single institution. This encourages contributors to participate more equally than is possible when assigning records to a single institution, as with the LC model.
Pick a model, any model
Union catalogs are a good thing. They make accessible from one location what was formerly only accessible by visiting multiple locations and often by learning different search interfaces. Our users need more union catalogs.
There is no 'best' model. You use what is appropriate. If you are beginning a project that provides you with the opportunity to lay out guidelines ahead of time, by all means do so -- it will save time and trouble later. But many great chances for creating union catalogs will come after records have been created. The best thing about Picture Australia is that the project has proved not only that union catalogs can be created after the fact, but that they can be done well.
Roy Tennant (


















