The Art and Science of Digital Bibliography
Roy Tennant -- Library Journal, 10/15/1998
For decades, librarians have been cataloging books and journals for their libraries' collections. So it's only natural that some who embrace the Internet talk about "cataloging the net." They envision creating MARC records for each Internet resource, complete with Library of Congress Subject Headings (LCSH) or Dewey Decimal Classification, and merging them into their catalogs of library holdings. Norman Oder's recent article ("Cataloging the Net: Can We Do It?" Library Journal, LJ 10/1/98, p. 47-51) describes one such project from OCLC called InterCat. But let's take a look at the cataloging process for a moment. Modern library cataloging is characterized by the following:
- A proprietary (nonweb) automated system for creating and editing complex records in MARC format;
- A highly trained individual (one of a small subset of librarians) who knows cataloging practices in general and AACR2 specifically; and
- A significant amount of time to create each record.
Cataloging Internet resources this way would also require:
- A set of guidelines specific to Internet resources that help catalogers to do such things as locate information that is normally on the title page verso when there is no such thing;
- A way to automatically check links; and,
- Catalogers who know Internet resources and are capable on the web.
We also have trouble tailoring such a system, either by providing narrative descriptions of the resources or expanding access by offering uncontrolled keywords. Catalogers already find it tough to locate LCSH that are both appropriate to the topic and what a searcher might use. When Yahoo is your competition, relying on LCSH for topic access is ludicrous. Then we also must face how to proceed with the records -- should we load them into our library catalogs, merged with records for items on our shelves? Or do we keep them in a separate database?
The problem is that we have the wrong metaphor.
Instead of cataloging the net we should be practicing digital bibliography. Bibliography is the art of finding, reviewing, selecting, and annotating important information resources on a particular topic, often for a particular audience. In the past, some of the resulting collections were published in article or book form, while others were distributed more informally (e.g., as handouts). But no one would question the utility of this service or that it is central to libraries.
Now some librarians are using their tried-and-true bibliographic skills in new and interesting ways, many of which are chronicled in Oder's article. Sometimes these bibliographies are called "indexes" or "subject gateways." But they all share some characteristics that distinguish them from cataloging projects:
- Although adeptness in evaluating resources is important, no specialized cataloging skill is required;
- Annotations, either descriptive or evaluative, can be added;
- Periodic link checking is a standard part of the system;
- Such projects can be tailored for a particular audience or purpose; and,
- Records can be enhanced with additional vocabularies or uncontrolled keywords.
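Of the characteristics listed above, periodic link checking is the easiest to automate. What follows is a minimal sketch in Python, not the code of any project named in this article. The function names `http_status` and `check_links` and the record layout (a dictionary with a `url` key) are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no answer was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, or malformed URL


def check_links(
    records: Iterable[Dict[str, str]],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> List[Dict[str, str]]:
    """Return the records whose URL is unreachable or answers with status 400 or above."""
    broken = []
    for record in records:
        status = fetch(record["url"])
        if status is None or status >= 400:
            problem = "unreachable" if status is None else f"HTTP {status}"
            broken.append({**record, "problem": problem})
    return broken
```

Passing the fetch function in as a parameter lets the checker be exercised without network access, and a scheduled job could call `check_links` over the whole collection and hand the flagged records to a human reviewer.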
| LINK LIST | |
| "The Access Catalogue Gateway to Resources" | http://www.ariadne.ac.uk/issue15/main/ |
| CrossROADS | http://www.ukoln.ac.uk/metadata/roads/crossroads/ |
| "A Distributed Architecture for Resource Discovery Using Metadata" | http://www.dlib.org/dlib/june98/scout/06roszkowski.html |
| Dublin Core | http://www.purl.org/metadata/dublin_core |
| INFOMINE | http://lib-www.ucr.edu/ |
| Internet Scout Project | http://scout.cs.wisc.edu/scout/ |
| Project Isaac | http://www.scout.cs.wisc.edu/scout/research/ |
| ROADS | http://www.roads.lut.ac.uk/ |
All ROADS lead to Bath
The largest, most organized, and most technically sophisticated effort is ROADS, the Resource Organisation and Discovery in Subject-based Service project of the eLib Programme of the U.K. Office of Library and Information Networking (UKOLN, located at the University of Bath). The project aims to build standards-based software for creating subject indexes or "gateways" to Internet resources, to investigate methods of interoperability between gateways, and to participate in the development of standards for the indexing, cataloging (in a simplified, non-MARC/AACR2 fashion), and searching of subject-specific resources.
ROADS uses templates (record formats) for different resource types, such as documents, datasets, images, etc. These templates specify the appropriate fields for recording information about a resource and thereby ensure interoperability between separate subject gateways. These templates form the foundation record standard that enables searches across gateways to function the same way.
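The template idea can be illustrated with a short Python sketch. It is not ROADS code: the field names, the three resource types, and the `validate` function are hypothetical and chosen only to show how a shared template per resource type keeps records from different gateways compatible.

```python
from typing import Dict, List

# Hypothetical templates: each resource type lists its required and optional
# fields. These names are illustrative, not the actual ROADS attribute names.
TEMPLATES: Dict[str, Dict[str, List[str]]] = {
    "document": {"required": ["title", "url", "description"],
                 "optional": ["author", "keywords"]},
    "dataset": {"required": ["title", "url", "format"],
                "optional": ["description", "keywords"]},
    "image": {"required": ["title", "url"],
              "optional": ["description", "creator"]},
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record fits its template."""
    resource_type = record.get("type", "")
    template = TEMPLATES.get(resource_type)
    if template is None:
        return [f"unknown resource type: {resource_type!r}"]
    allowed = set(template["required"]) | set(template["optional"]) | {"type"}
    problems = [f"missing required field: {name}"
                for name in template["required"] if not record.get(name)]
    problems += [f"field not in template: {name}"
                 for name in sorted(record) if name not in allowed]
    return problems
```

A gateway that accepts only records for which `validate` returns an empty list can assume that every other gateway using the same templates holds records with the same fields.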
A set of modules written in Perl provides web-based interfaces, which allow individuals anywhere to participate in creating and editing records if they have Internet access and the proper authorization.
Now some 16 subject gateways -- mostly in the UK -- use the ROADS infrastructure, covering such diverse subject areas as art and architecture, engineering, literature, and medicine. Most have fewer than 3000 records, and since most are independent operations, their depth and quality may differ. Also, the uncertainty of continued eLib funding may threaten some gateways.
To experience the high level of interoperability achieved by ROADS, try out its "CrossROADS" service. This searches the ROADS Index Server, which has index information from seven separate (and often physically distant) subject gateways maintained by different organizations and individuals. Because each subject gateway uses the same software and record formats, searching across these collections is easy.
This allows users to experience the best of both worlds -- the capacity to focus their Internet searching within a particular subject area by searching one gateway, or "broadcasting" a search across a federation of subject gateways. And in all cases, rather than searching a pile of undifferentiated web pages, as you would when using commercial web search engines, you are searching resources that have been deemed worthwhile. That, after all, is what bibliography is all about -- digital or otherwise.
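The two search modes can be sketched in a few lines of Python. This is a simplified illustration, not the CrossROADS implementation: CrossROADS consults a central index server, whereas the sketch simply asks each gateway in turn. The `Gateway` class, the `broadcast` function, and the three-field record format are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Gateway:
    """One subject gateway holding records in a shared format (title, description, url)."""
    name: str
    records: List[Dict[str, str]] = field(default_factory=list)

    def search(self, term: str) -> List[Dict[str, str]]:
        """Case-insensitive substring search over title and description."""
        needle = term.lower()
        return [r for r in self.records
                if needle in r["title"].lower() or needle in r["description"].lower()]


def broadcast(gateways: List[Gateway], term: str) -> List[Dict[str, str]]:
    """Send the same query to every gateway and label each hit with its source."""
    hits = []
    for gateway in gateways:
        for record in gateway.search(term):
            hits.append({**record, "gateway": gateway.name})
    return hits
```

Calling `search` on a single `Gateway` corresponds to focusing on one subject area, and `broadcast` corresponds to searching the federation. The broadcast is only this simple because every gateway stores records in the same format.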
Isaac Scouts out new territory
The Internet Scout Project has for several years produced the Internet Scout Report, a high-quality current awareness service and database of Internet resources. Now Scout proposes to broker a federation of academic-oriented subject gateways. Project Isaac will include no more than six gateways with collections of 5000 to 20,000 resources. Although the project will employ the Dublin Core draft metadata standard, use of that draft standard is not required for participation. Those projects that already use Dublin Core will find that their records will map directly to Isaac records, which are simpler and less thorough than ROADS templates (which themselves are less complex than MARC records).
Internet Scout staff looked at ROADS as a way to implement Isaac but decided instead to build an infrastructure based on the Lightweight Directory Access Protocol (LDAP) for searching and indexing and the Common Indexing Protocol (CIP) to distribute queries, return results, and exchange index information. They assert that this offers a more flexible and extensible data model as well as better industry support for the base protocol. For more information, see "A Distributed Architecture for Resource Discovery Using Metadata" (D-Lib Magazine).
However, Scout plans to support searching of ROADS collections by providing a "protocol gateway" in addition to the main project. Such a gateway provides transparent searching of a different system by translating the original query into something the remote system can understand. So Isaac aims not only to federate existing subject indexes that now don't communicate but also to encompass the ROADS-based gateways. This has great potential.
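To make the idea of query translation concrete, here is a rough Python sketch. It renders one query two ways: as an LDAP search filter (the filter syntax and escaping rules are those of the LDAP standard) and in a made-up syntax standing in for a remote system. The field names, the `REMOTE_FIELD_NAMES` mapping, and the remote syntax are all hypothetical; the actual Isaac and ROADS query formats are not described in this article.

```python
from typing import Dict

# Hypothetical mapping from local field names to a remote system's field names.
REMOTE_FIELD_NAMES: Dict[str, str] = {
    "title": "Title",
    "subject": "Keywords",
    "creator": "Author",
}

# Characters that must be escaped inside an LDAP filter value.
_LDAP_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def to_ldap_filter(query: Dict[str, str]) -> str:
    """Render the query as an LDAP substring filter, with all terms ANDed together."""
    if not query:
        raise ValueError("query must contain at least one field")
    parts = []
    for name, term in query.items():
        escaped = "".join(_LDAP_ESCAPES.get(ch, ch) for ch in term)
        parts.append(f"({name}=*{escaped}*)")
    return parts[0] if len(parts) == 1 else "(&" + "".join(parts) + ")"


def to_remote_query(query: Dict[str, str]) -> str:
    """Translate the same query into the made-up syntax of a remote system."""
    clauses = []
    for name, term in query.items():
        if name not in REMOTE_FIELD_NAMES:
            raise ValueError(f"remote system has no equivalent of field {name!r}")
        quoted = term.replace('"', '\\"')
        clauses.append(f'{REMOTE_FIELD_NAMES[name]}="{quoted}"')
    return " AND ".join(clauses)
```

The translation is only as good as the field mapping. A field with no remote equivalent has to be rejected or dropped, which is why the sketch raises an error rather than guessing.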
Mining the Internet
INFOMINE began at the University of California, Riverside, several years ago, and now lists about 14,000 resources of interest to an academic audience. Although it now draws broader participation from UC librarians, it began as a project of a few UC-Riverside Library staff, and most contributions still come from the key implementors.
As subject indexes go, it might be considered a prime example of what can be accomplished with an idea and commitment. To this day -- although that may soon change -- it runs on a PC. INFOMINE records are Dublin Core compliant; the records can be output using Dublin Core elements, which makes interoperability with other indexes easier. For subject terms, INFOMINE uses "modified" LCSH and uncontrolled keywords.
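Outputting a record with Dublin Core elements amounts to mapping local field names onto the standard element names. The Python sketch below shows one way this might look. The local field names and both functions are hypothetical, and nothing here is taken from INFOMINE itself. The element names (title, creator, subject, description, identifier, date) are Dublin Core elements, and the `DC.` prefix follows a common convention for embedding them in HTML.

```python
from html import escape
from typing import Dict, List, Tuple

# Hypothetical internal field names mapped to Dublin Core element names.
FIELD_TO_DC: Dict[str, str] = {
    "title": "title",
    "author": "creator",
    "keywords": "subject",
    "annotation": "description",
    "url": "identifier",
    "added": "date",
}


def to_dublin_core(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (element, value) pairs for non-empty fields that have a Dublin Core equivalent."""
    return [(FIELD_TO_DC[name], value) for name, value in record.items()
            if name in FIELD_TO_DC and value]


def as_meta_tags(record: Dict[str, str]) -> str:
    """Render the Dublin Core pairs as HTML <meta> tags, one per line."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        for element, value in to_dublin_core(record)
    )
```

Fields with no Dublin Core equivalent are simply left out of the export, so the local database can hold richer records than the ones it exchanges with other indexes.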
Version 3 of the service, due out soon, will let users search across the entire database (now they must first select one of ten divisions) and will make the service independent of any particular database management system.
INFOMINE recently received a three-year, $300,000 grant from the U.S. Dept. of Education to create a limited-area search engine. This would "crawl" the 14,000 INFOMINE URLs to provide full-text searching of those selected resources, a service ROADS does not yet offer.
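The core of a limited-area search engine is an index built only from the pages a librarian has already selected. The Python sketch below assumes the page text has already been fetched and is held in a dictionary keyed by URL. It is an illustration of the general technique (an inverted index), not a description of the engine INFOMINE plans to build, and all function names are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words made of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the URLs that contain every word of the query, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

Because only vetted URLs go into `pages`, every hit is a resource someone has judged worthwhile, which is what separates this approach from a general web search engine.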
Bringing it all together
In our well-intentioned effort to create one interface to all information resources appropriate to our various clienteles, we librarians have a strong tendency to cram everything into our library catalogs. Again, we have the wrong model. A better model treats the catalog as one of several separate databases that a single interface can search together.
For example, a user may wish to search a library catalog, a subject index of Internet resources, and descriptions of local CD-ROM databases to explore a particular subject area. With the appropriate interface, all these could be searched simultaneously and the results presented in a manner that allows further investigation.
For a more complete description of this model, see Terry Hanson's "The Access Catalogue Gateway to Resources." This model would allow designers of each component to build on its strengths individually and tailor its service to particular needs but also support combined searching and display.
As another example, consider the various ROADS indexes. Someone seeking information about AIDS might want to search not only the social science gateway but the one concerned with health and medicine and probably others as well. The key to supporting customization for an audience and also cross-disciplinary searches is the appropriate infrastructure. And that's not the traditional library catalog.
Roy Tennant (