OCLC Delving Deeper into Digital Preservation Services
Digital registry for master files, digital archive harvesting, and domestic and overseas service centers offered
By Michael Rogers -- Library Journal, 5/15/2003
For the past several years, OCLC has been reinventing itself by extending beyond cataloging into other facets of the library market. Although not an entirely new offering, OCLC's digital preservation services are now the focus of a concerted effort to carve a niche in that growing field.
"Digitization is not preservation," warns Meg Bellinger, VP of OCLC digital and preservation resources. "Digital content is vulnerable to software and hardware obsolescence," and information specialists such as librarians must make long-term commitments to keeping their collections accessible.
Bellinger told LJ that OCLC became involved in digital preservation several years ago by assuming the management of Micrographics Preservation Services of Bethlehem, PA, and has advanced from there. "We had a working group of significant library representatives including Harvard, University of Virginia, Cornell, and Columbia, and we are using the WorldCat database to create a centralized registry for digital masters."
For example, if the Smithsonian or Harvard has books it has digitized for long-term preservation, the institution can create a record in WorldCat. This prevents others from replicating the work while offering access to the content through a URL. "We're in the early stages now and are developing the best practices and seeding the registry with test data. We're also identifying system, policy, and economic changes that would be required to sustain it over the long term," Bellinger said.
Phase 2 will start in late summer 2003 and will include "planning and implementation of some of the changes and a policy evaluation of how we share the content and how it integrates with what's being done elsewhere." Right now the registry is concentrated on printed materials, which comprise the vast majority of WorldCat holdings.
Web archiving
One of the first tools OCLC developed was a web document digital archive harvesting tool. The tool allows libraries to use software to catalog their content and make it available in WorldCat. OCLC also has been working with the Government Printing Office (GPO), which, Bellinger said, had a real problem with what it termed fugitive documents: digital objects created by government offices and placed directly on their web sites, completely bypassing the GPO. "The GPO needed a tool that would help to identify these objects, catalog them so they would have descriptive data, and create a URL link to the content," stated Bellinger. State libraries with repository requirements were having a similar problem.
OCLC also found that organizations like the Library of Congress (LC) are interested in the web harvester not only for preservation but also as a collection development tool. "It's happening more and more in libraries," said Bellinger. "The processes of collection development, acquisitions, and preservation are becoming all in one, especially when it comes to digital content." LC is using the tool to search the web and selectively identify, copy, and add content to its digital archive for preservation.
OCLC's next step is to look at entire web sites. "There seems to be a significant movement in the library community to preserve whole sites of content," said Bellinger. "We think we have a role in developing web crawlers that also will allow librarians to be selective and make judgments but also apply the metadata to discover and maintain that content in the long term. That is the focus of our digital archive development in the next fiscal year." Part of that, Bellinger said, also includes archiving and preserving e-journals. Libraries need recourse if content is discontinued, so OCLC is looking to offer that service to organizations.
Service centers
At the core of the digital preservation program are three service centers: one in Bethlehem, PA; one in Washington State; and a European branch in The Hague, the Netherlands. The service centers perform a huge volume of preservation microfilming and an increasing amount of digitization for libraries. Along with the digitization, the service centers offer content-enrichment services. "In 1995 when we started, people were thrilled just to get a digital picture, but now they're looking for a lot more functionality, including text searching," said Bellinger. "So we're offering optical character recognition (OCR) services and XML tagging and coding."
OCLC asserts that the centers provide a regional outsourcing capability so that libraries, archives, and museums aren't duplicating infrastructure and, especially, the capital investment required to purchase the equipment to digitize research collections. "At the preservation resources at the Bethlehem facility," said Bellinger, "we have two high-production microfilm scanners—these are $85,0000 machines—that are networked to major servers that can digitize hundreds of rolls a week.
"We have a very high-end digital camera that is used for extremely content-rich graphical material. We have book scanners and other equipment so that if we get a book project and it can be unbound, we can feed it through a scanner," continued Bellinger. "If the book has a valuable binding or is fragile, we can put it on a different scanner with a book cradle attached. If we need to get a 24-bit, 600 dpi image of some of the illustrations, we have equipment for that, too. Most libraries don't have the ability to build that kind of infrastructure." The OCLC process can also microfilm historic newspapers, digitize the microfilm, and automatically segment and apply OCR. This results in a fully searchable, XML-tagged newspaper.
The Eagle has landed
OCLC recently completed a project for the Brooklyn Public Library to create a web archive of the Brooklyn Eagle, a newspaper that ceased production in the 1950s and for which Walt Whitman was a columnist. OCLC also is doing a project with the British Library, which has a huge number of historical newspapers. "Microfilmed and digitized newspapers can be a rich, valuable, and never-ending source of historic research," Bellinger said. "Now you can search across newspapers, perform keyword searching, and isolate illustrations. It's a really rich tool."


















