The Repository Adventure
On the way to changing scholarly communication, libraries may end up changing themselves
By H. Frank Cervone -- Library Journal, 6/1/2004
Viewed by some as a means to accelerate change in scholarly communication, digital repositories are also pushing libraries into uncharted territory. Developing these institutional repositories—software and services that manage and disseminate digital materials for an entire institution—is a new priority. The librarians leading this charge work in innovative ways, as part of collaborative teams that can include information technologists, archivists, records managers, faculty, and university administrators, as well as local government officials and community members.
The history of digital repositories is short, beginning in late 2000 when the UK's University of Southampton released a software package called EPrints. Since then, the movement to establish digital repositories has gained momentum, spurred by a convergence of dropping costs for online storage, the proliferation of broadband and gigabit networking technologies, and the development of metadata standards to describe repository content.
Implementing an institutional repository raises complex questions about organizational resources and strategies, as well as questions about roles and responsibilities. After all, many institutional repository projects are motivated by the desire to change the current model of scholarly communication. This change, if successful, would place the responsibility for publishing material on scholarly institutions, taking the commercial publishers largely out of the picture.
Whether this happens or not remains to be seen, but librarians are especially qualified to assume these new tasks and understand many of the complex issues involved.
What's inside
Universities and other research institutions throughout the world are actively planning and implementing institutional repositories (see "Best of the Bunch," p. 45). Not all of these institutional repositories have identical goals, but they do share similar aims. Stevan Harnad, professor of cognitive science at the University of Southampton, has outlined five broad aims of institutional repositories. These include the self-archiving of institutional research such as article preprints and postprints, theses, and dissertations; management of digital collections; preservation of digital materials; housing of teaching materials; and electronic publishing of journals and books.
The seemingly subtle differences among these aims are complicated because of the variety of published materials. For example, in many repositories, much of the data is grey literature, e.g., reports, brochures, guides, product information, budgetary data, memoranda, and unpublished research findings (much from organizations that are not primarily publishers). Add to this mix faculty and researcher scholarship such as original art, grant proposals, maps, radio/TV interviews, motion pictures, music scores, photographs, consulting (technical) reports, technical drawings, and poster session displays, and it is easy to see why institutional repository activity is proliferating.
Different software, different goals
Given this wide variety of material and goals, it is obvious why so many different software systems have been developed. But not all of these differences are driven solely by the various types of content repositories choose to collect.
A major difference among institutional repository solutions is whether the software is commercial or open source. Commercial software is purchased and, in most cases, is not open to local modification. The programming code of open source software is freely available for users to look at and modify. In most cases the software itself is provided without a licensing fee; the Linux operating system is probably the best-known example.
Several products have gained fame in the open source arena. EPrints, the oldest, has the largest and most diverse installed base. Available from the Joint Information Systems Committee in the UK and the National Science Foundation in the United States, this system focuses on traditional text-based scholarship in the form of preprints and postprints. Publishing in the EPrints model follows traditional print publishing.
DSpace, a joint program between the MIT Library and Hewlett-Packard, is a general-purpose repository designed to capture the intellectual output of research organizations. Unlike EPrints, DSpace allows for a wide range of digital material types.
The Fedora (Flexible Extensible Digital Object and Repository Architecture) management system was designed as a foundation for full-featured institutional repositories and other interoperable web-based digital libraries. Unlike EPrints and DSpace, it does not come with a user interface out of the box. However, similar to DSpace, Fedora allows for a wide range of digital material types. The University of Virginia and Cornell University are jointly developing it.
Greenstone is a suite of software for building and distributing digital library collections. Produced by the New Zealand Digital Library Project, Greenstone is developed and distributed in cooperation with UNESCO and the Human Info NGO.
The Netherlands Institute for Scientific Information Services' i-Tor implements a data-independent repository. The content and the user interface function as independent parts because i-Tor publishes data from a wide variety of relational database systems, file system types, and even from web sites. In a unique feature, i-Tor imposes a repository on top of an existing set of disparate digital objects.
Commercial software providers have been active in this area as well (see "The Commercial Solutions," p. 46). Commercial repository products tend to have closer interoperability with traditional library services, such as including cataloging modules. Additionally, most commercial software tends to have more fully developed modules for access, authorization, and rights management—although this is changing quickly.
The preservation imperative
Digital preservation is one reason institutional repositories are such a hot topic. The advantage of repository software is that it provides mechanisms to consistently identify material to simplify future migration activities. The goal is to have the repository software perform migrations in a seamless manner as the technology changes. Without a repository structure, migrating digital material quickly becomes a nightmare.
The burden of collecting and preserving materials for future access does not normally concern the scholar or creator directly. It is the problem of publishers and libraries, which have preserved materials for many years. Nonlibrarians seldom know of this work unless a resource is lost or damaged.
With digital material, the model changes. Digital preservation depends on good stewardship from the time an object is created. Additionally, digital preservation raises unique challenges. Recording media such as tape or CD-ROM deteriorate more quickly than traditional paper, even under ideal conditions. An even more difficult challenge is providing "future-proof" compatible retrieval and playback technologies for digital material. Repositories can help do this.
Planning for migration
Most librarians and archivists accept that digital preservation depends upon migrating material to make it compatible with newer technologies. But migration is far more complex than just transferring a bit stream from one medium to another. The internal structure and content of the material must be preserved and transferred as well, so that the "new" object is a faithful representation of the original.
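The simplest layer of a migration, confirming that a bit stream arrived unchanged, can be illustrated with a checksum comparison, often called a fixity check. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, SHA-256 is one reasonable choice of digest rather than a requirement of any repository system, and a matching checksum says nothing about whether a reformatted object still preserves the structure and meaning of the original.

```python
import hashlib


def checksum(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(recorded_checksum, migrated_stream):
    """Compare the checksum recorded at deposit with that of the migrated copy.

    Hypothetical helper: a real repository would also record when and how
    the check was made.
    """
    return checksum(migrated_stream) == recorded_checksum
```

A repository might record the checksum when an object is deposited and repeat the comparison after every copy to new storage. Conversions from one file format to another change the bits by design, so they need other forms of validation.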
Digital objects are frequently complex, composed of heterogeneous types that are open ended and resistant to closure. What a scholar prizes in an object may be in conflict with preservation methods. This makes for tension between scholars, who want to have "living and breathing" digital media that come and go as needed, and preservationists, who want an object that is stable for infinite use.
Planning for preservation is tricky. It is difficult to predict the many critical aspects of the preservation puzzle, from when to migrate to how much reformatting is necessary to just how expensive it will be. Preservation must be integral to the planning, design, and budgetary process for repositories if institutions don't want commitments to exceed resources.
It all comes back to standards
Standards and protocols are needed to ensure ongoing access to information. For example, the Open Archival Information System (OAIS) Reference Model, the de facto standard for digital archive architecture, provides the framework within which preservation metadata and other standards can be developed. The OAIS model is predicated on capturing content as bit streams that can then be preserved in perpetuity. Additionally, repository software can export archive metadata in an XML-structured format that facilitates migration to another system.
Identifying and locating content consistently is critical in repository software. To facilitate long-term access, each object should have a unique and persistent reference identifier that transcends the software itself. Persistent identifiers would remain valid even if the content migrates to a new system or if the management responsibility of the institutional repository changes.
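One way to picture a persistent identifier is as a stable name that is looked up in a table of current locations. The following is a minimal sketch, not a description of how any particular repository package works: the class name, the identifier format, and the example URLs are all hypothetical.

```python
class IdentifierResolver:
    """Maps persistent identifiers to wherever the object currently lives."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, location):
        """Assign a new identifier; an identifier is never reused."""
        if pid in self._locations:
            raise ValueError(f"identifier already assigned: {pid}")
        self._locations[pid] = location

    def relocate(self, pid, new_location):
        """Record a move (for example, a migration) without changing the identifier."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        """Return the current location for an identifier."""
        return self._locations[pid]


resolver = IdentifierResolver()
# Hypothetical identifier and URLs, for illustration only.
resolver.register("repo:2004/0001", "https://old.example.edu/items/17")
resolver.relocate("repo:2004/0001", "https://new.example.edu/objects/a9")
```

Citations keep pointing at the identifier, and only the table entry changes when the content moves to a new system or a new steward.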
To facilitate locating content, the Open Archives Initiative (OAI) has emerged. This collaborative effort develops and promotes standards and solutions such as the OAI Protocol for Metadata Harvesting, which allows an institution to create descriptive metadata (metadata that describes the stored content) and make it available to others who wish to use it.
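To make metadata harvesting concrete, the sketch below builds an OAI-PMH `ListRecords` request and pulls a few Dublin Core fields out of a response. It is an illustration under assumptions: the base URL, the sample record, and the function names are invented for this example, and the sample response is trimmed to the elements the code reads. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.edu/oai"

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request to list records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Invented, abbreviated response containing a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1234</identifier>
        <datestamp>2004-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Preprint</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, title, and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records
```

A service provider would fetch the URL returned by `list_records_url`, pass the response body to `parse_records`, and index the results alongside metadata harvested from other institutions.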
Resistance and fears
Administrators, faculty, and others may resist institutional repositories. Library administrators may fear an increased workload for staff, and rightly so. Faculty may fear the undermining of the current scholarly publishing system, a critical component in the engine of academic achievement.
As producers of primary research, academic institutions would naturally take an interest in capturing, disseminating, and preserving the intellectual output of their faculty, students, and staff. Traditionally, publishers and libraries have served complementary roles in facilitating scholarly publication and preservation. Over the last several decades, the rate of change to the market and technological infrastructures has accelerated, driven in part by the volume of research, particularly in the sciences.
This acceleration is disrupting the symbiotic publisher/library relationship. Combined with nearly ubiquitous networking and skyrocketing prices in traditional publishing models, it sets the stage for new expectations. Among these is the use of institutional repositories to provide teaching faculty with ways to create and preserve learning objects such as illustrations, visualizations, models, and video.
The process is not without hurdles. Faculty are troubled about who owns the materials they put into a repository. They are also concerned that adding material to an institutional repository can impede other, more traditional publication. Another worry is the perception that repository content, which may not go through peer review, is of lower quality than the content of traditional scholarly journals.
Future bright, if not sunny
Though most repository efforts are in academia, public libraries are starting to develop them as well. They are also joining local governmental agencies, historical societies, museums, and other cultural institutions to establish community repositories. It is likely that consortial repositories will develop, since not every academic institution will need or want to run a repository.
Institutional repositories significantly extend the role of a library. Such projects are a serious and long-lasting commitment, with extensive benefits as well. Scholars and researchers who rely on an institutional repository to publish and preserve their work place an incredible amount of trust in the integrity, wisdom, and competence of those who manage it.
If libraries step up to the plate, they will fundamentally transform their role from passive transfer agents of information into active partners in the dissemination process. By leading the way in the implementation of institutional repositories, librarians can guarantee future relevance as digital publishing technologies change the structure, if not the nature, of scholarly communication.
Author Information
H. Frank Cervone is Assistant University Librarian for Information Technology, Northwestern University, Evanston, IL

















