BackTalk: HathiTrust and the Google Deal
John Wilkin -- Library Journal, 12/23/2008
In 2008, some of the world’s great research libraries created HathiTrust, a unified, comprehensive digital collection of the published record. As the Google settlement winds its way forward, many will surely wonder how HathiTrust will relate to the product Google now proposes to offer. As I see it, the product coming from the Google settlement will be a transformative resource that will offer access to a huge body of content far more quickly than if the case had to make its way through the courts. Yet, despite Google’s ability to provide rich access, there are important things Google cannot do. Most critically, Google cannot be that trust for the future.
Common roots
Although HathiTrust was born of research libraries, it recognizes the potential of the digital environment to transcend the dimensions that constrain our print collections. HathiTrust aims to create nothing short of a universal digital library. Toward that end, the Google settlement both helps to bring the venture into sharper focus and contributes to making it a reality.
While the planning for HathiTrust predates the settlement negotiations, the Google Books Library Project has been integral in seeding HathiTrust with a large body of materials, as well as inspiring a new level of digitization activity by libraries, library consortia, and other partners, such as the Open Content Alliance. Drawing from Google’s efforts with its initial partners alone, HathiTrust is likely to exceed ten million digital volumes in the next few years. Add to this the hundreds of thousands of volumes and collections being digitized through other initiatives, and we can envision a rich, digital resource for future scholarship.
HathiTrust is a single entity, but it represents a groundbreaking partnership of libraries focused on applying our accumulated expertise in digital preservation and management to create a reliable and trustworthy digital repository.
While preserving content is HathiTrust’s first order of business, we also believe that preservation without access is meaningless. We share a commitment to exposing and using content in the repository to the fullest extent possible.
The proposed Google settlement will also aid HathiTrust in this respect. Much of what HathiTrust proposes to do is explicitly sanctioned in the settlement agreement, which protects this fundamental library-based effort from legal threats: preserving content, supporting access by print-disabled users, generating print replacement copies from the digital files when original print copies are damaged or lost, and serving as a body of content for large-scale computational needs. Some of the early Google digitization agreements did not support these kinds of collaboration-enabling activities. As part of the settlement, revised agreements with Google will now explicitly sanction this collaboration.
Moreover, by creating a single yet distributed architecture, HathiTrust diminishes the security risks that come with an array of small-scale storage efforts. We introduce trustworthy curation and permanence for the cultural record into the mix of large access projects. In this way, HathiTrust should be seen not as a threat to the interests of publishers and authors but as a complement, bringing the great value of libraries to a broad ecology of interests.
Challenges
Despite its auspicious beginnings, HathiTrust will certainly face its share of challenges. Libraries do not perform this kind of deep collaboration naturally or readily, for example, and finding ways to do our work collectively in this space will not be easy. We aim to avoid some of the inefficiencies that were inadvertently carried into the digital environment in our early digital library efforts. Our partners will work together to address duplication and to coordinate and review content in collections. We will also work with one another and OCLC to resolve incidental variation in bibliographic description.
We also have concerns about the quality of digitization. We hope to focus our collective attention on the development of metrics for quality and, ultimately, to certify objects in the repository with regard to quality and completeness.
Perhaps the greatest challenge going forward, however, will be sustaining the archiving role of HathiTrust in the face of objections from within our own community. Some will argue that archiving and curation are unnecessary as long as the Google product remains vital. Only libraries, however, can ensure that the historical record is protected against distortion, suppression, and loss. As librarians, we must assume responsibility for this role. No matter how pivotal the Google product becomes in the lives of our users, Google cannot protect the historical record and ensure its future for the public. Libraries are responsible for that very role. Along with it comes a responsibility to make materials in our collections as accessible to users as legally possible. This is what HathiTrust does. These are the goals we have set.
John Wilkin is Associate University Librarian, University of Michigan, Ann Arbor, and Executive Director of HathiTrust