The LJ Academic Newswire Newsmaker Interview: John Wilkin, Michigan Associate University Librarian and Executive Director of HathiTrust
Andrew Albanese -- Library Journal, 10/16/2008
In what may be the library community’s most ambitious digital collaboration so far, some two dozen large research libraries this week announced the launch of a single, shared repository of digital collections, including scanned books, articles, special collections, and a range of “born digital” materials. The venture, called HathiTrust (pronounced HAH-tee), “combines the expertise and resources of some of the nation’s foremost research libraries,” said John Wilkin, associate university librarian of the University of Michigan (UM) and the newly named executive director. It was launched jointly by the 12-university Committee on Institutional Cooperation (CIC) and the 11 university libraries of the University of California (UC) system, with UC’s participation coordinated by the California Digital Library (CDL). The University of Virginia has also announced that it will join the venture. LJAN caught up with Wilkin this week for more on the venture, the first in a series of Q&As with HathiTrust partners.
LJAN: HathiTrust represents something librarians have thought—or, dreamed about—since the digital age began. How did this specific initiative get rolling?
JW: You’re right: we’ve been thinking about this sort of thing for years, with specific discussions going back at least to the first Making of America project in 1995. Sometimes the genesis of an idea is hard to trace, but in fact we had very specific discussions regarding this notion of a shared digital repository back in 2004, with Michigan and California beginning to articulate some specific notions. Discussions in the CIC were early, as well, and began to flesh out an approach. But as we began to absorb substantial amounts of digitized content from Google, talks became more focused and urgent. It’s worth pointing out that we have had terrific support in this venture from university leadership, as well as in the libraries.
You mention Google—it seems you are both its partner and competitor at once. Can you talk about where your missions diverge and dovetail?
That’s a great question. The primary difference will be in our commitment to long-term preservation of this information and Google’s commitment to access. That said, we will provide some minimal levels of access (for public domain works, etc.), and we will work to identify specific scholarly needs that Google is less likely to serve. For example, data mining and large-scale linguistic computation are more likely to be in our bailiwick than Google’s.
How are you handling what must be some big technology challenges here?
I’d like to think that the technology aspects of an enterprise like this get a little overplayed. Standards like OAIS and PREMIS give a lot of guidance in the way that materials should be handled, and the emergence of the Trustworthy Repositories Audit and Certification checklist has helped focus the development of technology. Of course, a foundation of support for all of this technology is pivotal.
Tell us a bit about the platform’s development, both so far, and as the project matures.
The software development, to date, was done primarily by the University of Michigan, borrowing from many significant open source software development efforts. For example, we use JHOVE for validation of content. The partners intend to shift the work to something more collaborative, like the use of “pairtree” directory structures, developed primarily by CDL, though with some collaboration with Michigan.
Reliable and extensible storage is a key piece of the approach, and our selection of Isilon’s storage was informed by a very collaborative effort among storage experts at Michigan and review by CIOs in the CIC. We’re very, very fortunate to have Indiana University’s participation in this, and Brad Wheeler has not just been key in helping to take an appropriate path with storage, but has also led his very capable IT organization in support of our activities.
Can you ballpark how many librarians will work on HathiTrust’s efforts—and what kinds of challenges will be foremost for them—for example, dealing with digitization and hosting of traditional resources vs. preservation of born digital?
Putting numbers to the staffing of the project is probably an impossible task—for example, technical services staff currently perform work in bibliographic identification, and this work is not distinct from other duties. Similarly, digital library staff members are redirecting effort from previously institution-specific efforts to this shared one. One of the great parts of this story is the way that libraries can make great things happen through work in a collaborative library space. You asked how difficult it is to deal with digitization and hosting of traditional resources while, at the same time, dealing with preservation of born digital stuff? We haven’t gotten to the born digital stuff, yet—and I don’t expect it’ll be easy.
Copyright: the Section 108 working group recently suggested a carve-out for library digitization for preservation, but they couldn’t agree on exactly how that should work. HathiTrust, meanwhile, is following the Nike plan—and just doing it. Were there any tense moments in the General Counsel’s office?
We believe our approach here is fairly conservative. We use existing guidelines, including Section 108. For in-copyright works, except for some specific Section 108 uses and services for users with disabilities, access is limited to searching the text and getting references to pages with “hits.” In the future, we hope the materials we curate will be a significant foundation for the determination of copyright status—see our IMLS grant, for example, supporting the work of several partner institutions.
HathiTrust has been funded for five years: what happens then—can this major effort be sustained?
We should make a distinction between funding and planning—the participating institutions here have always known they would have to spend money to host their digitized content and, by and large, they have identified funding to support this work for the indefinite future. So, in that sense, the initiative is permanently funded. This specific collaboration, however, is something that has never been done at this scale, and it makes a lot of sense to build in requirements for examination and evaluation of the initiative. Hence, the initial commitment is for five years. Before that deadline, we will surely make changes and we expect that participants will renew and extend their financial commitments.