HathiTrust/Summon Deal Increases Search Access to In-Copyright Works
By Josh Hadro, Mar 28, 2011
With the Google Books corpus on hold, many have turned their attention to other possible venues of research access for students and scholars, including the HathiTrust digital archive. Initially built upon a foundation of Google book scans, HathiTrust (based at the University of Michigan, Ann Arbor) has grown to encompass more than eight million volumes from a variety of sources. Now, a partnership with Serials Solutions (a ProQuest business unit) presents a new option for academic libraries seeking to give researchers an entrée into a massive collection of research.
The Summon integration is set to go live this summer, and will allow subscribing institutions to link their print holdings to the heavy-duty search indexing that's been done by the HathiTrust for works in its collection. Asked how tight the integration between the two would be, Michael Gersch, ProQuest Senior Vice President and General Manager, Serials Solutions, told LJ a Hathi-enabled Summon setup "will search the full text of HathiTrust volumes and point users to the correct place for their institution. That could be a book on the shelf, an ebook through a vendor such as ebrary, or an open access source (freely available in the case of public domain works) version such as Hathi itself or Project Gutenberg."
The HathiTrust corpus is already searchable via two interfaces: a WorldCat prototype built by OCLC and the native HathiTrust interface, which is slated to be upgraded this summer as well. With a Summon implementation, librarians can choose to include either the entire HathiTrust search index or just the public domain materials.
Orphan work access
Most recently published works are already featured on digital platforms, and many have struck agreements with search providers like Serials Solutions to bolster search access through other commercial search interfaces. Public domain titles, by contrast, are already searchable and viewable in full text via the Google Books interface, among others. One of the long-missing pieces of the puzzle, however, has been deep search access to the class of materials known as orphan works: titles that are out of print but still under copyright.
As John Wilkin, executive director of HathiTrust and associate university librarian at the University of Michigan, told LJ, "orphan works have certainly been the major topic of conversation in the aftermath of the failure of the settlement." However, while acknowledged as a primary concern, the full extent of the orphan works issue is itself only partially understood.
During the course of the settlement fairness hearings, Google reps have described orphan works as little more than "a phrase used as a political football," while former R.R. Bowker president Michael Cairns suggested in September 2009 that the total number of U.S. orphan works was on the order of 580,000. However, Wilkin believes most estimates so far have been too conservative. In a recently published report titled "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of 'Rights' in Digital Collection Building," he sets out to revise the best orphan works estimate based on data from the HathiTrust collection and the IMLS-funded copyright sleuthing efforts at the University of Michigan.
Noting that his initial paper is only a foray, Wilkin said, "I put some hypotheses out there to be tested, and found that of the entire corpus of published [world] literature, 50% of them are orphans." That 50% is the sum of a series of estimates, which propose that orphan works published between 1923 and 1963 account for roughly 12.6% of the corpus, those published from 1964 to 1977 for 13.6%, and those published in 1978 and after for 23.8%.
Wilkin cautioned, however, that "the numbers add up, but whether they're real or not depends entirely on analysis that nobody has done." Still, Wilkin believes there are more than 2.5 million orphan works among the archive's current holdings. He also expects that, as more in-copyright works are scanned and indexed, the proportion of orphan works will far outstrip the amount of public domain material in the archive, currently around 2.2 million items, or 26% of the collection. If the true number of orphan works is anywhere near Wilkin's estimate, that's likely to make increased access to those works an even higher priority for librarians, as the likelihood of a licensable orphan works database evaporates with the Google settlement.







