
HathiTrust/Summon Deal Increases Search Access to In-Copyright Works

By Josh Hadro Mar 28, 2011

With the Google Books corpus on hold, many have turned their attention to other possible venues of research access for students and scholars, including the HathiTrust digital archive. Initially built upon a foundation of Google book scans, HathiTrust (based at the University of Michigan, Ann Arbor) has grown to encompass more than eight million volumes from a variety of sources. Now, a partnership with Serials Solutions (a ProQuest business unit) presents a new option for academic libraries seeking to give researchers an entrée into a massive collection of research.

The Summon integration is set to go live this summer, and will allow subscribing institutions to link their print holdings to the heavy-duty search indexing that's been done by the HathiTrust for works in its collection. Asked how tight the integration between the two would be, Michael Gersch, ProQuest Senior Vice President and General Manager, Serials Solutions, told LJ a Hathi-enabled Summon setup "will search the full text of HathiTrust volumes and point users to the correct place for their institution. That could be a book on the shelf, an ebook through a vendor such as ebrary, or an open access source (freely available in the case of public domain works) version such as Hathi itself or Project Gutenberg."

The HathiTrust corpus is already searchable via two interfaces: a WorldCat prototype built by OCLC and the native HathiTrust interface, which is slated to be upgraded this summer as well. With a Summon implementation, librarians can choose to include either the entire HathiTrust search index or just the public domain materials.

Orphan work access
Most recently published works are already featured on digital platforms, and many have struck agreements with search providers like Serials Solutions to bolster search access through other commercial search interfaces. Public domain titles, by contrast, are already searchable and viewable in full text via the Google Books interface, among others. One of the long-missing pieces of the puzzle, however, has been deep search access to the class of materials known as orphan works: titles still under copyright, typically out of print, whose rights holders cannot be readily identified or located.

As John Wilkin, executive director of HathiTrust and associate university librarian at the University of Michigan, told LJ, "orphan works have certainly been the major topic of conversation in the aftermath of the failure of the settlement." However, while acknowledged as a primary concern, the full extent of the orphan works issue is itself only partially understood.

During the course of the settlement fairness hearings, Google reps have described orphan works as little more than "a phrase used as a political football," while former R.R. Bowker president Michael Cairns suggested in September 2009 that the total number of U.S. orphan works was on the order of 580,000. However, Wilkin believes most estimates so far have been too conservative. In a recently published report titled "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of 'Rights' in Digital Collection Building," he sets out to revise the best orphan works estimate based on data from the HathiTrust collection and the IMLS-funded copyright sleuthing efforts at the University of Michigan.

Noting that his initial paper is only a foray, Wilkin said, "I put some hypotheses out there to be tested, and found that of the entire corpus of published [world] literature, 50% of them are orphans." That 50% is the sum of a series of estimates: orphan works published between 1923 and 1963 account for roughly 12.6% of the corpus, those from 1964 to 1977 for 13.6%, and those from 1978 and after for 23.8%.

Wilkin cautioned, however, that "the numbers add up, but whether they're real or not depends entirely on analysis that nobody has done." Still, Wilkin believes there to be more than 2.5 million orphan works among the archive's current holdings. He also expects that, as more in-copyright works are scanned and indexed, the proportion of orphan works is likely to far outstrip the amount of public domain material in the archive, currently around 2.2 million items, or 26% of the collection. If the true number of orphan works is anywhere near Wilkin's estimate, that's likely to make increased access to those works an even higher priority for librarians, as the likelihood of a licensable orphan works database evaporates with the Google settlement.
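
For readers who want to check the arithmetic behind the figures quoted in this article, the following is a minimal sketch in Python. The numbers are taken from the text; the variable names are illustrative only, and the implied collection size is a back-of-the-envelope inference, not a figure reported by HathiTrust.

```python
# Per-period orphan-work shares quoted in the article (percent).
orphan_share_by_period = {
    "1923-1963": 12.6,
    "1964-1977": 13.6,
    "1978 and after": 23.8,
}

# Sum the period shares; rounding guards against floating-point noise.
total_orphan_share = round(sum(orphan_share_by_period.values()), 1)

# Public domain holdings as quoted: about 2.2 million items, 26% of the collection.
public_domain_items = 2_200_000
public_domain_share = 0.26

# Collection size implied by those two figures (an inference, not a reported value).
implied_collection_size = public_domain_items / public_domain_share

print(f"Total orphan share: {total_orphan_share}%")
print(f"Implied collection size: {implied_collection_size:,.0f} volumes")
```

The three period estimates sum to the 50% Wilkin cites, and the public domain figures imply a collection of roughly 8.5 million volumes, in line with the "more than eight million volumes" mentioned earlier in the article.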



