HathiTrust Full-Text Search Bolsters Research Appeal
Library-led Google Book Search alternative improves technology
Josh Hadro -- Library Journal, 12/03/2009
- 4.6 million volumes now searchable in full-text
- Paving the way for computational research tools
- In contrast to Google project, an emphasis on preservation
Launched a year ago, the HathiTrust, representing 25 research universities, has built up a reputation as something of a library-led alternative to a universal digital library branded by Google. With the recent addition of full-text search across its growing corpus, the HathiTrust is expanding the technology necessary to deliver much of what librarians have come to expect from such an undertaking.
The project debuted a beta catalog search in May that covered only the basic bibliographic records describing the nearly three million items then in its collection. Users can now perform full-text searches across the corpus, retrieving result frequency and page number locations, even from within works whose full texts are not visible.
HathiTrust currently boasts more than 4.6 million volumes, equaling some 1.6 billion indexed pages. The combined full-text and bibliographic metadata searching is performed via the open source Apache Solr/Lucene project, much the same technology that drives the VuFind catalog frontend software.
According to the HathiTrust, the new full-text search paves the way for future advanced search options, as well as "tools that can be used in computational research."
Corpus appeal
Both the beefed-up search functionality and the promise of a massive research corpus once again invite direct comparison to the eventual products of the Google Book Search settlement.
The planned computational corpus is highly anticipated by a number of researchers and could open up new lines of quantitative research in the humanities.
Value of preservation
An often overlooked difference between the two is that HathiTrust places as much emphasis on preservation as on access.
Highlighting this distinction between Google's proposed commercial product and the HathiTrust, John Wilkin, associate university librarian at the University of Michigan, Ann Arbor, and executive director of HathiTrust, wrote in a January Library Journal essay, "[D]espite Google's ability to provide rich access, there are important things Google cannot do. Most critically, Google cannot be that trust for the future."
Contact the author: josh.hadro@reedbusiness.com