Hathi Trust Fun
August 25, 2008

The University of Michigan Library led the way in putting online the books that Google was digitizing from its collection, and to this day it remains one of the few libraries (the only one?) to have made the effort. Michigan was subsequently joined by the other CIC institutions (Committee on Institutional Cooperation) in a joint effort to mount the texts being digitized from all CIC institutions. The resulting effort is called, oddly enough, the Hathi Trust.

The Hathi Trust is building a shared digital repository, and although there isn't any information on the web site yet, there is information about the project at Indiana University. In reading that, I noticed that they were offering brief records describing all of the contents of the repository. Since I find it difficult to resist such temptation (as my earlier effort to make the records of the University of Michigan's public domain books searchable will attest), I contacted them and got the records. They come in one compressed tab-delimited file, with a very abbreviated record for each title. Nonetheless, I thought it would still be fun to search it and see what I could find.

So, in an incredibly misguided implementation, I chopped it into 1.4 million tiny little XML files (yes, I am an idiot) and indexed them with my favorite indexer, Swish-e, which is XML-aware and can thus provide fielded searching as in a database. The result can be seen and used on my prototype server as the Hathi Trust Search. Since this is my prototype server, I don't worry about making it pretty or even fully functional, as you will realize if you try to browse past roughly the first 5,000 hits of any search. So sue me.

Meanwhile, have fun. Take a look and see what's there (or isn't) on your favorite topics. Find out what's considered in copyright or out of it. And let me know about anything interesting you find, either as a comment here or by personal email.
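The post describes turning one compressed tab-delimited file into a separate small XML document per record, so that an XML-aware indexer such as Swish-e can search individual fields. A minimal Python sketch of that conversion follows, plus a naive in-memory fielded search to show what "fielded searching" means. The column names (`id`, `rights`, `title`), the gzip compression and the function names are all assumptions for illustration, since the post does not describe the file's layout; the search function is a stand-in, not Swish-e's query interface.

```python
import csv
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical column names: the post does not list the fields in the
# Hathi Trust file, so these are placeholders for illustration only.
FIELDS = ["id", "rights", "title"]


def row_to_xml(row: list[str]) -> str:
    """Turn one tab-delimited record into a small XML document string."""
    record = ET.Element("record")
    for name, value in zip(FIELDS, row):
        ET.SubElement(record, name).text = value  # ElementTree escapes &, <, >
    return ET.tostring(record, encoding="unicode")


def records_from_tsv(stream) -> list[str]:
    """Read tab-delimited lines and return one XML string per non-empty record."""
    # QUOTE_NONE keeps quotation marks in titles as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row_to_xml(row) for row in reader if row]


def load_compressed(path: str) -> list[str]:
    """Assumes a gzip-compressed, UTF-8 tab-delimited file (hypothetical)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return records_from_tsv(f)


def fielded_search(xml_docs: list[str], field: str, term: str) -> list[str]:
    """Naive fielded search: match `term` only inside the element named `field`."""
    hits = []
    for doc in xml_docs:
        elem = ET.fromstring(doc).find(field)
        if elem is not None and elem.text and term.lower() in elem.text.lower():
            hits.append(doc)
    return hits


if __name__ == "__main__":
    # Made-up sample rows standing in for the real file.
    sample = io.StringIO("mdp.001\tpd\tMoby Dick & other tales\n"
                         "mdp.002\tic\tA Modern Novel\n")
    docs = records_from_tsv(sample)
    print(fielded_search(docs, "rights", "pd"))
```

The sketch keeps the XML records in a list in memory; writing each one to its own file, as the post describes, would be a small change to `records_from_tsv`.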
Posted by Roy Tennant on August 25, 2008
August 26, 2008
In response to: Hathi Trust Fun Jeffrey Beall commented: The reason the metadata records are brief is that your employer, OCLC, does not allow open access distribution of the full MARC records. So institutions like the University of Michigan have to strip down the records before openly distributing them. In this way OCLC is hindering access to information and hurting digital libraries.
August 27, 2008
In response to: Hathi Trust Fun John Wilkin, AUL for LIT/TAS at U of Mic commented: Jeffrey, OCLC's guidelines are not a factor in this. When we began planning this function with our partners, we struggled with a few questions about record distribution. Two examples may help to illustrate this. (1) In many cases, these records will come from the catalogs of our partner institutions, and the UM version of the record would immediately be out of sync with the source records from which these are derived. (2) Different institutions take different approaches to cataloging electronic resources, and the version at Michigan (a combined print and electronic record) may not be the preference for other institutions. We'll soon release documentation for this service, and those docs will make clear that what we intended here was a mechanism by which a 'consumer' institution could go to any of a number of different sources (including Michigan's catalog or OCLC) to get full records. The body of content grows by hundreds of thousands of volumes per month, not a small flow for an institution seriously interested in keeping up with record changes. All of that said, we have only now released the mechanism and will continue to assess it with our partners.
September 30, 2008
In response to: Hathi Trust Fun Rockwell commented: Your Hathi search is fantastic! Much easier than the convoluted UM site! Thanks for the effort.
September 30, 2008
In response to: Hathi Trust Fun Roy Tennant commented: Rockwell, thank you for your kind comment. It really didn't take long to put it together at all, and UM/Hathi Trust really should get the credit for making the information available in the first place (they are still the only Google partner libraries that have done so).
October 1, 2008
In response to: Hathi Trust Fun Rockwell commented: Yes, I've noticed that UC hasn't done anything yet with their Google scans.
October 14, 2008
In response to: Hathi Trust Fun bowerbird commented: > The University of Michigan Library