Hathi Trust Follow-Up
September 3, 2008
Just over a week ago I blogged about some fun I was having with metadata downloaded from the Hathi Trust. The Hathi Trust is being established by CIC institutions as the repository for the content they are mass digitizing. The University of Michigan's MBooks platform was being used for this, and as of today MBooks has transformed itself into the Hathi Trust with the release of its new web site.
The new web site is chock-full of information, including how to download data about the books, either through OAI-PMH (as MARC21 or unqualified Dublin Core) or as abbreviated records in a tab-delimited file. I used the latter format to throw together a rough search of the data on my prototype site.
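For anyone who wants to try something similar, here is a minimal sketch of what a rough search over a tab-delimited file can look like. To be clear, this is not the code behind my prototype, and the column layout (`identifier`, `access`, `title`) and the sample rows are assumptions made up for illustration; check the real file's documentation for the actual fields.

```python
import csv
import io

# Hypothetical column layout; the real file's columns may differ.
FIELDS = ["identifier", "access", "title"]


def load_records(handle):
    """Read tab-delimited lines into a list of dicts keyed by FIELDS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip rows that have fewer columns than the assumed layout.
    return [dict(zip(FIELDS, row)) for row in reader if len(row) >= len(FIELDS)]


def search_titles(records, term):
    """Return the records whose title contains term, ignoring case."""
    needle = term.lower()
    return [r for r in records if needle in r["title"].lower()]


# Made-up sample rows, for illustration only.
SAMPLE = (
    "id-001\tallow\tA History of Printing\n"
    "id-002\tdeny\tPrinting and Binding\n"
    "id-003\tallow\tField Notes\n"
)

records = load_records(io.StringIO(SAMPLE))
hits = search_titles(records, "printing")
for hit in hits:
    print(hit["identifier"], hit["title"])
```

A linear scan like this is fine for a quick experiment, but with well over a million records a real search would likely want an index of some kind.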
I think it's worth noting that from the August 1 data dump to the September 1 version, over 100,000 items had been added (from roughly 1.45 million records to well over 1.5 million, and the total today is over 1.6 million). With the share of these books that are fully available in the United States holding fairly steady at 18 percent so far, we're talking about almost 600 open access books being added per day. Now call me old-fashioned, but that's not chopped liver.
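The back-of-the-envelope arithmetic behind that per-day figure can be checked in a few lines. The inputs are the numbers quoted above; since "over 100,000" is a lower bound, the result is a rough floor rather than an exact count, and the variable names are just mine.

```python
# Figures quoted in the post; 100,000 is a lower bound ("over 100,000").
items_added = 100_000   # items added between the two data dumps
open_share = 0.18       # share fully available in the United States
days = 31               # August 1 to September 1

# Open access books added per day over that period.
open_per_day = items_added * open_share / days
print(round(open_per_day))
```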
So kudos to the Hathi Trust, the CIC institutions that make it up, and the good work they are doing to expose this information for the rest of us. They remain a shining example of how to do mass digitization right.
Posted by Roy Tennant on September 3, 2008