An ongoing, grant-funded initiative involving the HathiTrust digital repository has been tracking down works not previously known to be in the public domain among the repository's vast collection of scanned works from research libraries across the country.
HathiTrust has gathered several new partners in the last few months, most recently Cornell University, along with Dartmouth College and the Triangle Research Libraries Network. Thirty-five research libraries have signed on so far, and most of them have contributed digital scans of their materials to the project (nearly 7 million volumes to date), many of which were digitized through the Google Books project or similar initiatives. Cornell alone has pledged to add some 300,000 works by March 2011.
Many of the scans are of public-domain materials, such as works published before 1923, which are available for anyone to read via the HathiTrust Digital Library. Many others are copyrighted materials, for which HathiTrust must restrict access.
But because of the vagaries of U.S. copyright law, the status of some scans is more of a mystery. If a work was published between 1923 and 1963 and the copyright holder did not renew the copyright after its first 28-year term, it, too, is in the public domain. Such works should legally be accessible through HathiTrust as well, but determining a work's copyright status requires research. That's where the Copyright Review Project comes in.
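The rule described above can be sketched as a small decision function. This is a minimal sketch under simplifying assumptions: it models only the cases the article names (publication before 1923, and publication between 1923 and 1963 with or without a renewal on record), it reflects the situation as the article describes it, and the function name and return labels are hypothetical. It ignores real-world complications such as foreign works, works published without a copyright notice, and separately renewed contributions.

```python
from typing import Optional


def classify_us_copyright(pub_year: int, renewed: Optional[bool]) -> str:
    """Hypothetical helper applying only the rules stated in the article.

    pub_year: year of U.S. publication.
    renewed:  True if a renewal record was found, False if research found
              none, None if renewal records have not been searched yet.
    """
    if pub_year < 1923:
        # Published before 1923: public domain.
        return "public domain"
    if pub_year <= 1963:
        if renewed is None:
            # Status hinges on renewal records, so research is required.
            return "needs research"
        # A renewal keeps the work protected; no renewal after the first
        # 28-year term means the work is in the public domain.
        return "in copyright" if renewed else "public domain"
    # Outside the 1923-1963 window the article discusses; no determination.
    return "out of scope"


print(classify_us_copyright(1952, renewed=False))  # public domain
print(classify_us_copyright(1947, renewed=None))   # needs research
```

The three-valued `renewed` argument keeps "not yet researched" distinct from "researched, no renewal found," which is the distinction the Copyright Review Project exists to resolve.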
Tracking down statuses
The University of Michigan Library, which alone has deposited more than four million scans in HathiTrust, was awarded a $578,955 Institute of Museum and Library Services grant (with a $655,898 match) in 2008 for a three-year project. Its aim: to go through HathiTrust scans of works published between 1923 and 1963 and determine their copyright status.
The project has since expanded to include staff from other institutions, including the University of Minnesota, Indiana University, and the University of Wisconsin-Madison, for a current total of about 20 staffers.
According to Anne Karle-Zenith, the Copyright Review Project Librarian at the University of Michigan, the project has checked the status of about 95,000 books so far, and more than 52,000 of them (over half) have been found to be in the public domain. The project reviews the books most recently deposited into HathiTrust, and there are a lot of them: the current backlog is about 175,000 books, Karle-Zenith said.
Each book is checked first against Stanford University's online Copyright Renewal Database, a process Karle-Zenith said is much less cumbersome than other research methods. (Periodicals are not part of the project, because the Stanford database contains renewals only for monographic books and pamphlets.)
If no copyright renewal turns up, it is very likely that none exists, though research results are sent to the U.S. Copyright Office to make sure nothing was missed. [See Karle-Zenith's clarification on this process below.] A work with no renewal on record is deemed to be in the public domain, and its full text is then made available through HathiTrust. (The project's full guidelines are available here [PDF].)
Karle-Zenith said that literature and textbooks appear to be renewed more often than other types of works, but there have been some surprising finds. For example, author Gore Vidal's early novel The Judgment of Paris (1952) was found to be in the public domain, as was the 1947 textbook Atomics for the Millions, which contains some of the first published artwork by Maurice Sendak, who would later write and illustrate the children's book Where the Wild Things Are.
Reader Comments (11)
I wish we could get a list (or a record set) of these works; we would like to add bib records to our catalog for the post-1923, open-access works.
jeffrey.beall@ucdenver.edu
Posted by jeffrey.beall@ucdenver.edu on October 21, 2010 01:19:29PM
David, thanks for this piece - it's great! I just want to clarify one thing. We do not send all our results to the U.S. Copyright Office. As part of our efforts to evaluate our work, the IMLS grant stipulates that we will engage the Copyright Office to undertake comparison searches of works we have determined to be in the public domain, in order to evaluate the accuracy and effectiveness of the Copyright Review process. We sent the first set of volumes for evaluation in early 2010 and recently received results from the Copyright Office. Based on the budget allotted, the Copyright Office was able to evaluate 96 volumes for renewal status. Only four of our determinations were incorrect (i.e., a work we determined was not under copyright was actually renewed/still protected). Further analysis revealed only one of these volumes to be a true miss on the part of our reviewers. While the sample size is small, it helped to confirm that our process and its reliance on the Stanford Renewal Database is producing reasonably reliable results.
Posted by Anne Karle-Zenith on October 21, 2010 02:51:14PM
Jeffrey, we do make our rights determinations for all works in HathiTrust publicly available via tab-delimited files. Among other data about the works in HathiTrust, these files include rights information for each volume, including the rights attribute (e.g., public domain, in-copyright, etc.) as well as the rights determination reason code (e.g., copyright not renewed, no copyright notice on the piece, etc.).
More information about HathiTrust Data Distribution & APIs and the HathiTrust Metadata is available here:
http://www.hathitrust.org/data
http://www.hathitrust.org/hathifiles_metadata
Posted by Anne Karle-Zenith on October 21, 2010 02:56:44PM
This is tremendously important work. Kudos to the University of Michigan and its partner institutions for putting in the hard work to enable the world to freely access "orphan works".
Posted by Roy Tennant on October 21, 2010 05:27:21PM
Just one more clarification - this project focuses on identifying works in the public domain due to non-conformance with U.S. copyright formalities (i.e., the copyright was not renewed after 28 years, the work was published without a copyright notice). Orphan works are in-copyright works for which the copyright holder cannot be located. We are not identifying/providing access to orphan works as part of this project. (I keep saying I need a memorable word like "orphan," instead of a cumbersome phrase, to describe these works, so people can more easily remember that they are different from orphan works.)
Posted by Anne Karle-Zenith on October 21, 2010 04:06:43PM
Stanford may not have periodical renewals in their database, but they're online (albeit not in an easily searchable form). I have links to them, and other renewal records, at
http://onlinebooks.library.upenn.edu/cce/
As a convenience, I have a list of the first renewal of selected periodicals at
http://onlinebooks.library.upenn.edu/cce/firstperiod.html
This list is complete up to 1950, and selective afterwards; that is, periodicals not appearing in the "first renewal" list did not renew issue copyrights prior to 1950. (However, in some cases individual items appearing in a periodical may be separately copyrighted and renewed.)
Posted by John Mark Ockerbloom on October 21, 2010 07:09:14PM
Can we assume that any book that has "full view" access is in the public domain?
Posted by S. A. Brannon on October 21, 2010 07:28:28PM
More background on the HathiTrust initiative can be found here:
http://fairuse.stanford.edu/blog/2010/09/rising-into-the-public-domain.html
Douglas Fevens
The University of Wisconsin, Google, & Me--
http://www.facebook.com/douglas.fevens
Posted by Douglas Fevens on October 22, 2010 05:16:36AM
No, you should not assume any book that has "full view" access is in the public domain. For example, some works are available as full text because we have received permission from the copyright holder. For more information on rights management in HathiTrust, see:
http://www.hathitrust.org/rights_management
Posted by Anne Karle-Zenith on October 22, 2010 08:37:46AM
Roy, you've misused the term "orphaned works," which applies to works that are clearly still in copyright but whose owner cannot be located or perhaps identified.
One of the catches I hope the project is looking out for is that many works were first published in magazines, before later being published in books with a new copyright. If those magazines were renewed, then all identical material in the book version is also still under copyright, regardless of the status of the book copyright. (At least that's my non-lawyer's understanding of the situation.)
Posted by M-K on October 22, 2010 01:45:54PM
The Copyright Review Project (and the HathiTrust) are great projects, and it's exciting to see effort and money put into discovering what's in the public domain. There's a lot of sweat of the brow involved, so thanks to those doing this.
I would love to see data gathered from the Copyright Review Project put into OCLC's WorldCat Copyright Evidence Registry. I suspect Roy Tennant was reluctant to promote his company's service directly, but since I have no affiliation to OCLC or others involved, I wanted to mention it here. Maybe there's already a partnership or sharing agreement in place? See: http://www.worldcat.org/copyrightevidence
Posted by Roger Skalbeck on October 26, 2010 02:35:53PM