ProQuest, Google To Digitize News
Millions of pages to be scanned for Google News archives, including newspapers unlikely to be digitized otherwise
Edited by Josh Hadro -- Library Journal, 10/1/2008
Hoping to do for newspapers what Google Books has done for monographs, ProQuest and search giant Google have reached an agreement to digitize millions of pages of content from ProQuest's vast newspaper microfilm archives, which include 10,000 titles. While ProQuest has vowed to continue improving and expanding its Historical Newspapers collection independently, the Google deal aims to create searchable electronic versions of smaller newspapers otherwise unlikely to be digitized, making them available on the open web via Google's News archive search. “The problem is that, until now, finding a workable economic model for libraries and publishers has been challenging,” said Rod Gauvin, ProQuest senior VP of publishing. “This model overcomes that hurdle, unlocking a wealth of content for libraries and Internet users with unique research needs.”
Google is underwriting digitization costs, which have not been detailed, in return for revenue based on ads displayed alongside the newspaper page images. Digitization has begun with the content ProQuest has full rights to digitize and make available online, mostly orphaned publications and those in the public domain. For newspapers in the ProQuest archives still bound by copyright, Google and ProQuest execs say they hope to reach agreements with copyright owners, allowing publishers to choose whether to keep articles behind a pay-per-view wall or simply to enter into a royalty-sharing agreement based on ad revenues generated by views of their digitized content.
Diverging archival goals
Google hopes to digitize hundreds of millions of pages over the next few years, ProQuest VP of publishing Chris Cowan told LJ, a feat made possible less by any advance in scanning technology than by Google's capacity to work on a large scale and its emphasis on quantity over quality. “[Google is] very creative at throwing technology at problems to build solutions,” Cowan said of the company's large-scale approach to scanning. Almost the entire process has been automated, from page imaging to optical character recognition (OCR).
The deal leaves significant room for ProQuest to differentiate its Historical Newspapers offering, which contains such publications as the New York Times and Chicago Tribune, as a premium product, one in which added editorial effort and human intervention make its selectively scanned materials more discoverable and useful to expert researchers. In contrast to Google's automated scanning, editors hired by ProQuest check headlines, first paragraphs, captions, and more to reach what the company claims is “99.95 percent accuracy.” In addition, metadata is added, along with tags describing whether the scanned content is an article, opinion piece, editorial cartoon, etc. Finally, ProQuest stresses that the agreement does not affect long-term preservation plans for the microfilm collection. “Microfilm will always be the preservation medium,” Cowan said, noting that, while digital formats are constantly changing, “film that's handled appropriately can last several hundred years.”





















