ProQuest and Google Strike Newspaper Digitization Deal
Josh Hadro -- Library Journal, 9/12/2008
Hoping to do for newspapers what Google Book Search has done for monographs, ProQuest and search giant Google have reached an agreement to digitize millions of pages of content from ProQuest’s vast newspaper microfilm archives. While ProQuest has vowed to continue improving and expanding its Historical Newspapers collection independently, the Google deal aims to create searchable electronic versions of smaller newspapers that are otherwise unlikely to be digitized, making them available on the open web via Google’s News archive search. “The problem is that, until now, finding a workable economic model for libraries and publishers has been challenging,” said Rod Gauvin, ProQuest senior VP of publishing. “This model overcomes that hurdle, unlocking a wealth of content for libraries and Internet users with unique research needs.”

Google is underwriting the digitization costs, which have not been disclosed, in return for revenue from ads displayed alongside the newspaper page images (one example is a page scanned from the St. Petersburg Times). Digitization has begun with content that ProQuest already has the rights to digitize and make available online, mostly orphaned publications and those in the public domain. For newspapers in the ProQuest archives that are still under copyright, Google and ProQuest executives say they hope to reach further agreements with the copyright owners. Publishers would then be able to choose between keeping articles behind a pay-per-view wall and simply entering into a royalty-sharing agreement based on the ad revenue generated by views of their digitized content.
Scanning toward different ends
Google hopes to digitize hundreds of millions of pages over the next few years, ProQuest VP of publishing Chris Cowan told LJ. That feat is made possible less by any advance in scanning technology than by Google’s capacity to work at large scale and its emphasis on quantity over quality. “[Google is] very creative at throwing technology at problems to build solutions,” Cowan said of the company’s approach, noting that nearly the entire process, from page imaging to optical character recognition (OCR), has been automated.
The deal leaves ProQuest significant room to differentiate its Historical Newspapers offering, which includes such publications as the New York Times and the Chicago Tribune, as a premium product. That product is distinguished by the added editorial effort and human intervention that make its selectively scanned materials more discoverable and useful to expert researchers. Unlike Google’s largely automated process, ProQuest employs editors who check headlines, first paragraphs, captions, and more to achieve its claimed “99.95 percent accuracy.” ProQuest also adds metadata, including tags indicating whether a scanned item is an article, opinion piece, editorial cartoon, and so on.

Finally, ProQuest stresses that the agreement does not affect long-term preservation plans for the microfilm collection. “Microfilm will always be the preservation medium,” Cowan said, noting that, while digital formats are constantly changing, “film that’s handled appropriately can last several hundred years.”