Google To Digitize 15 Million Books
Librarians mostly hail project but raise questions about effects; also, Google Scholar launches, offers links behind paywalls
By Andrew Albanese -- Library Journal, 1/15/2005
Librarians mostly lauded an ambitious Google program to digitize as many as 15 million books from five prestigious libraries. Still, they wonder how the plan, announced last month, will affect libraries and publishers. Dennis Dillon, associate university librarian at the University of Texas at Austin (UT-Austin), called the program "brilliant." "A great leap forward," commented Michael Keller, head librarian at Stanford University—one of Google's partners in the project—noting that online book content has lagged far behind journal content.
Stanford and University of Michigan intend to offer their entire book collections for Google to scan, but Harvard and Oxford universities are offering just a portion. The New York Public Library is contributing only rare materials. James Neal, Columbia University librarian, said, "It takes this bubbling pot of digitization water quickly to a boiling point."
The program is an expansion of the Google Print program, which offers digital excerpts of books currently in copyright. Those searching with Google will see links to relevant books in their results page. The ensuing Google Print page will allow full-text browsing of public domain works and brief excerpts and/or bibliographic data of copyrighted material—plus links to libraries and booksellers offering the volume. The announcement came on the heels of the Google Scholar program that will enhance access to journals.
Questions raised
Some raised questions. "I believe, however, that massive databases of digitized whole books, especially scholarly books, are expensive exercises in futility based on the staggering notion that, for the first time in history, one form of communication (electronic) will supplant and obliterate all previous forms," wrote Michael Gorman, dean of library services at California State University, Fresno (and American Library Association president-elect), in the Los Angeles Times.
Karen Coyle, a digital library consultant formerly with the California Digital Library, noted that "we've basically got stuff in the public domain and scientific preprints, but everything between 1926 and 1994 is mostly unavailable." While acknowledging the positive aspects of Google's program, she said that it "scares me…because no one has looked at what we are doing to the growth of knowledge."
She pointed to the University of California (UC) experience: when librarians first opened access to the Medline database, use surged. "At the same time we obviously skewed scholarship," said Coyle.
Libraries and publishers
UT-Austin's Dillon acknowledged, "It's back to the drawing board for libraries.... We have $8 million we were going to throw into a new OPAC. We were already wondering why we should do this."
To remain relevant, libraries will need to focus on added value: "We're looking at localization, customization, and providing services. Google owns the mass market, so we have to play around the niches," noted Dillon.
The program also marks an aggressive new strategy for Google. With $1.66 billion in fresh IPO funding, the company is clearly not content to wait for book publishers, which have been slow to offer book content online for fear of hurting print sales.
Oxford University Press academic publisher Niko Pfund said he was cautiously optimistic: "Bottom line is that we're still creating and providing content—that's what we do—vs. how that content is delivered, which is what we're talking about here. I can see that Google Print has great potential for generating additional print sales."
Internet Archive project
Google isn't the only game in town. Last month, the San Francisco–based not-for-profit Internet Archive (archive.org) announced partnerships with a number of international libraries as part of an ongoing effort launched in 2003 to scan books into "open access archives." Included are the Library of Congress, University of Toronto, and Carnegie Mellon University, as well as the Bibliotheca Alexandrina in Egypt, Zhejiang University in China, and the Netherlands-based European Archive.
Supported by a range of public and private grants, the Internet Archive has pioneered digital archiving efforts for all formats, including audio and moving images. Currently, over one million public domain or "appropriately licensed" books have been committed to the archive, and over 27,000 are already available.
Google Scholar debuts
Though the library project gained headlines, the Google Scholar program also may have a huge impact. The project will offer access to the metadata of scholarly publications, including documents behind subscription paywalls not previously spidered by the company. The content will be "scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts, and technical reports from all broad areas of research."
The service takes advantage of OCLC's Open WorldCat, which puts library catalogs in front of Google spiders, as well as the linking service CrossRef.
According to Google, users executing a Google Scholar search would receive a citation for a returned article but would need to access the full text through their library, individual subscriptions, or any other relationship dictated by the publisher, including pay per view. Google says it will not earn money from new subscriptions or fees generated between searchers and publishers. While there is advertising potential in the model, a Google release stated that the company started the program to give back to the scholarly community.
How should librarians react? Gary Price, a librarian, writer, and editor who edits the popular ResourceShelf web log, said librarians must get users to use library resources as instinctively as they now use Google. "It's worth watching to see if people begin paying for material located via Google Scholar that they can get free from a specialty database ... via their public or academic library," Price and Shirl Kennedy wrote on ResourceShelf.