A Paperback in Four Minutes
Peter Brantley says the time is now to consider print on demand, which could become as ubiquitous as copy machines in libraries
By Peter Brantley -- netConnect, 4/15/2007
Libraries now have new ways of getting rare, fragile, or scarce materials into the hands of readers and researchers, thanks to the growing corpus of digitized texts combined with expanding print on demand (POD) options. For example, Amazon's BookSurge POD subsidiary has entered into agreements with both Cornell University and University of Michigan (UM) libraries to make digitized, historical books from their collections available for sale through Amazon.
The originals of these works don't circulate owing to their condition or may only circulate within the university's community. POD allows them to be accessed by a far larger population at a reasonable cost.
At a time when most of the attention in libraries is focused either on maintaining existing operations or on the opportunities presented by digital discovery and delivery of material in massive digitized book repositories such as Google Book Search (GBS) and the Open Content Alliance (OCA), another sea change is rippling toward our shelves.
One of the most exciting things made possible by these repositories, beyond the ability of users to discover books to which they might not otherwise be exposed, is the ability to ensure new forms of access. Print on demand, the on-demand printing and binding of a book traditionally initiated by a point-of-sale request, has rapidly permeated the commercial book marketplace and is on the threshold of providing libraries with a plethora of exciting new opportunities.
Amazon/Ingram as publishers
At UM, over 10,000 paperbacks are sold through Amazon, and printed by BookSurge, with an average price of $20–$26. Over 8,000 books are available in hardback through Ingram's Lightning Source POD subsidiary; volumes are sold for a set price of $39.95. According to Maria Bonn, the director of UM's Scholarly Publishing Office, “We consider all of our POD programs a success; we are providing a service that readers are really pleased with, at a reasonable price (we have lots of testimonial email), and we are not losing money.”
Bonn adds that the costs of digitization are not included in their accounting because the library is already digitizing the materials; this externalization of digitization costs is characteristic of most library POD programs. Preparing titles for POD is not necessarily straightforward either; there is a limited range of page sizes acceptable by most POD programs, and older materials have more variance in their physical layout. Even getting basic metadata for retailers can be a challenge. “[T]he metadata we have available from MARC records does not conform to the standard publisher requirements,” notes Bonn.
Despite these limitations, executives like Lightning Source CEO J. Kirby Best reveal a keen awareness of the value of the long-tail material presented by rare, fragile, or old library collections. Lightning Source will consider funding proposals for digitization projects at university libraries that focus on obtaining high enough quality scans and images to power print on demand (as well as other digital delivery) solutions.
Like the OCA, Lightning Source is willing to contractually commit that partner libraries are able to retain the resulting high-quality images. Given Amazon's strong position in the bookselling market and its existing digitization programs (such as Look Inside™), combined with the presence in its corporate portfolio of BookSurge, Amazon also might be willing to support library digitization efforts in a manner distinct in arrangement and benefit from the initiatives of Google or Microsoft.
Notably, Amazon has a history of working with libraries to facilitate their acquisition of titles through Amazon services: library customers have the option of including MARC records, labels, and barcodes when they place orders.
Printing: here or there?
Ultimately, one of the most interesting questions for libraries exploring POD is how their growing repositories, homegrown and publisher supplied, are married to the actual printing services that enable the “print” in print on demand—in other words, how are these services managed and by whom?
Historically, POD has been a centralized service because the repository of material is wedded to the actual production and fulfillment services. On-demand orders come into retailers such as Amazon or UM, are routed to the appropriate content builder, such as BookSurge, and are then delivered to the requester's chosen address or to pickup points such as neighborhood copy centers. But in a world where network costs are low and dropping, and where the service of printing and delivery can be unbundled from the storage of the digital files, that process can be streamlined for the user.
With echoes of the historical shift that arose with the introduction of the first Xerox machine, Jason Epstein, a longtime Random House editorial director and cofounder of the New York Review of Books, and Dane Neller have started a new company called On Demand Books. It promises to bring POD services directly to users, within libraries, cafés, bookstores, and potentially even the home.
On Demand is marketing a machine called the Espresso Book Machine, one of which will be placed at the New York Public Library; another will be provided to the New Orleans Public Library with a grant from the Sloan Foundation to assist in the replacement of Hurricane Katrina–damaged inventory at the library's main branch. The machine will also print public domain titles for New Orleans public schools to help restock books on their required reading lists.
The Espresso Book Machine can print four-color-cover paperbacks of up to about 500 pages. It receives files through an Internet connection and requires relatively minimal human intervention. A two-printer version (with both black-and-white and color printers) can produce an average-sized 300-page book in approximately four minutes. The Espresso is fairly large, about 8 feet long, 5 feet deep, and 5 feet high, and it is heavy, at about 1,600 pounds. It is not the kind of thing you plop into an unused study carrel. It takes a room to run an Espresso.
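For a sense of scale, the quoted figures can be turned into a rough throughput estimate. The short Python sketch below is illustrative only: the 300-page and four-minute numbers come from the description above, while the helper names and the eight-hour service day are assumptions, and the result is a ceiling that ignores setup, paper changes, and idle time.

```python
# Figures quoted in the article; everything else is an illustrative assumption.
PAGES_PER_BOOK = 300    # "average-sized 300-page book"
MINUTES_PER_BOOK = 4    # "approximately four minutes"


def pages_per_minute(pages=PAGES_PER_BOOK, minutes=MINUTES_PER_BOOK):
    """Average printing rate implied by the quoted book size and time."""
    return pages / minutes


def books_per_day(open_hours, minutes_per_book=MINUTES_PER_BOOK):
    """Upper bound on books printed back to back during the open hours."""
    return int(open_hours * 60 // minutes_per_book)


if __name__ == "__main__":
    print(pages_per_minute())   # 75.0
    print(books_per_day(8))     # 120, assuming a hypothetical 8-hour day
```

On those assumptions, a single machine tops out at roughly 120 average-sized books in an eight-hour day, which suggests walk-up requests rather than bulk production.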
Better service
Espresso is attractive for libraries since it gets books and other content into the hands of readers in a format they may prefer. Another benefit is slightly more subtle: libraries tend to be well funded for acquisitions but not always well funded for anything but essential operations. With shelving space a priority, and the option of digitizing older public domain materials or government documents an increasingly viable alternative to high-care handling in open stacks or special collections, POD could help librarians maintain high service efforts within harsh operating constraints.
There are also tremendously underserved markets in the commercial arena that have long been fundamental to the mission of all libraries, such as serving a diversity of language needs. Visiting a chain bookstore in any U.S. area with a significant Hispanic or Asian population will fail to unearth significant quantities of materials in Spanish, Portuguese, or Asian languages. Yet most libraries have long served populations that speak and read foreign languages and have acquired significant numbers of foreign-language materials. To the extent that these could be made available for POD services, they escape the usual fate of rare library material: to be weakly circulated or perhaps even unknown.
Part of what has made this consideration possible is the technical changes in scanning and OCR (optical character recognition) that have advanced hand-in-hand with other forms of digital text handling. With new solutions coming from companies such as Kirtas Technologies, high-speed, reliable digitization is increasingly accessible to libraries that do not have the highly skilled staffing or deep pockets historically required for these functions.
With no discredit to the brilliance of Google's engineering in hardware or workflow, the Google–library collaborations would not have been possible if digitization had not become so readily attainable. With growing pressures to digitize more and more material, this previously trade-like craft is becoming industrialized.
What does it take to POD?
Reasonably high-quality scans are required to make POD function correctly. Digital library operations have long worked to ensure that the scanning of fragile materials yields images of high enough resolution to provide preservation-capable files as well as smaller images of adequate quality for POD delivery. Joint efforts such as the OCA have also been geared to the generation of high-quality images that are widely accessible for reuse, although there have been recent concerns about the growth of content under OCA and its ability to establish broad title availability under the market pressures of more commercial efforts.
In contrast, the GBS digitization program involving libraries is by all accounts making tremendous headway. Google is scanning thousands of books daily; a high proportion of these are public domain works that are already being made available by Google for PDF download. This program benefits participating libraries directly, since they receive a digital copy of the book as part of their agreement; the generation of this second digital copy, and its release back to the library, is notably one of the issues that has most raised the attention and ire of the publishing community.
However, the relatively low quality and lack of uniform image condition returned by GBS have caused consternation among those seeking to use the GBS repository as a potential source for POD. Libraries that have begun making this content available primarily for viewing, such as UM through its MBooks service, must carefully set expectations; uncertain fidelity has ramifications not only for near-term scholarship but also for the level of effort expended to preserve the materials as valuable cultural artifacts. The problems of illegible scans, missing pages, and image artifacts remain prevalent enough to discourage anything but low-end reproduction.
As John Mark Ockerbloom of the University of Pennsylvania says, “It's beyond frustrating if you're actually taking the time and expense to print and bind a book on demand, only to find the book that's come off the scan has a show-stopper page error midway through, which is only realized at 10:30 p.m. by the reader some days after they've taken it home.”
These issues will also confront Microsoft's Live Book Search offering; no high-speed digitization process is going to produce Linotype-quality prints. Mass digitization efforts are not designed to facilitate POD and have not focused on achieving optimal image quality; their focus is on indexing the texts, enhancing their discoverability, and presenting good enough quality images to users. Despite all of those caveats, POD remains an enticing means of delivering content to users in an attractive format, as long as expectations are appropriately matched with service.
“There are basically two kinds of approaches you can take to this issue,” observes Ockerbloom. “One, look for uses that don't require high quality to be useful. For example, if you can provide full-text search, or even just a way of looking at the pages of a book that a researcher could obtain but doesn't have at hand, that can be a big help even if it's imperfect; or second, look for ways, or set up ways, to determine whether a given book in the mass digitization sphere is of sufficient high quality for things like print on demand or to serve as a surrogate for books you're shipping to storage or de-accessioning, or other uses where quality matters more.”
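Ockerbloom's second approach amounts to a triage step: decide, book by book, whether a scan is good enough to print or should be offered only for search and viewing. The Python sketch below is one hypothetical way to express that rule. The `ScanReport` record, its fields, and the zero-defect threshold are assumptions made for illustration and do not describe any repository's actual data or policy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanReport:
    """Hypothetical per-book quality summary (not a real repository schema)."""
    expected_pages: int
    scanned_pages: int
    illegible_pages: List[int] = field(default_factory=list)


def suggested_use(report: ScanReport) -> str:
    """Route a scanned book to the most demanding use its quality supports."""
    missing = report.expected_pages - report.scanned_pages
    # Assumed policy: any missing or illegible page rules out printing,
    # since a single bad page can spoil a bound copy for the reader.
    if missing > 0 or report.illegible_pages:
        return "search and view only"
    return "print on demand"


if __name__ == "__main__":
    print(suggested_use(ScanReport(300, 300)))          # print on demand
    print(suggested_use(ScanReport(300, 300, [142])))   # search and view only
```

A library might well tolerate minor defects in practice; the strict rule here simply mirrors the "show-stopper page error" concern quoted earlier.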
Notably, for better or worse, most of the works that meet these higher-quality standards to date are not in the new, large mass digitized book repositories; instead, they are those that libraries scan for themselves and for the direct benefit of their community and place into their own digital libraries. The availability in the last few years of affordable book scanning solutions, ranging from the Plustek OpticBook to the systems developed by Kirtas Technologies, places relatively rapid, high-quality scanning into the hands of even modestly resourced libraries.
The twilight zone
Academic repositories, either from their own digitization programs or through partnerships with major commercial information discovery services, are overwhelmingly centered on older materials (i.e., “dead author books” as they are inelegantly known in the business) owing to rights restrictions. With works from 1923 serving as the dividing line in the United States—and restrictions in Europe and other parts of the world potentially more stringent—even the largest repositories do not tap into the vast majority of published works, either out-of- or in-print.
This is not to suggest, however, that sizable repositories of in-copyright books do not exist, for they do—and the ability of libraries potentially to take advantage of these repositories is both untested and untapped. Models for the interlibrary loan (ILL) of digital books are rudimentary at best, but they could probably be implemented without much development. If OCLC's WorldCat program becomes a de facto registry for digitized books, ILL would be an obvious derivative service offering.
Existing POD services such as Ingram's Lightning Source and Amazon's BookSurge both have large repositories of material that already exist in digital form and are primed for printing—after all, these firms must house digital files to engage in their fundamental service offerings of making print copies of books from digital files. Many publishers, including HarperCollins and Random House, are also actively engaged in their own efforts to digitize their backlist titles and to ensure a stable digital production workflow for their frontlist works. These repositories might well serve as the fountainhead for POD services if they can be married to appropriate printing services.
Bill McCoy, the director of publishing products for Adobe, says publishers will soon be operating within a mindset of digital content assets and their subsequent distribution through multiple channels, not from digitization from print copies. As that trend grows, it is likely that publishers will seek to license digital holdings to libraries for uses ranging from POD to digital lending, in the same way that serials publishers and aggregators have moved to such licensing models.
A shift toward digital licensing for books will force several issues of paramount concern to libraries, many of which re-create earlier debates about journals. Libraries will have to choose whether they require a printed copy of a licensed book to enter their collection as a circulating or archival copy. In addition to operational and administrative costs, digital licensing raises fundamental issues about the rights of access: because printing is not free, any print of digital content pushes libraries to consider significant budgetary impacts.
Further, at the same time that libraries may seek to retain at least one paper copy to preserve lending access as a core mission, it is uncertain whether libraries could conceivably sustain these efforts as the digital flood of content becomes greater. This may in turn generate “secondary curation” as libraries are forced to choose which of their digital material is worth the cost of printing; notably, if this material is licensed instead of purchased, contract terms and conditions may even limit print rights such as retention, should the licensing not be maintained.
The future
POD is an exciting nexus of new capabilities, presenting attractive opportunities. But, ultimately, another generation hence, I suspect that it may well occupy the same kind of niche that Xerox machines occupy—an essential component of information delivery that no office/library/school can do without.
Author Information
Peter Brantley is Executive Director, Digital Library Federation, Berkeley, CA























