Scan This Book!

In the race to digitize the public domain, is the future of the library at stake? An interview with the Open Content Alliance's Brewster Kahle

By Andrew Richard Albanese -- Library Journal, 8/15/2007

For all the potential of Web 2.0 technologies, our literary future still rests on what we make of our past, specifically, the centuries of ideas and human thought recorded in the miles of print books sitting on library shelves around the world. To merge the texts and traditions of our print past and our web future, explains visionary technologist Brewster Kahle, represents a truly historic moment for our culture. And, he adds, an opportunity that librarians must be prepared to seize.

Gold rush 2.0

Despite all the librarians who eagerly identify themselves as book lovers, it's hard not to notice that books have had, well, a rather rocky start to the Internet Age. In the first iteration of the World Wide Web, they remained all but hidden on library shelves, and, unsurprisingly, circulation numbers dipped. That led some to surmise that the book was languishing in the throes of obsolescence. But as search technology improved and books became more discoverable through online library catalogs and keyword searches on the wider web, circulation surged back, by double-digit margins in many libraries. Overnight, books that went untouched for years were getting into patrons' hands again. Almost any librarian today will tell you their book circulation is going strong. The question now, however, is where is it going?

Books certainly seem to be a hot commodity these days. Google is racing to tap library shelves worldwide like prospectors once raced to the Klondike. But with its gold rush, “scan first, ask questions later” approach, Google's library program, despite myriad potential benefits, has also wrought confusion, lawsuits from publishers and authors, and serious concerns about how our shared, public domain heritage could be parceled away by commercial gatekeepers in the coming digital generations.

The allied advance

Against this backdrop, in 2005 the forward-thinking Kahle launched the Open Content Alliance (OCA). An alternative scan plan to Google's controversial library project, Kahle's vision of putting books online embraces the values of openness central to librarianship and vital to the work of libraries.

“The OCA is universities, public libraries, and commercial companies all working together to build something in the tradition of the Internet,” he explains, “as opposed to a locked-down corporate enterprise, like some of the alternatives.”

OCA now counts 40 members and “regional scanning centers” in six cities scanning up to 12,000 books a month, over four million pages. For 10¢ a page, Kahle says, the OCA can now bring public domain books and other materials online, nondestructively, and offer them to the world. And unlike Google's plan, there are no restrictions on public domain books scanned by OCA members. Users are not forced to use proprietary interfaces; OCA scans are not hidden from rival search engines.

Of course, the nonprofit OCA's scanning pace, although increasing, lags behind its well-funded commercial competition, which has Kahle concerned. With commercial web services now performing so many tasks once performed by librarians, what would it mean if access to books also came to be dominated by commercial services?

“If we lose books,” Kahle says bluntly, “no matter what we do with born digital material or web pages, RSS feeds, blogs, whatever, it will all come to naught.”

LJ recently visited with Kahle in San Francisco to get his take on the challenges of getting books on the web, the progress of the Open Content Alliance, and the intertwined future of books and libraries.

LJ: We hear so often that, for many, if it isn't online, it doesn't exist. Is this part of what inspired you to turn to books?

BK: Books are the heart of the library. For thousands of years, humans have been putting their knowledge in books to pass on for future generations. So, yes, we have to have these materials in digital form, and we have to make them accessible in such a way that we can continue to have a library system like the one, frankly, that many of us grew up enjoying, where we can access and use these materials in new and different ways, as an engine for research, learning, and discovery, even if in ways not originally intended. So far, I think we have been negligent in our responsibility to perform this task. Not because we don't have the materials, but because we haven't put them into the formats new generations expect.

You've been critical of Google's library partnerships. What is Google doing right and/or wrong?

Two problems: one is perpetual restrictions on the public domain. Another is that these negotiations are all going on in secret. It shouldn't take a subpoena to get information from a librarian. But in this new world order, both perpetual restrictions and gag orders are being put in place on libraries by a corporate enterprise. The idea of making all books accessible online in new and different ways is all good news. But if you do this in a way that the materials that have been housed in libraries for centuries are made available only through one corporate interface, that is an Orwellian future.

Are you surprised to see libraries signing up with Google under restrictive terms?

I'm not surprised that a corporation wants to be the only place someone can get information, and I was not terribly surprised that some libraries went forward with this before they understood how they could do it on their own and how much it would cost to do it for themselves, not only to do the digitization but also to create services around these collections. I was surprised to see more libraries jumping on the Google bandwagon after we had demonstrated how libraries can do this and after we had actually done it with the Open Content Alliance.

Ideally, how should Google Book Search, or any other web-based book service, work?

Tim O'Reilly, who is an adviser to Google Book Search, said it best. He said, simply, book search should work like web search. Look at it this way: Google says it has the right to scan people's books to create web services, yet it doesn't allow other people to scan its scans to create web services. I say let's have the Golden Rule apply: do unto others. Either ask permission before scanning, then you can demand permission from others, or, a better world would be, scan all for navigational purposes but allow other people to scan all for navigational purposes. I asked publishers and OCA members if they would be happy with book search working in this way, like web search, and they all said yes. I asked a Google representative and was told no. So the question is, why? What does Google have in mind?

If libraries had the organization and the will, could we scan our collections ourselves, without such restrictions?

Yes. We've achieved mass digitization at 10¢ a page, on average about $30 a book. That includes high-resolution color imaging, optical character recognition, and compression and packaging into PDFs. And all of it open, meaning you can download and use these books in bulk. Take a million-book library, which is larger than most libraries in the world. What would it cost to make a million-book library online? At 10¢ a page, 300 pages in a book, it would price out at about $30 million, costs that could be spread out over many institutions. If the library market in the United States is about $12 billion a year, $3 billion to $4 billion of which goes to publishers' products, $30 million is about one percent of one year's budget. We can do this.
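The arithmetic in this answer is easy to check. The short Python sketch below uses only the figures quoted above (10¢ a page, 300 pages a book, a $12 billion library market, $3 billion to $4 billion spent on publishers' products). The function and variable names are illustrative assumptions, not anything published by the OCA. Prices are kept in whole cents so the totals come out exact.

```python
# Back-of-the-envelope check of the scanning figures quoted in the interview.
# All names here are illustrative; only the numbers come from the article.

COST_PER_PAGE_CENTS = 10               # quoted rate: 10 cents a page
PAGES_PER_BOOK = 300                   # quoted average book length
LIBRARY_MARKET_USD = 12_000_000_000    # quoted size of the U.S. library market
PUBLISHER_SPEND_LOW_USD = 3_000_000_000   # low end of quoted publisher-product spend
PUBLISHER_SPEND_HIGH_USD = 4_000_000_000  # high end of quoted publisher-product spend


def scanning_cost_usd(books, pages_per_book=PAGES_PER_BOOK,
                      cents_per_page=COST_PER_PAGE_CENTS):
    """Return the cost in dollars of scanning `books` books."""
    total_cents = books * pages_per_book * cents_per_page
    return total_cents / 100


cost_per_book = scanning_cost_usd(1)
cost_million = scanning_cost_usd(1_000_000)

# Express the million-book cost as a share of each quoted budget figure.
share_of_market = cost_million / LIBRARY_MARKET_USD
share_of_low_spend = cost_million / PUBLISHER_SPEND_LOW_USD
share_of_high_spend = cost_million / PUBLISHER_SPEND_HIGH_USD

print(f"Per book:            ${cost_per_book:,.2f}")
print(f"Million books:       ${cost_million:,.0f}")
print(f"Share of $12B:       {share_of_market:.2%}")
print(f"Share of $3B spend:  {share_of_low_spend:.2%}")
print(f"Share of $4B spend:  {share_of_high_spend:.2%}")
```

Run as written, the per-book and million-book totals match the $30 and $30 million quoted. The "about one percent" figure corresponds to the $3 billion spent on publishers' products; measured against the full $12 billion market, the same $30 million is closer to a quarter of one percent.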

Google and OCA would seem to be natural allies. Why hasn't that alliance happened?

With the OCA, we originally tried to figure out whether to put in some restrictions in such a way that Google would come onboard. We found that when we put some restrictions in, the commercial guys just wanted even more. The public domain is small enough as it stands, we thought, let's not clobber it again as it goes digital. Let's let people use the public domain for whatever.

Microsoft was involved with OCA but hasn't it since launched its own, more restrictive book project?

At the OCA launch, Microsoft committed to scanning a lot of books under OCA principles, but it changed after a year of scanning. It put in more restrictions that make it incompatible with OCA, such as it doesn't want its books surfaced in other commercial services. We're sad to see Microsoft putting more restrictions on its scans. But this is a reaction to the growing environment: if Google would take its restrictions off the public domain, I'm sure Microsoft would follow.

Google's pitch to libraries can be awfully attractive, and it is so ubiquitous. How does the OCA compete for library partners?

Revolutions aren't started by majorities. They come from leaders who see things that need to be done. Boston Public Library, for example, has been courted by Google, but it has said it is going to remain open. The Library of Congress also announced it is going to work with the Open Content Alliance. That's what it takes. It takes guts on the part of our leadership to keep librarians first-class members of this information world, not just in a service role of feeding the machine and then checking out at the end of the day because everything's going to be handled by some great search engine in the sky. No. It should be handled by us. We have the tools to build this open world right now. We can invest in ourselves, in the traditions that we come from. This is a choice.

E-journals are now the standard in libraries, and newspapers are all online. Why have books lagged behind?

In book publishing, I think we got the cart before the horse. I think they jumped the gun in the early ebook experiments and blew a bunch of money. They made these awful digital rights–managed interfaces, and nobody liked them. We've seen it time after time, where a group thinks it knows what people want, and it puts it out there and says, “Buy it!” But often it's better in new tech areas to try open systems first to debug the technology and to get a critical mass of users. That way, when you begin to sell into it, you know you can actually get to people. The web was built this way. It started with whatever was out there, and once there was critical mass, you could sell into it. Ebooks tried to go the other way. Now, it's inexpensive to make scans and put books online, even cheaper if you can chop up the book, like with current titles. So, what are we waiting for? Go out and try some things. Having your trade association sue people isn't going to put publishers in any better position than the record industry is in now.

How is the OCA's relationship with publishers?

We work very well with the publishing industry. We're focusing on the public domain right now, trying to find where the traditional line should be: how you have libraries in the future and publishing enterprises—that's enterprises, plural. Let's have lots of publishers, let's have authors who are actually compensated, as opposed to some of what we've seen as these conglomerates are created, these large-scale media companies, whether they are search engines becoming media companies or traditional media companies just becoming larger.

Dealing with in-copyright materials, that's where Creative Commons licenses are starting to make more sense, where noncommercial uses, research and educational use, for example, are allowed, but if you're going to make commercial reuse, you get permission. We would like to see many publishers, many distribution methods, many different user communities all thriving in this new digital era in the same way they did in the book publishing industry for hundreds of years, until recently.

You mention digital rights–managed (DRM) interfaces. How much of a stumbling block is DRM for books online?

DRM used to be called copy protection, and it didn't work for the software industry, it's not working for music, and it won't work for books. It is a bad idea that contributes to the demise of an industry. In the software industry, it was a complete failure. The company that became the richest company in the world, Microsoft, didn't copy-protect. You could endlessly copy its software, and you were just supposed to pay. In some ways, Microsoft became the richest company in the world by being a shareware company.

Piracy by end users is a distraction. It isn't what caused the software industry to collapse. What happened was someone got in and controlled the distribution, and we're seeing this happening again in music. For the dream of copy protection, companies are signing up with a single company to handle the distribution and pricing of their work: Apple. And if someone else controls your pricing and distribution, you're not a company, you're a division. I hope the book industry doesn't feel it needs to have centralized copy protection schemes. It's a trap.

How challenging are copyright issues, such as orphan works, in getting our literary past online and accessible?

It's bad out there. If we had the copyright of Richard Nixon, the Internet would be a much more interesting, thriving place. We're showing the results of decades of successful lobbyists with very narrow interests hijacking the information age.

The public domain is small enough, but then we have all these works that are out of print: orphan works. Orphan works are not commercially viable, and what we've found is that with these works, you can't get someone to call you back. There's no lawyer willing to negotiate on the other end, because there's not enough money on the table. The truth is most books are pretty much dead within a year or two of being published. Libraries are the only places you can get a lot of these books. So we asked: How can libraries get these works on our digital bookshelves? Well, the way you ask a question like that in the United States is you sue the government. Lawrence Lessig brought that suit [Kahle v. Gonzales] on our behalf, and it's now been rejected at the district and appellate levels. But it also seems to have spawned the Copyright Office's orphan works hearings, so there is the possibility of legislation.

As a “digital librarian” and an Internet pioneer, how do you view the library system?

I see the library system in this country as a $12 billion industry dedicated to preservation and access of materials that are not mediated through a corporate experience. You don't have to sign a nondisclosure form to come up with a new idea in a library. In libraries, materials are preserved in original form, uncensored. The alternative is that the materials people learn from are forever mediated by a relatively small number of commercial companies in terms of selection and presentation. This is one of the biggest issues facing libraries in the future: what services will they perform, and what services will be performed by companies or by nonprofits acting like companies. If all content is moderated by a few companies in the digital world, we'll have a giant bookstore rather than a library system.

Do you think librarians are prepared to face that challenge?

If we stick to our original principles of preservation and access, I think we're in good shape. If we think the real challenge is applying the cataloging or selection criteria training we had in library school, well, those things are fundamentally changing. The Internet has made it so people are searchers all the time. I think it can be the librarians' day if we more boldly step into the world of digital resources. In large part, the librarian community hasn't done this yet. In some ways, yes, by putting Internet terminals in or negotiating contracts with Elsevier for commercial services. But let's do something more interesting. Let's build services in the digital world analogous to the services we perform in the analog world.

How do librarians who want to go digital and open with their collections and services get started?

It starts with a passion, it starts with a focus. Take a content set or a user need that you see and start producing services on your own. If you're in a public library, it might be town history. If you're a university librarian, it might be a subject specialty. Get those materials online in a way that you have control of them. Be a little bold, maybe work with others on open source projects to leverage some of the existing open OPAC systems.

At an OCA regional scanning center, we'll scan your materials for 10¢ a page. Audio recordings we can do for about $10 a disc, and videos about $15 per hour. And we'll do all of the hosting for free; you can do the interfaces. It doesn't require grants. It does require some creativity, a lot of slack to try it out, and the realization that there is a lot at stake.
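For a library sizing up a first project, the quoted rates can be turned into a rough budget. The Python sketch below assumes the three prices mentioned above (10¢ a page, about $10 a disc, about $15 per hour of video) and free hosting. The function name and the sample town-history collection are hypothetical, and actual quotes from a scanning center may differ.

```python
# Rough digitization budget from the rates quoted in the interview.
# The function name and the sample collection are hypothetical.

CENTS_PER_PAGE = 10            # print materials: 10 cents a page
CENTS_PER_AUDIO_DISC = 1_000   # audio: about $10 a disc
CENTS_PER_VIDEO_HOUR = 1_500   # video: about $15 per hour
HOSTING_CENTS = 0              # hosting is described as free


def estimate_budget_usd(pages=0, audio_discs=0, video_hours=0):
    """Return an itemized estimate in dollars for a mixed collection."""
    items = {
        "print": pages * CENTS_PER_PAGE / 100,
        "audio": audio_discs * CENTS_PER_AUDIO_DISC / 100,
        "video": video_hours * CENTS_PER_VIDEO_HOUR / 100,
        "hosting": HOSTING_CENTS / 100,
    }
    items["total"] = sum(items.values())
    return items


# Hypothetical town-history collection: 50,000 pages, 200 discs, 40 hours of video.
budget = estimate_budget_usd(pages=50_000, audio_discs=200, video_hours=40)
for label, dollars in budget.items():
    print(f"{label:>8}: ${dollars:,.2f}")
```

For the sample collection the estimate comes to $5,000 for print, $2,000 for audio, and $600 for video, or $7,600 in total.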

We have to recognize that it's not only possible but it is our responsibility to bring digital services to the world. If we can build this next generation in the open, the same way the open network and the open software infrastructure of the Internet developed, it will be the librarians' day. Media companies, the Googles and Microsofts, they will play their roles. They'll bring things to hundreds of millions. But they will never bring things to our patrons the way we can as librarians. Let publishers and the new generations of media companies do their job. We've always worked in parallel with them. But let's not lose the library in the transaction.

 

Interested in OCA?

CALL 415-561-6767

FAX 415-840-0391

WRITE Open Content Alliance c/o Internet Archive The Presidio of San Francisco 116 Sheridan Avenue San Francisco, CA 94129-0244

EMAIL OCA at archive.org

ONLINE www.opencontentalliance.org


Author Information
Andrew Richard Albanese is Editor, LJ Academic Newswire

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.