
Chief Thingamabrarian

LJ talks to mastermind of the LibraryThing web site, bookhound Tim Spalding

By Melissa L. Rethlefsen -- Library Journal, 1/15/2007

Tim Spalding’s latest and most successful project, LibraryThing, contains records for over eight million books, making it larger than most libraries. LibraryThing offers avid collectors and casual readers alike a way to keep track of personal book collections easily and find and connect with others whose libraries are similar, as well as get recommendations for books to read based on a personal library or a single book. One of LibraryThing’s latest tools is the UnSuggester, a clever and highly entertaining peek at what books not to read (e.g., if you like Neal Stephenson’s Cryptonomicon, you might want to veer away from Leigh Radford’s One-Skein Knitting).

Relying on data from the Library of Congress (LC), Amazon, and over 30 other library catalogs worldwide for catalog records and book jacket graphics, LibraryThing also depends on user-generated content such as book reviews, tags, ratings, and changes to catalog records to create a dynamic, social space for book lovers.

Tags, one of the hallmarks of today’s social software, play a particularly large role in LibraryThing’s makeup. Tagging and folksonomy, the emergent, organic language that develops from multiuser tagging, face criticism from the library and taxonomy communities for the lack of standardization, authority control, and even correct grammar. LibraryThing, which also uses LC subject headings (LCSH) to enhance records, shows that folksonomy and taxonomy can exist side by side to the benefit of its users. Author Rethlefsen recently caught up with Spalding to talk about tagging, LibraryThing, and library catalogs.

LJ: Where did the idea of LibraryThing come from?

TS: I came up with the idea some time ago and never acted on it. When I finally got around to it, I was under the impression that it would just be for a few academics to catalog their books. I didn’t wake up to the social potential until it took off for cataloging and I started to see people sending their library URLs to each other and playing with recommendations. I began it in August 2005, and it took about a month of coding.

How is it funded?

LibraryThing was self-funded. It was profitable from the start as a personal thing. But it wasn’t profitable enough to expand as I wanted, and I was worried that a bad month would send me back to freelancing. Start-ups are generally “financed on credit cards.” LibraryThing was unique in that we were already deeply in debt when I started it. I had no net, and my wife, Lisa Carey, was pregnant. Good motivator, that (see LJ Talks to Lisa Carey, 9/12/06). In May 2006, LibraryThing was funded by AbeBooks. It bought 40 percent of the company (I retain 60 percent) and put some money into it—enough to scale up on employees (we’ve fluctuated between three and five) and buy a bunch of servers. LibraryThing makes its money from memberships—not to be underestimated—and a pittance from when people buy a book from Amazon. In the near future we will be selling recommendations and other data.

Most librarians don’t really know how the statistics behind tagging, tag clouds, etc., actually work—at least I don’t.

Tag clouds are just a representation of the data. At the simplest, I could just say one tag means a one-pixel-high word, two tags is a two-pixel-high word, etc. But, obviously, this doesn’t work with real data. So you do various things to it, like making no tag smaller than eight pixels (the readability limit) and preventing high-count tags from being ten feet tall. The goal is a display that conveys some of the peaks and troughs yet takes up about the area assigned to it and is legible. The algorithm I use took me forever to write. Smoothing the data this way and that results in some distortion, but most LibraryThing tag clouds have a “show numbers” link next to them, so you can see the raw statistics involved. Some day, when I get all the clouds using the same, new algorithm, I’ll give them all that link.
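The sizing rules described above can be sketched in a few lines. This is only an illustration, not LibraryThing's algorithm, which the interview does not spell out: the 8-pixel floor comes from the text, while the 36-pixel ceiling, the log scaling, and the name `font_sizes` are assumptions made for the example.

```python
import math

MIN_PX = 8    # readability floor mentioned in the interview
MAX_PX = 36   # assumed ceiling; the interview gives no figure


def font_sizes(tag_counts, min_px=MIN_PX, max_px=MAX_PX):
    """Map raw tag counts (each >= 1) to pixel sizes in [min_px, max_px].

    Log scaling is an assumption here: it keeps a few very common
    tags from dwarfing everything else in the cloud.
    """
    if not tag_counts:
        return {}
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for tag, value in logs.items():
        # If every tag has the same count, fall back to the floor size.
        fraction = (value - lo) / span if span else 0.0
        sizes[tag] = round(min_px + fraction * (max_px - min_px))
    return sizes
```

Calling `font_sizes({"holocaust": 1000, "diary": 100, "own": 10})` puts the rarest tag at the floor and the most common at the ceiling, with the middle tag between them. A "show numbers" view would simply display the raw counts instead of the scaled sizes.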

Most of the tag clouds include a second factor, “salience,” represented in bold [see graphic, p. 41]. Take the tag cloud for Anne Frank’s diary (www.librarything.com/work/5116). You can see that “nonfiction” is large but not bold. “Nazis” is small but bold. That’s salience. The point is to give the tags some context. “Nonfiction” is a common tag for the book, but it’s not that distinctive in the grander scheme of things: there are a lot of books tagged “nonfiction,” and The Diary of Anne Frank is not a high percentage of the total. By contrast, “Nazis” may be small, but it’s very “salient.” The Diary of Anne Frank takes a reasonably large chunk of all its uses. As for tagging statistics, LibraryThing doesn’t toy with them except:

For statistical purposes, LibraryThing never cares about upper- and lowercase. User data is never changed, but on a global level WWII and wwii are the same tag.

There is a tag-combining feature. Basically, users can decide that a particular tag is the same as another one. So, for example, wwii is the same as ww2 (see www.librarything.com/tag/wwii for examples). To my knowledge, LibraryThing is the only service that does this.
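The two adjustments above, together with the salience idea, can be sketched as follows. This is a minimal illustration under stated assumptions: the alias table, the function names, and the definition of salience as "this work's share of all uses of the tag" are inferred from the description, not taken from LibraryThing's code.

```python
from collections import Counter

# Hypothetical alias table: each variant maps to the tag users merged it into.
ALIASES = {"ww2": "wwii", "world war ii": "wwii"}


def canonical(tag, aliases=ALIASES):
    """Fold case, then follow any user-defined tag combination."""
    folded = tag.strip().lower()
    return aliases.get(folded, folded)


def count_tags(raw_tags, aliases=ALIASES):
    """Count tags for statistics; the strings users typed are not modified."""
    return Counter(canonical(tag, aliases) for tag in raw_tags)


def salience(work_counts, global_counts):
    """Share of each tag's total uses that falls on this one work."""
    return {
        tag: count / global_counts[tag]
        for tag, count in work_counts.items()
        if global_counts.get(tag)  # skip tags with no global count
    }
```

With made-up numbers, `count_tags(["WWII", "ww2", "wwii", "Nonfiction"])` counts three uses of `wwii` and one of `nonfiction`. If `wwii` has 30 uses overall and `nonfiction` has 1,000, the work accounts for a tenth of all `wwii` uses but only a thousandth of `nonfiction` uses, so `wwii` would be the bold one.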

Does LibraryThing “censor” any tags for tag clouds, etc., whether they are obviously personal tags like “toread” or are otherwise objectionable?

We never censor tags. But tags do drop out statistically. Hit “show numbers” on the Anne Frank book and you’ll see the lowest count is six. A moment’s consideration will tell you that’s not likely. In fact, tag distribution is a classic “long tail.” A few tags get a lot of weight, and many get little. To illustrate the point, there are 476 different tags for Anne Frank’s diary.

In this context, LibraryThing shows the top 32 tags (in fact, I think it’s retrieving the top 35, but some of them have been combined). The effect of this is to produce a set of displayed tags that are pretty good. There are a few personal ones—folio society, own, paperback, read—but most are “subject tags,” and they’re accurate and would enhance findability in the library catalog. By contrast, the LCSH tags for the first English edition are skimpy and miss whole dimensions of the thing.

World War, 1939–1945—Jews.

Netherlands—History—German occupation, 1940–1945.

What does this miss? Well, for starters, The Diary of Anne Frank is the most famous work by a young person and one of the most famous autobiographies, too. Mentioning the Holocaust would also be nice, but since the term hadn’t received wide currency by then, it didn’t make it in and never will. (Later editions include it.)

The most important effect of choosing the top tags is to screen out the really “bad” tags. “Folio society” may be personal, but at least it indicates someone has the folio society version. (Under some conditions that might even be interesting; I didn’t know there was an FS edition, so I might be moved to get it over the crappy paperback I have.) But we can’t do much with “girls’ room” (one, personal), “historyish” (one, dumb-ish) or “@gamma” (one, a personal shelving system?).

This is not unlike any other “democratic” system. If we allowed only one American to vote, there would be a small possibility that we’d wake up to discover America had elected a Libertarian or a Maoist for president. As you expand the inputs, the rare stuff drops out.

In addition to choosing the “top” tags for a book, a library system might decide to set a lower threshold. A less popular book on the Holocaust might indeed draw a “so-called holocaust” tag. But it isn’t likely to draw that from many users. Another, better way to do it is to look at the tag’s use overall. If a tag is used by one or two people, it’s not likely to be of general interest. In fact, both these calculations and some others are used when LibraryThing runs its “similarly tagged books” algorithm. I don’t want the noise contributing much to the result.
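The three filters mentioned here (top tags, a per-book minimum, and overall use of the tag) can be combined in one short function. The cutoff of 32 comes from the interview; the other two thresholds and the name `displayable_tags` are assumptions chosen for illustration, since the actual values are not given.

```python
from collections import Counter


def displayable_tags(work_counts, global_user_counts,
                     top_n=32, min_work_count=2, min_global_users=3):
    """Return (tag, count) pairs worth showing for one work.

    work_counts: tag -> number of times the tag was applied to this work
    global_user_counts: tag -> number of distinct users of the tag anywhere
    """
    kept = Counter({
        tag: count
        for tag, count in work_counts.items()
        if count >= min_work_count
        and global_user_counts.get(tag, 0) >= min_global_users
    })
    # most_common sorts by count, highest first, and truncates to top_n.
    return kept.most_common(top_n)
```

Tags used once on the book, or by only one or two people overall, drop out before the top-N cut is applied, which is the "long tail" being trimmed.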

Just for fun, what are your favorite tagging applications other than LibraryThing? Do you use social bookmarking?

I love Flickr, but my wife won’t let me use it for our pictures—it isn’t integrated with places that print out pictures for our relatives [printing pictures has since been added.—Ed.]. I have used Del.icio.us but stopped. I keep all my bookmarks public, so they would reveal way too much about me. I’d be tagging blog posts about LibraryThing [with the annotation] “that jerk you met at NELINET” or bookmarking pages on topics that would give competitors a clear idea of where we’re going. I just started using a product to tag files on my computer. That makes a lot of sense to me, but since it isn’t integrated with the [operating system], I have an extra step.

Have you had much interest in your request to work with libraries with Innovative Interfaces Millennium OPACs?

As I recall, four or five libraries were interested, but they were mostly academic. I didn’t get as much public library interest, and no good medium-sized public library volunteered. Although LibraryThing users have all sorts of books, the overlap is best with a good public library. Unfortunately, public libraries are also more scared of user-contributed data than academic ones. Tricky.

I looked for Innovative OPACs because although they’re pretty “locked down,” it’s also easy for me to extract ISBNs (and LCCNs) from them. The main thing we need from the library side to implement most catalog enhancements is an accurate list of holdings, so that, for example, LibraryThing doesn’t recommend something the library doesn’t have.

Have libraries made any creative uses of LibraryThing? I know about some libraries that are using LibraryThing as their catalog, or are using the widget to roll new books on their web sites.

That’s about it. Some “very” small libraries are using it for their catalog. These are mostly [specialized] libraries—the Museum of Comic Art in New York, the Nabokov house museum in St. Petersburg, and a welter of church libraries. But LibraryThing is not currently aiming to be a low-end OPAC. For starters, we’d need to do DVDs and CDs better.

What does LibraryThing have in the pipeline? [See netConnect, Product Pipeline, p. 14]

LibraryThing does have two products [in development]. Until now, we’ve been doing our own thing. A few libraries have used LibraryThing for this and that, for example, to create new-book feeds, but it’s been peripheral to what we do. (The book-feeds idea is great. We’re glad to help. But LibraryThing is only in this space because most OPACs won’t let you extract a list of recently added books. This is crazy.)

In the next few months we’re going to be releasing two things. “Library widgets” are simple pieces of code that libraries can add to their web pages to give their catalog book recommendations, ratings, reviews, tags, and item-level other-edition FRBRish links [see “What Is FRBR?” netConnect, LJ 4/15/05]. Integrating with OPACs is hard, but we think we can do a lot with small, easy-to-use bits of JavaScript—no back-end integration necessary. We’re going to be the “un-vendor” here—no sales calls, tchotchkes, long-term contracts, and so forth. It will also be very cheap (and free for the smallest libraries). Since libraries currently pay through the nose for the data they get, we hope this will be somewhat disruptive.

“The LibraryThing Tag Consortium” (name suggestions wanted!). We feel that tags don’t make much sense library by library or in small numbers. (The only library tagging program we know about, PennTags, has around 30,000 tags, many of them on web pages, not books.) You need a lot before most of the benefits happen—relevancy ranking, suggestion algorithms, meaningful tag clouds, etc. And everything libraries worry about—idiots tagging, offensive tags, etc.—is much worse when you have small numbers; with large numbers, the problems “wash out” statistically. For example, you only show a tag if it’s used by many people.

Anyway, we’re going to be offering a combination of application programming interfaces (APIs) and widgets for libraries to add tagging to their site and get access to everyone else’s tags, including LibraryThing’s nearly ten million. There will be no data-license restrictions to keep people “locked in,” and much of the data will be available without subscription.

The widgets will get some traction. We’re basically offering Roy Tennant’s “lipstick on a pig” [see LJ 4/15/05, p. 34], but the lipstick will be nice and quite cheap. I’m not certain about the tag consortium. It will require more effort on the technical end, and tags themselves are a hard sell to many libraries. I do hope someone cobbles together a decent user-data consortium, however, and LibraryThing might as well take a shot at it. It’s forced us to reengineer our back end a lot. We’re going to be the consortium’s “first customer” and expect to change over to the system behind the scenes within the next few weeks.




Tag, You’re It

Another interesting tagging project from the art world is Steve: The Art Museum Social Tagging Project (www.steve.museum), billed as “the first experiment in social tagging of museum collections,” which has recently been funded by the Institute of Museum and Library Services (IMLS) for two years.

At a New York Technical Services Librarians meeting November 17, Susan Chun of the Metropolitan Museum of Art said steve solves the problem of “additional access points, multilingual information, and things that aren’t often included in art catalog records, like color.”

Though the audience was somewhat skeptical, Chun said, “steve won’t replace anything, and tags must exist alongside traditional cataloging.” Though some tags may have nebulous value, MoMA found that 92 percent of tags added new information that wasn’t present in traditional sources.

Active since 2005, the tag collection is being studied by social scientists at Princeton University and University of Michigan. Questions being asked include, “What produces good tags?” The schools will analyze types and clusters of tags by deduping and stemming.—Jay Datema


Author Information
Melissa L. Rethlefsen is Education Technology Librarian, Mayo Clinic College of Medicine’s Learning Resource Center, Rochester, MN

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.