Library Journal Mobile

Tennant: Digital Libraries   





BagIt -- Just BagIt

July 6, 2009

Perhaps I can be forgiven for seeing Michael Jackson in the digital preservation efforts of libraries, but when I ran across the "BagIt" initiative by the Library of Congress, the California Digital Library and Stanford, I couldn't help thinking about Michael Jackson's song "Beat It." So sue me.

But it still might be better than the allusion the project itself uses, which hangs off the phrase "bag it and tag it." I must leave the appropriate video for that tagline to your imagination, but as I recall there was something from the "Thriller" album that would work quite well. But I digress.

BagIt is an intriguingly simple specification that aims to do one thing and do it well -- identify a set of files and transfer them reliably. There are some wrinkles, but overall it is a very straightforward way to accomplish a simple and yet all-important goal -- to transfer a set of files as a related unit. The Library of Congress web site describes this in greater detail:

A bag is like a folder or directory on a computer. It is essentially comprised of three elements: A bag declaration text file, which is like a seal of authenticity; a text-file manifest listing the files in the collection; and a subdirectory – usually titled “data” – filled with the digital content. The manifest is machine readable for automated data ingest. The receiving computer analyzes the manifest and runs checksums on the contents; if the checksums match, the transfer is successful.

A bag can also contain an optional text file, titled "bag-info.txt," that contains a small amount of administrative metadata, such as contact information for the collection owner and a brief description of the collection. Users can include much more metadata about the collection, but the Library recommends storing it in the "data" directory with the rest of the collection in order to keep the bag root directory uncluttered. Users can note in the "bag-info.txt" file that additional metadata exists and resides in the "data" directory.
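
To make the structure described above concrete, here is a minimal Python sketch that builds a tiny bag, checks the payload against the manifest, and reads "bag-info.txt". Several details are assumptions that do not come from the description quoted here: the file names "bagit.txt" and "manifest-sha256.txt", the "checksum path" layout of manifest lines, and the "Label: value" layout of "bag-info.txt" follow my reading of the BagIt specification, and the function names are purely illustrative. Treat it as a sketch, not as the Library of Congress's own tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(bag_root: Path, name: str = "manifest-sha256.txt") -> dict[str, str]:
    """Map each path listed in the manifest to its expected checksum.

    Assumes one "checksum path" pair per line (layout assumed from the spec).
    """
    expected = {}
    for line in (bag_root / name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(None, 1)
            expected[rel_path.strip()] = checksum.lower()
    return expected


def verify_bag(bag_root: Path) -> list[str]:
    """Return a list of problems; an empty list means every checksum matched."""
    problems = []
    for rel_path, checksum in read_manifest(bag_root).items():
        target = bag_root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif file_sha256(target) != checksum:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems


def read_bag_info(bag_root: Path) -> dict[str, str]:
    """Parse "Label: value" lines; indented lines continue the previous value."""
    info: dict[str, str] = {}
    label = None
    text = (bag_root / "bag-info.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if line[:1].isspace() and label is not None:
            info[label] += " " + line.strip()
        elif ":" in line:
            label, value = line.split(":", 1)
            label = label.strip()
            info[label] = value.strip()
    return info


def make_demo_bag(root: Path) -> Path:
    """Create a one-file bag under root (contents are made up for the demo)."""
    bag = root / "demo-bag"
    (bag / "data").mkdir(parents=True)
    payload = bag / "data" / "hello.txt"
    payload.write_bytes(b"hello, bag\n")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        f"{file_sha256(payload)}  data/hello.txt\n", encoding="utf-8"
    )
    (bag / "bag-info.txt").write_text(
        "Contact-Name: Example Owner\n"
        "External-Description: A tiny demonstration bag\n",
        encoding="utf-8",
    )
    return bag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_demo_bag(Path(tmp))
        print(verify_bag(bag) or "all checksums match")
        print(read_bag_info(bag))
```

Returning a list of problems rather than a single yes or no is a design choice of this sketch: a receiving site usually wants to know which files failed, not merely that the transfer did.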

This is all well and good, but I have to tell you that the "holey bag" just really makes my day. I mean, I couldn't come up with something this brilliant, and yet simple, on my best day:

A bag filled with content is considered complete. A variation, called a holey bag, is gaining wider acceptance because of its flexibility. A holey bag has the standard bag structure but its "data" directory is empty. The holey bag contains an additional text file titled "fetch.txt" at the root level that lists the URLs of the files to be fetched (so-called "holes" in the digital collection to be filled in). A script consults the "fetch.txt" file, follows the URLs, downloads the files and aggregates them into the local "data" directory within the bag. The sender’s source files do not need to reside in the same directory or on the same server; they can be retrieved from many different sources. A holey bag becomes complete after the digital collection is entirely downloaded and its manifest file is verified.

I love the absolute simplicity of this, and in this I see the guiding hand of John Kunze of the California Digital Library, who has always seen the utility of simplicity to enable longevity. His identifier scheme, Archival Resource Key (ARK), is but one example of this. Kudos to John and the rest of the team for coming up with something so simple and yet so useful and effective.
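
For the curious, the script that fills a holey bag could look something like the Python sketch below. It rests on assumptions: the quoted description says only that "fetch.txt" lists URLs, so the three-column "URL length path" line layout (with "-" for an unknown length) comes from my reading of the BagIt specification, and the names `FetchEntry`, `parse_fetch`, `fill_holes` and the `fetcher` parameter are hypothetical, introduced only for illustration.

```python
import urllib.request
from pathlib import Path
from typing import Callable, NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when fetch.txt gives "-" (size unknown)
    path: str              # destination relative to the bag root


def parse_fetch(text: str) -> list[FetchEntry]:
    """Parse fetch.txt content, assuming "URL length path" on each line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        url, length, path = line.split(None, 2)
        size = None if length == "-" else int(length)
        entries.append(FetchEntry(url, size, path.strip()))
    return entries


def download(url: str) -> bytes:
    """Default fetcher: read the whole response body into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fill_holes(bag_root: Path,
               fetcher: Callable[[str], bytes] = download) -> list[Path]:
    """Download every file listed in fetch.txt into place inside the bag."""
    root = bag_root.resolve()
    text = (root / "fetch.txt").read_text(encoding="utf-8")
    written = []
    for entry in parse_fetch(text):
        target = (root / entry.path).resolve()
        # Refuse destinations that would land outside the bag directory.
        if root not in target.parents:
            raise ValueError(f"path escapes the bag: {entry.path}")
        payload = fetcher(entry.url)
        if entry.length is not None and len(payload) != entry.length:
            raise ValueError(f"unexpected size for {entry.url}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        written.append(target)
    return written
```

Passing the downloader in as `fetcher` keeps the sketch testable without a network, for example `fill_holes(bag, fetcher=lambda url: b"hello")`. As the description above says, the bag only counts as complete once the manifest has been verified after the downloads finish; this sketch stops short of that step.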

By the way, if you're at all interested in this, you simply must see the video that is designed to introduce BagIt. Perhaps it isn't quite up to Michael Jackson's level, but given what resources the Library of Congress has to work with, it totally rocks. It has humor, awesome file footage, well-done segues, and overall good production values. Congrats to the crew who produced it.

I'm telling you, just BagIt. Oh yeah, and tag it -- yeah, that's right, tag it too. And moonwalk into the bright light of the new day, knowing that your important content is safely bagged and tagged.



Posted by Roy Tennant on July 6, 2009




July 7, 2009
In response to: BagIt -- Just BagIt
DrWeb commented:

The whole SheBagIt.. sorry, couldn't resist. Roy, it's a fascinating structure and functional use will probably follow the form. I was wondering if you know anything about the security aspect. The declaration file (authenticity), for example. Is there any encryption scheme or algorithm envisioned for these "bags"?
Best,
DrWeb




July 16, 2009
In response to: BagIt -- Just BagIt
Peter Binkley commented:

It's a great little spec. But I wonder - why was the "bag" metaphor for a collection of stuff still available? We've used folders, packages, buckets, bins, and more abstract terms like container or archive - how did bag escape? And what container metaphors remain unexploited? Will some future spec have to define a portmanteau?





©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.