Library Journal Mobile

Tennant: Digital Libraries   





The Great Web Site Die-Off: Why It Matters

August 28, 2009

You may not think that a web site needs something akin to a living will, but many do -- or, more accurately, will. These are sites that are basically "sole proprietorships" -- sites maintained only by an individual, and likely an individual without any clear path for the site's survivorship.

I've written about this before, and dubbed it "the great web site die-off," since it seems to me that we will soon enter a period when a number of sites will go dark through sheer neglect. So it was with some interest that I saw a "tweet" on Twitter from John Mark Ockerbloom that read:

"What will you do that outlives you? John Hare works to ensure persistence of his Internet Sacred Texts archive: http://bit.ly/AUpuo"

The tweet linked to an article in the Santa Cruz Sentinel (CA) describing the measures that the owner of Sacred-Texts.com, John Bruno Hare, was taking to ensure the site's ongoing care in the face of his advancing cancer. Although I would not categorize his site as a sole proprietorship (besides volunteers, he actually has two employees!), the article reminded me of this issue, and I don't think I've yet exhausted the angles from which to examine it. His site generates revenue, and therefore has resources to support its ongoing maintenance; many sites that have value do not.

Therefore, I think it is worthwhile to investigate what sole proprietor web site managers can do to ensure their site(s) live on, and what libraries and archives can do to ensure that they do when doing so makes sense for their mission and audience. I will take as my "case in point" a web site I created that I think can, and should, eventually be taken over by a library or archive -- not necessarily to live on "as is," but to have the content and metadata rescued in some way.

The site is StanislausRiver.org, and was created to gather together in one place digitized content representing the Stanislaus River in California and the fight to save it from drowning by the New Melones Dam. There is not much there yet, but we're building it. Content from the site may also end up in a documentary film with the tentative title of Last River Lost.

The fight to save the Stanislaus River gave rise to Friends of the River, still very much active today. Two libraries at the University of California, Berkeley already have collections from this organization: the Water Resources Center Archives and The Bancroft Library. The Water Resources Center Library also has various videos and other materials that relate to this issue.

So why have I started this web site to gather content that might naturally go to a library directly? Because it won't -- at least not as it is currently held. Much of the material being gathered at StanislausRiver.org consists of bits and pieces (and in some cases, some very substantial pieces) of small personal collections unlikely to be of interest "as is" to a library or archive. There are photographs that people have taken; brochures, flyers, and posters that individuals have collected; personal testimony given at government hearings; and so on. The web site I've created can serve an important role in bringing this content together from the personal collections of individuals into a collated, organized, and described collection that may one day find a home alongside related collections in an actual archive. I'm no archivist by any stretch of the imagination (I envision my colleagues laughing at the idea), but I'm an interested, committed individual with knowledge of the issue I'm helping to document. It may be worth noting that this is largely how archival collections have been created in the past -- one committed individual collecting materials on a subject or related to a person -- with the only difference being that the collecting is now happening in digital form.

But that's also the rub. I'm an interested individual. And despite any ideas I have to the contrary, I'm only one ill-considered walk across the street from annihilation. Should that happen, there is no clear path for any of my heirs to migrate this site into the hands of someone willing and able to take it under their control. It would die. Not immediately, but not long after my credit card no longer clears.

Sure, there isn't a lot there right now, but we have big plans. I found more slides to scan just tonight, and there is at least one real treasure trove of content that has yet to be tapped. This site will matter, of that I can assure you. And the content should live on in some way. Some of us were there at the last great battle against big dams. And we documented it, every step of the way. And it belongs in an archive that can manage it the way it needs to be managed.

If you think the "Wayback Machine" at the Internet Archive is up to the task, think again. Go take a look at a site that you know well and let me know how good a job it does archiving the site. If it's anything like what I've seen, it might be OK when you need a surficial feel for a site, but for anything deeper -- good luck.

Subsequent posts on this topic will try to determine how "sole proprietorships" and libraries and archives can make sure good web sites are rescued when they should be. Meanwhile, I'd be interested to know your thoughts. Do you know of any sites that have been created and managed by individuals but that may need to be "rescued" by a library or archive in the future? Do you have any thoughts on how best this might happen?

Posted by Roy Tennant on August 28, 2009




August 28, 2009
In response to: The Great Web Site Die-Off: Why It Matters
Bob commented:

This problem came to my attention just last night as I prepared some material for upcoming Cub Scout meetings. Several GeoCities sites have old but still valuable scouting content. The Save GeoCities effort at the Internet Archive will certainly not help me next year when I start up meetings again. This content will be lost.

To turn that argument around, though: the new scouting.org site has developed a lot of new content that is better, but harder to find due to poor SEO. And new Scoutmasters are creating new content every day.

Does the Web need curation if we have an infinite number of monkeys creating an infinite number of documents on an infinite number of topics? Or do we just need better discovery systems?




August 28, 2009
In response to: The Great Web Site Die-Off: Why It Matters
Roy Tennant commented:

Certainly I'm not trying to say that *everything* must be saved, since as you point out, newer content, at least for some things, is better. But I also know that there are probably categories -- items of historical interest, for example -- where it makes sense to try to rescue it.




August 28, 2009
In response to: The Great Web Site Die-Off: Why It Matters
John Bruno Hare commented:

Thanks, John Mark Ockerbloom referred me to this article. I'm John Bruno Hare, the subject of the Sentinel article. One small point: in the past year (since I learned of my diagnosis) I've turned the sole prop into an S-Corp. I've also created a trust to hold sacred-texts in perpetuity.

These points didn't get into the article because they are sort of technical, but this is what I've done in terms of end-of-life planning for my website.--J.B. Hare




August 28, 2009
In response to: The Great Web Site Die-Off: Why It Matters
Erik Hetzner commented:

By all means, if you intend to keep a site up to date and active, a "will" for your web site, should you lose interest or be unable to maintain it, is essential.

For the really long term preservation of web sites, and for conveying (to the extent possible) what a site actually was like in the past, the Internet Archive and services like them are the only practical solution. It is simply not possible for today's dynamic sites to be kept alive longer than they are used. The price is too high, in time and money.

There are steps that you can take to ensure that the Internet Archive properly archives your site. A first step is making certain that you are not disallowing the Internet Archive from crawling your site via robots.txt. If the Internet Archive does not crawl your site deeply enough, you can work with librarians at other archives to ensure that your site is preserved.

(I work on the California Digital Library's web archive, webarchives.cdlib.org, so I may have some bias here.)

-Erik Hetzner
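Erik's first step, making sure robots.txt does not shut out the Internet Archive, can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` to report which archive crawlers a given robots.txt would block for a given URL. The two user-agent names (`ia_archiver`, `archive.org_bot`) are assumptions for illustration; check the Internet Archive's current documentation for the names it actually honors. The sample robots.txt contents and the example.org URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler names, for illustration only; verify against the
# Internet Archive's current documentation before relying on them.
ARCHIVE_AGENTS = ("ia_archiver", "archive.org_bot")


def blocked_archive_agents(robots_txt: str, url: str) -> list:
    """Return the archive crawlers that this robots.txt would keep away from url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in ARCHIVE_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks every crawler from the whole site.
    restrictive = "User-agent: *\nDisallow: /\n"
    # Hypothetical robots.txt that blocks /private/ for everyone else
    # but explicitly lets the (assumed) archive crawlers in everywhere.
    permissive = (
        "User-agent: ia_archiver\n"
        "Disallow:\n"
        "\n"
        "User-agent: archive.org_bot\n"
        "Disallow:\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /private/\n"
    )
    page = "http://example.org/photos/"  # hypothetical page URL
    print(blocked_archive_agents(restrictive, page))  # ['ia_archiver', 'archive.org_bot']
    print(blocked_archive_agents(permissive, page))   # []
```

An empty `Disallow:` line means "allow everything" for that user agent, which is why the second file lets the named crawlers through while still keeping generic bots out of `/private/`.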




September 1, 2009
In response to: The Great Web Site Die-Off: Why It Matters
Mike Cane commented:

There are the Internet Archive and its Wayback Machine. But I got a nasty surprise with that recently, where a site that died last year prevented the Archive from making an ongoing copy. Years of that site are now POOF! gone.






©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.