
Updated: Library of Congress Acquires Donated Twitter Archives


Will also collect current tweets going forward; new Google Replay announced

Norman Oder -- Library Journal, 04/14/2010

  • Mixed reactions: skepticism, support
  • Announcement on blog, Facebook
  • Six-month delay on (non-commercial) use
  • Borowitz: "Museum of Crap"

Had it been announced on April 1, even more eyebrows might have been raised, but the Library of Congress (LC) has acquired (via donation) every public tweet since Twitter’s inception in March 2006 and will digitally archive those tweets going forward.

A preliminary announcement was made on LC's blog and Facebook page. The official press release was issued on April 15, quoting Librarian of Congress James H. Billington: "This information provides detailed evidence about how technology based social networks form and evolve over time. The collection also documents a remarkable range of social trends. Anyone who wants to understand how an ever-broadening public is using social media to engage in an ongoing debate regarding social and cultural issues will have need of this material."

Twitter came to LC, not vice versa
Though the New York Times reported that the "library reached out to the company a few months ago about adding Twitter’s content to the national archives," LC spokesman Matt Raymond told LJ that "It's more accurate to say Twitter approached us with the idea, to see whether we thought it might be of benefit, and then we began to pursue it." (He clarified that his initial statement to the Times was based on bad information.)

"Our Web Capture team here at the Library evaluates sites for digital preservation," Raymond later told LJ. "Social media in general and Twitter in particular can be viewed as part of the historical record of communication between government and citizens, news reporting, and social trends--all of which pertain to the Library's cultural heritage collections."

"The Twitter archive fits in with this mission and contains untold historical value," he added. "We in no way discriminate against sites, social media or otherwise, that are consistent with our mission and collections, so we 'take all comers.'"

Still, he said LC doesn't currently have plans to approach any other social media sites in a similar way: "In many ways we view this acquisition as a great test case for us in how to acquire, preserve and serve this kind of historical data."

"I think Twitter will be one of the most informative resources available on modern day culture, including economic, social and political trends, as well as consumer behavior and social trends," Margot Gerritsen, a professor with Stanford University's Department of Energy Resources Engineering and head of the Center of Excellence for Computational Approaches to Digital Stewardship, a partnership with LC, said in the official press release.

Adding storage
There are billions of tweets, but the archive would add only about 5 terabytes of storage. LC already has more than 167 terabytes of web-based information, notably legal blogs and web sites of candidates for national office, as well as web sites of Members of Congress, according to the blog post.

Link rot an issue?
Given the limit of 140 characters in a Twitter message, most users use link shorteners, and some of those may not survive, thus jeopardizing connections to the target web page.

Raymond said LC is aware of the issue. "Our web-crawling activity is permissions-based, and we are discussing approaching some of third-party sites to which URL-shorteners point to explore what can be done from a preservation standpoint, at least insofar as it is in line with the needs of our collections," he said. "That's certainly something that's well down the road, though."

Twitter's announcement
Twitter made its own announcement (Tweet Preservation) on its blog, which, interestingly enough, had previously been highlighting the launch of "Promoted Tweets," the beginning of Twitter's much-awaited business model.

The Los Angeles Times saw the LC announcement as part of a broader strategy by Twitter to move into the mainstream.

Non-commercial use
Notably, only after a six-month delay can the tweets be used "for internal library use, for non-commercial research, public display by the library itself, and preservation."

Raymond explained, "We have already been working on policies regarding access to digital collections, and we believe the Twitter collection will make an excellent test case. The full archive will be available for serious research. There aren't plans to post it to our web site, with the possible exception of curated presentations around topics, similar to what we do on the Veterans History Project web site."

Google Replay
Twitter also announced that Google has created a way "to revisit tweets related to historic events," known as Google Replay. While the service currently goes back only a few months, it will eventually include all tweets.

E-Commerce News saw that announcement as a sign of more potential cooperation between Google and Twitter, with Facebook as the hated rival.

More questions
ReadWriteWeb asks how it would work:
Will the archive include friend/follower connection data? Will it be usable for commercial purposes? [Apparently not] Will there be a Web interface for searching it, and will that change the face of Twitter search for good? Is there any way that the much larger archive of Facebook data could be submitted to the same body for analysis of the same kind?

Implications for social marketing
Ben LaMothe, on his blog, suggests that a company's social media marketers could entice more user input, given that contributions would be sure to be archived. He also thinks it will promote more re-tweets and geolocation reports.

Comments skeptical and supportive
From LC's Facebook page:

"So you guys have given up merely cataloging information and are now cataloging nonsense too?" [Similarly, humorist Andy Borowitz tweeted: "Library of Congress to Acquire Entire Twitter Archive; Will Rename Itself 'Museum of Crap.'"]

To everyone who thinks this is a bad idea: yes, there is a lot of garbage on Twitter, but there is also a lot of useful information. If I want info on a breaking news event, I can usually find it on Twitter before it hits formal news media - and that includes online media.

I've been thinking about a historian sometime in the future mining information from Twitter to write a history of Barack Obama and what the people really thought when he was elected instead of a bunch of self-important journalists. And not just Americans, but others as well. How many thousands (or ten-thousands) of Tweets do you suppose were posted when this event happened?

from a scholarly point of view, this is absolutely awesome! From a violation of privacy view..... not so awesome. Yes, my twitter account is public. So is my facebook page. Its my content, my thoughts. shouldn't someone at least ask me permission to publish my intellectual content?!?!?

maybe you should read Twitter's Terms of Service, which you agree to by the simple act of submitting a tweet.





 
©2011 Media Source, Inc., All rights reserved.