Open-Source Email Archiving Software Expands with IMLS Grant

by Matt Enis
Sep 10, 2015 | Filed in Programs+

The ePADD open-source email archiving and processing platform developed by Stanford University Libraries was awarded a $685,000 National Leadership Grant by the Institute of Museum and Library Services (IMLS) on August 31. The software "supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives," according to the project site. "Email archives present a singular window into contemporary history; however, they are often inaccessible to researchers due to screening, processing, and access challenges, as well as the sheer volume of material."

With funding from the National Historical Publications and Records Commission, and collaboration by Sudheendra Hangal, creator of Stanford’s Memories Using Email (MUSE) software, the team at Stanford Libraries’ Department of Special Collections and University Archives completed the proof-of-concept phase of ePADD’s development in July. This new grant from IMLS will fund the project for an additional three years, enabling the developers to enhance ePADD’s usability, scalability, and feature set, in partnership with the University of Illinois Urbana-Champaign, Harvard University, University of California, Irvine, and the Metropolitan New York Library Council (METRO).

"One of the reasons we created this software is that there were very few tools that would actually allow us to review and process a collection to make it discoverable and accessible," Glynn Edwards, ePADD project director and head of technical services for Special Collections at Stanford University Libraries, told LJ. "For this software platform and our project that was our main goal."

Optimized for Windows 7 (with Java 7 or higher) and Mac OS X, and designed to process email in both mbox and EML formats, ePADD includes separate modules for the appraisal, processing, discovery, and delivery of large email collections.

Confidentiality concerns

The appraisal module allows donors and archivists to review collections, including attached documents or photos, prior to transferring files to an archival repository. This includes tools that enable users to search for potentially sensitive content, such as credit card or Social Security information. If an institution or a donor has used specific formats for other types of sensitive information (such as a string of characters in a faculty identification card, or any other unique pattern, expressible as a regular expression, that signals confidential communication within the collection), archivists can add those to ePADD’s automated processing forms. Email messages can then be flagged, annotated, or restricted individually or in bulk prior to processing.

These features will help archivists make email collections as accessible as possible, while ensuring privacy and confidentiality for donors and third parties discussed in correspondence.

“Depending on the email archive…there may be issues with [the Family Educational Rights and Privacy Act] FERPA or [the Health Insurance Portability and Accountability Act’s privacy rule] HIPAA, or state or local statutes around privacy and confidentiality,” said Josh Schneider, ePADD community manager and assistant university archivist for Stanford University’s special collections and archives. “In archives, you need to work a lot with donor privacy as well as third-party privacy. So, we wanted to make a tool that had functionalities in it that let people easily search for potentially private or confidential information and take actions on messages containing that information, and do it in a way that really lends itself to bulk actions to deal with the volume [of email collections].”

During ingestion, the ePADD processing module conducts several automated processes, Schneider said. For example, “it takes the various names and email addresses associated with a particular individual and it concatenates those, [resolving] the name of an individual. It also identifies and extracts named entities in the archive,” using OCLC FAST to search Library of Congress subject headings, as well as the LC Name Authority File, DBPedia, and the Virtual International Authority File. “Persons, organizations, or locations that are mentioned in the subject line or body of the mail message, ePADD identifies those and extracts those. And a lot of the advanced browsing functionality and search functionalities that ePADD does depends on that early activity of extracting those named entities.”

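For readers curious what the pattern-based screening described in this section involves, here is a minimal Python sketch. It is not ePADD's code: the pattern names, the made-up `FAC-` identifier format, and the `flag_message` and `flag_collection` functions are all assumptions chosen for illustration. It looks for Social Security-style numbers, candidate card numbers (filtered with the standard Luhn checksum to cut down on false positives), and one institution-specific pattern, then reports which messages would need review.

```python
import re
from typing import Dict, Set

# Hypothetical patterns for illustration only; these are not ePADD's rules.
SENSITIVE_PATTERNS = {
    # U.S. Social Security number written as 123-45-6789
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # Made-up institution-specific format, e.g. a faculty ID like FAC-004211
    "faculty_id": re.compile(r"\bFAC-\d{6}\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def flag_message(body: str) -> Set[str]:
    """Return the labels of every sensitive pattern found in one message body."""
    flags = set()
    for label, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(body):
            # Digit runs that fail the checksum are unlikely to be card numbers.
            if label == "credit_card" and not luhn_valid(match.group()):
                continue
            flags.add(label)
    return flags


def flag_collection(bodies: Dict[str, str]) -> Dict[str, Set[str]]:
    """Screen many messages at once; keep only those with at least one flag."""
    flagged = {}
    for message_id, body in bodies.items():
        flags = flag_message(body)
        if flags:
            flagged[message_id] = flags
    return flagged
```

Calling `flag_collection({"m1": "SSN 123-45-6789", "m2": "Lunch on Friday?"})` returns an entry for `m1` only, giving a reviewer a short list of messages to restrict or annotate in bulk. A real workflow would still need human review, since patterns like these both miss sensitive content and flag harmless numbers.
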
Search and discovery

The discovery module runs on a web server, enabling remote users to search an archived email collection using a browser, with full-text access limited based on the donor’s wishes or an institution’s policies. Remote users must contact the host institution to request access to specific full-text messages or attachments. For example, Stanford’s own ePADD discovery module for the library’s Robert Creeley email archive enables browsing, searching, and graphing by named entities, but redacts all other text from each message.

“Because of policies at Stanford, we are only delivering the extracted entities: the persons, places, locations, and organizations,” Edwards said. “Within the body of the email message, you can see the extracted entities, but you won’t see the full text of the archive, nor do you see the domain for the correspondent’s [email].”

Searching can be limited to incoming or outgoing messages, and a bulk search query box enables users to search a block of text to match against a collection’s entity index. Graphing tools enable users to visualize how often specific people, organizations, and locations were mentioned within the archive, and when those entities were mentioned.

“It gives you really clean data, in which to see the top correspondence over time, or the top topics that have been discussed over time in the account,” Schneider said. By contrast, most email programs only facilitate discovery by searching. “You can’t go into, say, Gmail, and identify the top 10 people you corresponded with between 2005 and 2010, which locations were most discussed…those aren’t questions that most email programs can handle. But, ePADD, because it’s doing some indexing of the messages at ingest, is able to answer some questions like that.”
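To make the "top 10 correspondents between 2005 and 2010" question concrete, here is a rough Python sketch that answers it for an mbox file using only the standard library. It is an illustration, not a description of how ePADD indexes mail: the function names, the `owner` parameter, and the rule of counting each address once per message are assumptions made for this example.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses, parsedate_to_datetime


def top_correspondents(messages, start_year, end_year, owner=None, limit=10):
    """Count how many messages each address appears in within a year range.

    `messages` is any iterable of email message objects. `owner` is the
    account holder's own address, which is left out of the ranking.
    """
    owner = owner.lower() if owner else None
    counts = Counter()
    for msg in messages:
        try:
            sent = parsedate_to_datetime(str(msg.get("Date", "")))
        except (TypeError, ValueError):
            continue  # skip messages with a missing or unparseable date
        if not start_year <= sent.year <= end_year:
            continue
        headers = []
        for name in ("From", "To", "Cc"):
            headers.extend(str(value) for value in msg.get_all(name, []))
        # Use a set so an address counts once per message, case-insensitively.
        addresses = {addr.lower() for _name, addr in getaddresses(headers) if addr}
        addresses.discard(owner)
        counts.update(addresses)
    return counts.most_common(limit)


def top_correspondents_in_mbox(path, start_year, end_year, owner=None, limit=10):
    """Run the same count over every message stored in an mbox file."""
    box = mailbox.mbox(path, create=False)  # raise instead of creating a new file
    try:
        return top_correspondents(box, start_year, end_year, owner, limit)
    finally:
        box.close()
```

With a hypothetical file name and address, `top_correspondents_in_mbox("archive.mbox", 2005, 2010, owner="me@example.org")` would return a ranked list of `(address, message_count)` pairs. Scanning every message on each query is slow for large collections, which is one reason a tool would build an index once at ingest, as Schneider describes.
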

On-site access

In contrast to the discovery module, the delivery module enables archivists to provide moderated full-text access to a processed email collection, typically in an on-site reading room. In addition to the searching and graphing functions of the discovery module, on-site users can generate complex tiered searches using a customizable lexicon, and explore images and other email attachments within the collection. Users can also request copies of messages or attachments using a “checkout cart”-type feature.

Users can download ePADD and a detailed user guide from the project website at library.stanford.edu/projects/epadd. The site also features a community resources page where new users can seek help, share expertise, or contribute a use case. While installing the discovery module on a web server will likely require the help of an institution’s IT department, Edwards said that the software is otherwise flexible and scalable enough that interested users can download it to their personal computers to explore its features using their own email accounts.

Schneider encouraged archivists at other institutions to check out the free software, noting that “we’re doing our best to try to promote [ePADD] as a community resource. It’s open source, and we’re interested in getting use cases and really developing a community of practitioners around the software.”
