New Dog, Old Trick: Alerts for RSS Feeds
Kevin Broun describes how one library is pushing current information via email—and without licensing fees
Kevin Broun (netConnect) -- netConnect, 7/15/2004
Last year I wrote about a project to bring Internet content into LION, the online library system at the National Cancer Institute (NCI) ("Integrating Internet Content," LJ netConnect, Fall 2003, 10/15/03, p. 20). This service gathers information from web sites that publish or syndicate "newsfeeds" in RSS. This year we've gone a step further and begun "pushing" this content out to our users.
A form of XML, RSS is variously described as "Really Simple Syndication," "RDF Site Summary," and "Rich Site Summary." Because the content of an RSS newsfeed is metadata only—generally titles, links, and descriptions—we are able to enhance the LION collection with current information without additional licensing costs. Since last year, we've added "Yahoo! Health" and now have six newsfeeds, with their corresponding catalog records updated several times each weekday.
We've known we needed to make this service more useful to our clientele at NCI. In particular, we realized that newsfeed content is suited to a current awareness service yet we were relying on users to find it posted on the LION site. Though we provide links to the raw XML/RSS feeds so users can subscribe directly using their own news software (or "aggregators"), we felt it was unlikely that many would do this, however relevant the content. Our concerns have been borne out in conversations with clients. None have installed aggregators, and few have heard of the technology.
The old trick: email
It was clear that a different push strategy was required to cultivate an audience for RSS content. Fortunately, we already used email to push current awareness services to our clients. Since 2002, LION has included email-based subscription services for NCI Current Clips (daily news headlines), The Cancer Letter (weekly newsletter), and personalized subject profiles (any new items added to the collection, filtered by indexing terms).
The use of email for publicity and distribution seemed logical. We decided to develop a beta test service to determine technical feasibility and acceptance among our clients. Balancing the frequency of change in the newsfeed content with concerns about spamming our clients, we settled on a concept of daily updates of newsfeed headlines, with links back to the LION item records where the full information and links to full text would be available. To help market it as an extension of the daily-news-oriented Current Clips, we called this new alerting service NCI Current Clips Supplement.
Structuring the data
Our initial implementation of newsfeed data in LION was simple. We parsed and converted a newsfeed's XML-based information into HTML, then stored it in a notes field, replacing any previous information in that field. This kept the information in the catalog record current, with automatic updates every four hours. As we monitored our newsfeeds, we noticed a great deal of irregularity in the life span of the articles ("items" in RSS terms) within the feeds. Some feeds would change all their items daily or more often; others would replace some items frequently; in still others, items would persist for several days.
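As an illustration only, the parse-and-convert step might look like the following Python sketch. It is not the ColdFusion code used at NCI. The sample feed, the `feed_to_html` name, and the HTML layout are all assumptions, and the sketch handles only plain RSS 2.0 (an RDF-based RSS 1.0 feed uses XML namespaces and would need different element lookups).

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed content, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Health News</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Trial results &amp; next steps</title>
      <link>http://example.org/a1</link>
      <description>Short summary of the first article.</description>
    </item>
    <item>
      <title>Screening guidance updated</title>
      <link>http://example.org/a2</link>
      <description>Short summary of the second article.</description>
    </item>
  </channel>
</rss>"""


def feed_to_html(xml_text):
    """Convert the items of an RSS 2.0 document into an HTML list."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        # Escape every value so feed text cannot break the surrounding HTML.
        title = html.escape(item.findtext("title", default="").strip())
        link = html.escape(item.findtext("link", default="").strip(), quote=True)
        desc = html.escape(item.findtext("description", default="").strip())
        rows.append(f'<li><a href="{link}">{title}</a><br>{desc}</li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# The resulting string is what would overwrite the record's notes field.
notes_html = feed_to_html(SAMPLE_FEED)
print(notes_html)
```

Because the whole field is rebuilt on every run, this approach keeps no memory of which items are new, which is the limitation the item-level design addresses.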
The next level
To make our proposed current awareness service useful and nonrepetitive, it was clear that rather than simply update the catalog record of each of the six newsfeed sources, we would need to keep track of the information at the item level. We would save only the new items from each newsfeed update and then send out in our email alerts only unique items not previously sent. To manage the newsfeed information, we added a new table, "FeedItems," to our database structure, as shown in the diagram "RSS Content Acquisition Architecture," p. 20.
As the diagram indicates, we use each newsfeed item's URL as the primary key because it is a unique data point. To process each update, a ColdFusion script, which runs every four hours, parses the newsfeed's XML data into a structure containing a key for each article in the feed. It then loops through the structure, adding a record to the FeedItems table only if that article's URL isn't already in the table. New articles are marked as not yet sent.
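The item-level bookkeeping described above can be sketched in Python with SQLite. This is a hedged illustration, not the ColdFusion script or the actual LION schema: only the table name "FeedItems" and the use of the URL as primary key come from the article, while the other column names and the `open_store` and `store_new_items` helpers are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the FeedItems table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS FeedItems (
               url        TEXT PRIMARY KEY,   -- the item's URL is the unique key
               feed       TEXT NOT NULL,      -- assumed column: source newsfeed
               title      TEXT NOT NULL,      -- assumed column: item headline
               first_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               sent       INTEGER NOT NULL DEFAULT 0   -- 0 = not yet emailed
           )"""
    )
    return conn


def store_new_items(conn, feed, items):
    """Add only items whose URL is not already stored; return the number added."""
    added = 0
    for item in items:
        exists = conn.execute(
            "SELECT 1 FROM FeedItems WHERE url = ?", (item["url"],)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO FeedItems (url, feed, title) VALUES (?, ?, ?)",
                (item["url"], feed, item["title"]),
            )
            added += 1
    conn.commit()
    return added


conn = open_store()
first = store_new_items(conn, "Example Feed", [
    {"url": "http://example.org/a1", "title": "First article"},
    {"url": "http://example.org/a2", "title": "Second article"},
])
# Four hours later the feed has dropped a1 and gained a3; a2 is unchanged.
second = store_new_items(conn, "Example Feed", [
    {"url": "http://example.org/a2", "title": "Second article"},
    {"url": "http://example.org/a3", "title": "Third article"},
])
print(first, second)  # 2 1
```

Keying on the URL means a repeated item is recognized even if its title or description is edited between updates.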
Each weekday morning, a separate script retrieves the titles of all the articles that haven't already been sent, arranges them according to which feed they originated from, and sends the results as an email to users who've subscribed to the Clips Supplement service. It then marks those records as already sent, so they're not repeated in future email alerts, and deletes records older than 30 days from the table so the database doesn't expand to huge proportions and degrade in performance.
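A rough Python sketch of that morning job follows. It assumes a FeedItems table with hypothetical `feed`, `title`, `first_seen`, and `sent` columns, builds the message with the standard `email` package, and stops short of actually sending it. The subject line, the sample rows, and the `run_daily_alert` name are illustrative, not taken from the NCI implementation.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime, timedelta
from email.message import EmailMessage


def run_daily_alert(conn, now, recipients):
    """Build the alert for unsent items, mark them sent, purge old rows.

    Returns the EmailMessage, or None when there is nothing new to send.
    """
    rows = conn.execute(
        "SELECT url, feed, title FROM FeedItems WHERE sent = 0"
    ).fetchall()

    message = None
    if rows:
        # Arrange the titles under the feed they came from.
        by_feed = defaultdict(list)
        for _url, feed, title in rows:
            by_feed[feed].append(title)
        lines = []
        for feed in sorted(by_feed):
            lines.append(feed)
            lines.extend(f"  - {title}" for title in sorted(by_feed[feed]))
            lines.append("")
        message = EmailMessage()
        message["Subject"] = f"NCI Current Clips Supplement, {now:%Y-%m-%d}"
        message["To"] = ", ".join(recipients)
        message.set_content("\n".join(lines))
        # Mark exactly the rows that went into this message.
        conn.executemany(
            "UPDATE FeedItems SET sent = 1 WHERE url = ?",
            [(url,) for url, _feed, _title in rows],
        )

    # Delete records older than 30 days so the table stays small.
    cutoff = (now - timedelta(days=30)).isoformat()
    conn.execute("DELETE FROM FeedItems WHERE first_seen < ?", (cutoff,))
    conn.commit()
    return message


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FeedItems (url TEXT PRIMARY KEY, feed TEXT, title TEXT,"
    " first_seen TEXT, sent INTEGER DEFAULT 0)"
)
now = datetime(2004, 7, 15, 7, 0)
conn.executemany("INSERT INTO FeedItems VALUES (?, ?, ?, ?, ?)", [
    ("http://example.org/a1", "Feed A", "New article",
     (now - timedelta(hours=4)).isoformat(), 0),
    ("http://example.org/b1", "Feed B", "Another new article",
     (now - timedelta(days=1)).isoformat(), 0),
    ("http://example.org/a0", "Feed A", "Old article",
     (now - timedelta(days=45)).isoformat(), 1),
])
message = run_daily_alert(conn, now, ["reader@example.org"])
print(message.get_content())
```

In a real deployment the returned message would be handed to a mail server, for example through `smtplib`, and the job would be triggered by a scheduler.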
Other open source options
We developed a custom solution for making RSS feed content available. In other situations, it may be appropriate to apply more generic methods of repurposing RSS content. For example, there are a few open source projects to convert RSS feeds to email, including "rss2email" and the similarly named "FetchRss" and "fetchrss." These are Python, Perl, and Java tools, respectively, with varying system requirements and installation complexity. But all can track a set of RSS feeds and generate emails containing new items added to the feeds since the last update.
Like the customized solution we developed at NCI, these tools give libraries and information services another way to jump on the syndicated content bandwagon and provide their clients with this new kind of Internet content in a familiar package.
Beta time
To test the proposed service, we worked with 25 volunteers, whose reactions were generally positive. Some issues came up; almost immediately, users reported difficulty finding the particular item they had seen in the email alert after they clicked the link to the newsfeed's page on the LION site. This was because the items were showing up unsorted and sequenced differently in the email message than on the newsfeed's web page. To resolve this, we sort the newsfeed alphabetically by title when the catalog record is being updated and likewise sort the article titles when compiling the content for the email alert.
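The key point of the fix is that both views use the same ordering rule. A minimal Python sketch, with a hypothetical `sort_by_title` helper and made-up items, could apply one function in both places. Ignoring letter case is an assumption here; the article says only that the sort is alphabetical by title.

```python
def sort_by_title(items):
    """Order items alphabetically by title, ignoring letter case."""
    return sorted(items, key=lambda item: item["title"].casefold())


items = [
    {"title": "screening guidance updated", "url": "http://example.org/a2"},
    {"title": "Trial results announced", "url": "http://example.org/a1"},
    {"title": "Advisory panel meets", "url": "http://example.org/a3"},
]

# Use the same function when updating the catalog record and when
# compiling the email, so the two lists always match.
page_order = [item["title"] for item in sort_by_title(items)]
email_order = [item["title"] for item in sort_by_title(items)]
print(page_order)
```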
A second issue was the "disappearing article" problem. Users would sometimes find that an item in the email alert was no longer included in the newsfeed, especially if several hours or more had passed between the alert and when they read it.
Owing to the rapidly changing newsfeed content and the lack of a full-text cache, we haven't found a good solution to this. We do make sure that at the time the email is sent, the items it references are still online and include a caveat to that effect.
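One way to read that send-time check, sketched here in Python under a stated assumption, is to keep only the pending items whose URLs still appear in the most recent copy of the feed. The article does not say how the check is actually done, so the `still_listed` helper and the sample data are hypothetical.

```python
def still_listed(pending_items, current_feed_urls):
    """Keep only pending items whose URL is still in the latest feed copy."""
    current = set(current_feed_urls)
    return [item for item in pending_items if item["url"] in current]


pending = [
    {"title": "Trial results announced", "url": "http://example.org/a1"},
    {"title": "Advisory panel meets", "url": "http://example.org/a3"},
]
# URLs found in the feed when the morning email is being compiled.
latest_urls = ["http://example.org/a3", "http://example.org/a4"]

to_send = still_listed(pending, latest_urls)
print([item["title"] for item in to_send])  # ['Advisory panel meets']
```

A check like this narrows the window but cannot close it, since an item can still leave the feed between the moment the email is sent and the moment it is read.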
Newsfeed feedback
Regardless of these issues, our testers were happy with the Clips Supplement. In a survey taken after several weeks of testing, users gave an average rating of 3.6 on a satisfaction scale of 1 to 4, with all either "somewhat" or "very" satisfied. Each of the individual newsfeeds scored at least "somewhat" or "very" useful on average, validating our inclusion of both cancer-specific and general, health-focused newsfeeds.
The only change we made based on the survey results was to stop sending the alerts on federal holidays—we don't send alerts on weekends—which keeps us from clogging inboxes and avoids a little of the "disappearing article" problem.
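The resulting schedule rule is simple enough to sketch in a few lines of Python. The `is_send_day` helper is hypothetical, and the holiday set is a small hand-maintained sample for illustration, not a complete federal calendar.

```python
from datetime import date

# Illustrative, incomplete list of observed 2004 federal holidays.
HOLIDAYS = {
    date(2004, 7, 5),  # Independence Day, observed on Monday
    date(2004, 9, 6),  # Labor Day
}


def is_send_day(day, holidays=HOLIDAYS):
    """Send alerts on weekdays that are not listed holidays."""
    return day.weekday() < 5 and day not in holidays


print(is_send_day(date(2004, 7, 15)))  # Thursday
print(is_send_day(date(2004, 7, 17)))  # Saturday
print(is_send_day(date(2004, 7, 5)))   # observed holiday
```

Items that arrive on a skipped day simply stay marked as unsent and go out in the next alert.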
Going live
Based on these results, we offered the Clips Supplement service to all LION users in February. About 70 users subscribe today. While this news-hungry segment of our user base is relatively small (our Current Clips and Cancer Letter subscribers number about 350 and 420, respectively), they find it useful. One client reported, "This helps me scan the news to see what is important for projects that I am working on. It has been enlightening." Another is very satisfied but still bothered by disappearing articles. It would certainly be helpful if the RSS or proposed Atom specifications (recommended standards for syndicated information) someday included an "expires" element or attribute, and if the newsfeeds then supported it.
In spite of some nagging issues, we consider the rollout of NCI Current Clips Supplement to be a success. With an up-front investment of some time in programming, content selection, and user interaction, we deliver this new content to our clients at no ongoing cost. And the content is delivered on a daily basis but with virtually no daily labor required.
Going forward, we will continue to search for appropriate newsfeeds to add to the collection, monitor technical developments in the newsfeed syndication arena, and stay in touch with our clientele to increase usage and ensure the relevance of our services.
Author Information
Kevin Broun (kbroun@nih.gov) is Senior Web Developer and Lead, Electronic Information and Library Services, Communication Services Branch, National Cancer Institute, Bethesda, MD
It’s really simple: Data flow for integrating RSS-formatted content into the library system, which is pushed to customers through email.























