Yesterday's Headlines

Richard Wiggins takes a look at newspaper databases and finds there's an archive digital divide

Richard Wiggins (netConnect) -- netConnect, 4/15/2002

The World Wide Web opens up myriad and diverse global information sources. Among the most important is access to newspapers, both great and small, from locations around the planet. Not only can news hounds in Ames, IA, read the New York Times or the Seattle Post-Intelligencer online, they can just as readily read the Jerusalem Post or Le Monde.

Often a reader wants to track how newspapers are handling a specific story. Some stories, such as the September 11 attacks, garner headlines worldwide, making it easy to find today's articles on the subject. But other stories may not receive front-page coverage; then a search engine is necessary.

Nor is it only the past week's news that we seek. "Yesterday's news" may conjure up images of the recycling box, but for the local history buff, the serious historian, the genealogist, and many others, past newspaper articles offer tremendous value. (For a full evaluation of various news sites, see E-Reviews.)

When searching for past newspaper articles, users face a range of choices: the individual newspapers themselves; current news indexes such as TotalNews.com and Google's news index; traditional professional aggregators such as LexisNexis, Dow Jones Interactive, FirstSearch, ProQuest, EBSCOhost, and Gale's InfoTrac; and upstart web-based aggregators such as NewsBank and NewsLibrary.

The avenue users take can greatly affect the quality of their research, and users with access to a major research library or a corporate intranet have a tremendous advantage. (For a list of links to national and international newspapers, go to newslink.org.)

Search for free, pay for archives

Many newspapers, such as the New York Times (NYT), put a search box prominently near the masthead, making it easy for users to execute a search immediately after loading the homepage. Others make it hard to find their search function. The Detroit Free Press homepage prominently offers an advertisement for searching classifieds, while burying links for Search and for Archive far down a list of links along the left column of the screen. This could be an error of omission or it could be by design: perhaps the editors feel it is better to encourage readers to click on article links to find their way, thus ensuring that the banner ad odometer rolls over more times.

Most newspapers offer the current day's content, or a portion of it, on the web at no charge. Many allow free searching across both the freely available content and the online archive. Articles remain free for a fixed window, say seven days, after initial publication; when a reader selects an article older than that window, the site prompts for a payment method. Perhaps unique among major newspapers, the Wall Street Journal charges a $59 annual fee to read current and archival articles online and $2.95 apiece to read articles from a 15-year archive of WSJ and other Dow Jones content, powered by Factiva.

Functionality

If you seek current or past articles from a single newspaper, your first instinct will be to go to that paper's web site. But while a majority of web users have gravitated to one favorite search engine, Google, newspaper search engines vary greatly. Besides employing different search software, they differ in features, functionality, and relevance ranking of results.

Most newspaper search engines offer a basic single search box; many also offer advanced searches where users can filter their searches or change the sort order. Nevertheless, all of the newspaper web site search engines have limitations:

  • Few, if any, use a thesaurus. If the newspaper style sheet calls for "al-Qaida" and the reader types in "al Qaeda," no articles will be found. Misspellings are also common; a user spelling it right won't match a reporter spelling it wrong.
  • Many archives have no illustrations.
  • In the wake of the U.S. Supreme Court's 2001 decision in New York Times Co. v. Tasini, many archives lack contributions from freelancers.
  • The web version of the paper may not include all articles in the print edition, and some papers truncate articles for their online editions. Readers who think they have done exhaustive searches may not have.

Test-driving search engines

Two days after the automobile industry announced its February sales, a test search revealed some of the variations among the search engines of North American newspapers. Let's say an investor is curious about Ford's performance and types "ford sales" (without quotation marks) into each newspaper's basic search box.

  • The NYT's very first item is exactly on point—"Chevrolet Sales Surpass Ford's for First Time in 10 Years." The Chicago Tribune and Los Angeles Times list relevant articles first, too.
  • USA Today lists articles that appear to be relevant, but in no apparent date order, with articles from last year at the top, and there is no option to sort the hit list by date. The hit list does offer a link to the paper's full archives, whose engine sorts by date by default, but the archives hit list includes no relevant articles.
  • The Dallas Morning News lists several articles that appear irrelevant, followed by a Bloomberg News wire piece that is on point. The hit list lacks article dates, a major omission, and inspecting the articles reveals poor ordering by date.
  • The Miami Herald appears to have no index of articles from the last week; its NewsLibrary archive covers only older articles, which fall outside the scope of this test.
  • The Atlanta Journal-Constitution lists many outdated and irrelevant articles. (Its engine is powered by ProQuest Archiver.) If the search is filtered to include only articles from the last few days, an article that appears to be relevant is listed, but the AJC charges even for recent articles.

The role of aggregators

Libraries, of course, have for years helped patrons find relevant articles in the back files of newspapers. Many libraries have moved beyond paper files and microform to offer online databases such as LexisNexis. Readers using such a database will find several advantages over newspaper-run search engines:

  • The archive covers a comprehensive set of newspapers of all sizes, from all locales.
  • It is easy to search across all titles, allowing a single search to cover numerous major papers efficiently.
  • Search filtering may be much more sophisticated.

Web services that search across media sites provide a new kind of aggregation. General search engines such as Google and AltaVista detect searches on breaking news and display media-published articles prominently. Specialty sites such as TotalNews.com, WorldNews.com, and NewsIndex.com allow searches of a wide range of current news sources. These services maintain separate indexes of the current contents of partner newspapers and other media outlets.

But current news indexes can't offer a window into the archives of past newspaper coverage; that would undercut the market for aggregators and for newspapers' own archives. Still, the general user does have access to online archives, typically for a fee. For instance, Northern Light, recently reorganized to focus on commercial content, sells NYT articles.

Other aggregators are appearing as well. One company, NewsBank, seeks to serve both patrons at public libraries and individual pay-by-the-sip users on the web. NewsBank licenses libraries access to the full-text archives of more than 200 newspapers. The company also provides what it calls its "e-commerce product," NewsLibrary, which sells individual articles from more than 80 partner newspapers. In fact, some newspapers, such as Knight Ridder's Miami Herald and San Jose Mercury News, use the NewsLibrary index as their own for-fee archive marketed on their web sites.

Another newspaper aggregator service for the general web user is eLibrary, which offers archives of newspapers and periodicals from around the world. Users can subscribe for $79.95 per year or $14.95 per month.

The digital newspaper divide

The tools used to search newspaper archives depend greatly on the user's affiliation. Perhaps the people with the best of all worlds are those who work for major newspapers. Richard Geiger directs the library at the San Francisco Chronicle. He says virtually all of the paper's reporters use the in-house LexisNexis or Dow Jones Interactive databases to research archival newspaper sources. There are exceptions, for instance when a small newspaper breaks a story: "Occasionally we need to cover a story first reported in the local newspaper of a remote small town. In the past we'd have to call and beg them to fax us a copy. Such small papers are not covered by LexisNexis. Now we can often look it up on the web."

In general, however, reporters at major newspapers eschew free web news indexes and individual newspapers' archives. Geiger says deadlines are so tight that it would be inefficient to search archives by visiting newspaper sites one at a time.

Big-city newspaper reporters also have an advantage when searching their own paper's content. For instance, a reporter at the Chronicle looking for past articles could search an aggregated database, the Chronicle's own web site, or its in-house electronic story archive (once known as "the morgue"). The internal archive gives reporters far richer functionality and more complete coverage than a general user encounters on a newspaper's web site.

Students and faculty at major universities also enjoy access to comprehensive news indexes. For instance, LexisNexis markets a somewhat slimmed-down service to universities as its Academic Universe product. Searchers can do full-text searches across many years and across many of the leading newspapers worldwide.

Paradoxically, students may discover the in-depth, comprehensive newspaper archive the university has licensed on their behalf only when they bump up against a by-the-sip charge. Doris Helfer, chair of technical services at California State University, Northridge (CSUN), and a science librarian, says, "At CSUN, undergraduates balk when they encounter per-article fees at newspaper web sites; we seize on the opportunity to show them the advantages of a comprehensive, multiple-newspaper archive."

A student at a community college or a patron of a small public library may not have free access to a service such as Nexis. Those whose library doesn't subscribe to NewsBank or another aggregator may find themselves paying by the article or subscribing to a service such as eLibrary, or they may simply be unable to search past newspaper articles comprehensively.

Archive choices are also limited for citizens of small towns, and for reporters at small-town newspapers. Lucinda Davenport, a journalism professor at Michigan State University, notes that many small newspapers lack access to aggregator databases, which greatly limits the scope of their reporting. Davenport points out that a local reporter can often gain context by searching databases: "If a train goes off the track it may appear to be a local story. An archive search may reveal that this accident fits a larger pattern. Then you have a much bigger story."

The newspaper archive digital divide means that students and faculty at major universities can research past news coverage far more thoroughly than their peers at smaller schools and community colleges. A similar gap separates reporters at major newspapers from those at small papers, and it shapes what amateur historians and citizens researching a political issue can find, depending on the resources available to them.

What's the solution?

Dan Gillmor writes a widely read technology column for the San Jose Mercury News, a major newspaper. He uses his paper's licensed databases to search articles from major newspapers, and he has access to a comprehensive archive of his own newspaper's past articles. Since his column is forward-looking, he says, "Google is always the first stop, because it's indexing information in a much more timely and, often, more relevant way. Then, if it's a tech story...I go to the archives of major tech media, few of which charge for archives."

Gillmor offers an idea that could help bridge the digital newspaper divide: "I have a feeling that the newspaper industry would be better served by opening up the archives and Googling them (and selling related ads based on keywords entered) than charging for individual searches. That's just a feeling, of course, and I have no data to back it up."

It's an intriguing idea the industry should consider. Google competing with LexisNexis? Stranger things have happened.


Author Information
Richard Wiggins (richardwiggins.com) writes frequently about Internet topics and is a Senior Information Technologist in the Michigan State University Computer Laboratory, East Lansing.

©2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.