Library Journal Mobile

Easy ≠ Right

Contemplating federated search, Melissa L. Rethlefsen reminds us that there are no one-stop solutions for quality research

By Melissa L. Rethlefsen -- netConnect, 7/15/2008

In the film version of Harry Potter and the Goblet of Fire, Dumbledore says to Harry, “Soon we must all face the choice between what is right and what is easy.” In libraries, we often discuss the future of search with the same moralistic overtones: “Google dumbs down information seeking to the lowest common denominator and produces marginal results.” “Searching with controlled vocabularies is antiquated, and librarians who insist on directing patrons to specialized, difficult-to-use databases are old-fashioned and provide poor customer service.” And on it goes. But when looking for information, what is right and what is easy? Is easy right, or is it wrong?

Federated search, or metasearch, is at the heart of this dichotomy. With federated search, users can take a one-box, keyword approach to retrieving information from any number of book and article databases without knowing specialized search techniques or terminology. This is what patrons want: the simplicity of a Google search. This is what we librarians want, too: for patrons to use our licensed databases and find high-quality information easily. But federated search doesn’t necessarily make anything easy. In an ideal world, it would, but no matter the implementation or what technology may come, federated search will always be an overly simplistic solution to an extremely complex problem.

Simple-search zombies?

This is particularly relevant to academic libraries. They have been early adopters of federated search technology, but along with the use of federated search and Google have come concerns that students may not be developing important critical thinking and research skills. In a recent post on the ACRLog, Steven Bell asks, “Have we created a generation of search zombies who listlessly tap away at the keyboard with no strategy at all...and then mindlessly settle for whatever their first Google page yields?”

The answer may be yes. If we presume that learning which databases best fit a task, and how to search those databases efficiently, is too difficult for students, how can we trust them to develop real research and information skills? We can’t. We need to help them use the correct tools for their information needs.

True research requires more than half-heartedly picking a keyword or two, sticking them in a federated search box, and expecting the perfect set of resources to appear magically. Research requires time, discipline, critical thinking, and analysis. No single technique or tool is ever sufficient for conducting research. Research necessitates a balance of complex information-seeking processes, including database searching, consultation with colleagues, hand searching of relevant journals, and mining bibliographies, with a little sheer serendipity thrown in. Now, with federated search posed as a solution to the rising expectation of easy, perhaps it’s time to remind ourselves that research isn’t easy.

Pulling back from Google

Yes, patrons wish research were easier. Yes, patrons want research tools to be easy to use. And, yes, patrons expect everything to be as simple as Google. But let’s step back a second. It’s true that Google is usually cited as the prime example of “easy.” Type a keyword or two into the search box, and, presto, in a fraction of a second, there are your results. And because Google’s algorithms are so good, the results are generally exactly those you need.

Aha, you might say: if Google can give us relevant results most of the time, surely the technology for federated search tools can get there, too. But the heart of the issue isn’t the technology. Google-like relevancy is the end goal of quick information searches but not necessarily the end goal of research. Research requires more than relevancy. There are also currency, authority, and serendipity to consider.

Does all this really matter if our patrons are finding the information they need? Perhaps not. But we must be absolutely sure that patrons are finding what they need using federated search and not just walking away with “good enough.”

The Peloponnesian question

In his essay “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries,” Thomas Mann (librarian at the Library of Congress and author of The Oxford Guide to Library Research) addresses the distinction between quick information needs and scholarship. A quick information need might consist of answering a question such as “How big is the population of China?” or finding a single article on stem cell research, and Google, Google Scholar, and federated search tools might be perfect for the job. But Mann also rightly points out that in-depth study requires in-depth searching skills. Finding the best or most comprehensive information on a topic requires more effort than typing a keyword or two into a box.

Mann uses the example of a researcher looking for information on tribute payments between Greek city-states during the Peloponnesian War. In this example, the researcher started his own search using Google but quickly became frustrated at the sheer amount of information he retrieved and his inability to determine which of the thousands of results were the best sources on his topic. Mann, on the other hand, using reference books, controlled vocabulary, and the library catalog, identified the classic work on the topic along with several other books and journal articles.

With instruction in the correct tools, or with the help of a knowledgeable librarian, searchers can save an enormous amount of time. In the rush to make everything easy like Google, we often forget that searching with the so-called easy tools, whether Google or a federated search product, is hardly easy. More often, it’s overwhelming, frustrating, and time-consuming.

Native features overlooked

Theoretically, it is possible that with even better next-generation search algorithms, semantic search, natural language processing, spell-checking, and controlled vocabulary matching, one-box searching could produce relevant and current results across multiple databases. Metasearch products like AllPlus.com/PolyMeta.com, Vivisimo, and Deep Web Technologies’ Explorit are already making huge strides in federated search technology. Science.gov, ToxSeek, Scitopia, and USA.gov are just a few examples of successfully implemented federated search tools that use one or more of these products or their components.

But even in a perfect world where federated search tools pulled up highly relevant and focused results for any and all queries, federated search still wouldn’t be the right tool for scholarship and research. That’s because the data in each of our databases are unique, and each database’s search capabilities are designed to make the best use of its content and its content structure, or metadata. Many databases have their own taxonomies designed to retrieve precise results, as well as built-in advanced search options customized to their data. By using a single tool to search specialized content, libraries are essentially cannibalizing powerful, specialized databases for their data while crippling those databases’ ability to leverage it.
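To make the “lowest common denominator” problem concrete, here is a deliberately simplified Python sketch of one way a federated layer might fan a single keyword query out to several sources and merge the hits. The source functions (`search_medline_like`, `search_citation_index_like`), their record fields, and the merge rule are all hypothetical stand-ins, not any vendor’s API. The point is only that when results are reduced to the fields every source shares, each database’s specialized metadata, such as subject headings or citation counts, drops out of the merged list.

```python
from typing import Callable

# Hypothetical record type: each source returns dicts with its own metadata.
Record = dict[str, object]


def search_medline_like(query: str) -> list[Record]:
    # Hypothetical stand-in for a controlled-vocabulary database.
    return [{"title": "Estrogen therapy outcomes", "year": 2007,
             "mesh": ["Estrogen Replacement Therapy"]}]


def search_citation_index_like(query: str) -> list[Record]:
    # Hypothetical stand-in for a citation index.
    return [{"title": "Hormone therapy and risk", "year": 2006,
             "times_cited": 412, "author_id": "A123"}]


def federated_search(query: str,
                     sources: list[Callable[[str], list[Record]]]) -> list[Record]:
    """Send the same keyword query to every source and merge the hits.

    Only fields present in every result are kept, so source-specific
    metadata (subject headings, citation counts, author IDs) is discarded.
    """
    results = [rec for source in sources for rec in source(query)]
    if not results:
        return []
    common = set.intersection(*(set(rec) for rec in results))
    merged = [{key: rec[key] for key in sorted(common)} for rec in results]
    # With no shared relevance signal left, fall back to newest-first order.
    return sorted(merged, key=lambda rec: rec["year"], reverse=True)
```

Calling `federated_search("hormone replacement therapy", [search_medline_like, search_citation_index_like])` returns just titles and years; the subject heading and the citation count that made each source valuable are gone.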

Citation indexes like ISI Web of Science and Scopus are prime examples of resources that offer as much or more value in their interfaces as in their content. Take Scopus, Elsevier’s citation index, for example. What makes it special and useful is not primarily the journals it indexes. Much of what is in Scopus can be found indexed in other products like MEDLINE and EMBASE, and it doesn’t use a controlled vocabulary that would provide better access to these citations than other databases could. Its value is in the interface: in what you can do with the content and the connections among the data. For instance, Scopus uses institutional and author identifiers that disambiguate authors with the same or similar names. This author identifier can be used to easily create a list of publications by a particular author, find an author’s h-index, and create tables of an author’s articles with links to the number of citations per article per year since 1996.
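The h-index follows a standard definition: an author has index h if h of their papers have each been cited at least h times. The Python sketch below computes it from a list of per-article citation counts; the counts are made up for illustration, and any database’s reported figure will depend on which citations that database actually indexes.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break     # counts only decrease from here, so stop
    return h


# Hypothetical citation counts for one author's six articles.
print(h_index([25, 8, 5, 3, 3, 1]))  # → 3
```

Three papers have at least three citations each, but there are not four papers with at least four, so h is 3.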

In the basic keyword search mode, Scopus provides handy tools to sort by relevance or citedness; limit to or exclude various authors, journals, or years; and perform on-the-fly citation analysis of selected citations. For every indexed item, it’s possible to follow paths through authors, journals, and, most important, chains of citations. Is the answer to incorporate more advanced features into federated search (and potentially sacrifice speed and ease of use), or is the answer to keep searching as simple as possible? If better tools already exist, perhaps libraries should spend more time educating users and marketing databases than investing in easier but less useful tools.

Vocabulary matters

Databases that are built around controlled vocabularies are also hobbled by federated search tools, all of which treat index terms as basic keywords rather than as words imbued with special powers of precision and recall. One such taxonomy-oriented database is MEDLINE, the premier biomedical literature database. MEDLINE is commonly searched through PubMed, its public interface. PubMed, unlike older and commercial versions of MEDLINE, tries both to mimic Google’s one-search-box approach and to take advantage of the highly structured and carefully indexed MEDLINE data.

Using advanced natural language processing techniques, PubMed parses users’ searches and maps them to medical subject headings (MeSH), author names, and journal titles, in addition to searching them as text words. Even with this rather remarkable under-the-hood technology, PubMed presents searchers with many of the same problems that Google searchers encounter (thousands of results, many of them irrelevant), but to a larger degree, because PubMed sorts results by date rather than by relevance. With advanced search techniques and appropriate MeSH terms with precoordinated subheadings, searchers can retrieve highly specific results for most queries, but without knowing how PubMed works and how best to take advantage of it, it’s easy to get lost in the one-search-box world.

Good searching, specific results

Though medical librarians are fully aware that searching PubMed well requires knowledge of medical subject headings and subheadings, Boolean logic, and the occasional advanced feature, the simplicity of the single search box lures many of them into no longer teaching students and other patrons how to use MeSH, or even into no longer using it themselves. Why bother investing that time if PubMed can automatically and transparently translate users’ searches into the appropriate controlled vocabulary? For one, investing more time up front can save time later. If results are specific enough the first time, users won’t have to wade through pages of marginally relevant or decidedly irrelevant results.

Second, and perhaps more important, even PubMed’s sophisticated natural language processing sometimes gets it wrong. Perhaps the most commonly cited example is a search for HRT. In PubMed, HRT can mean many things, including Heidelberg Retina Tomograph, hydraulic resistance time, hydraulic retention time, and, of course, hormone replacement therapy. Faced with these options, PubMed defaults to a keyword search, so all four of these senses (among others) show up on the first page of results. The technology to disambiguate searches is simply not there, much less the ability to translate queries into a good search strategy, even within this single database.
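The fallback behavior described above can be illustrated with a toy version of query-to-vocabulary mapping. This is a minimal sketch under stated assumptions, not PubMed’s actual algorithm: `CANDIDATE_SENSES` is a tiny hand-made, hypothetical table rather than real MeSH data, and the output merely imitates PubMed’s field-tag style. A phrase with exactly one known sense expands to a subject heading plus a text-word search; an acronym with several candidate senses, like HRT, cannot be resolved and drops to a plain keyword search that matches every sense.

```python
# Hypothetical, hand-made table of candidate senses (not real MeSH data).
CANDIDATE_SENSES: dict[str, list[str]] = {
    "heart attack": ["Myocardial Infarction"],
    "hrt": [
        "Hormone Replacement Therapy",
        "Heidelberg Retina Tomograph",
        "hydraulic retention time",
        "hydraulic resistance time",
    ],
}


def map_query(query: str) -> str:
    """Translate a free-text query into a field-tagged search string.

    Exactly one known sense: search the heading OR the original words.
    Zero or several senses: fall back to a plain keyword search.
    """
    senses = CANDIDATE_SENSES.get(query.strip().lower(), [])
    if len(senses) == 1:
        return f'"{senses[0]}"[MeSH Terms] OR "{query}"[All Fields]'
    return f'"{query}"[All Fields]'


print(map_query("heart attack"))  # → "Myocardial Infarction"[MeSH Terms] OR "heart attack"[All Fields]
print(map_query("HRT"))           # → "HRT"[All Fields]
```

A searcher who knows the vocabulary can skip the guesswork entirely by entering the intended heading, for example `"Hormone Replacement Therapy"[MeSH Terms]`, directly.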

Site-specific search blunder

Librarians aren’t the only ones skeptical about a Google-style approach to searching. In March 2008, Google released a new site search box that came up during navigational searches for popular web sites. It was designed to allow users to search within a particular web site without having to go there (and without using the Advanced Search page or special syntax). The outcry from the search and business communities was strong, and, within days, Google removed the site search box from results for the majority of commercial web sites, including Amazon.com.

One search engine optimizer, Lisa Barone, remarked, “The site search Google is offering up isn’t going to be anywhere near as strong as the one you have on your site. Why? No Advanced Search features. Chances are you allow users to search only certain parts of your site (blog vs. whole site) or based on select criteria (by product, date, color, price, etc.). Google’s site search doesn’t allow such fancy features, features that help improve the navigability of your web site. Basically, Google is helping you to look less helpful.”

While the uproar over the site search box was largely based on commercial concerns, Barone’s comments sounded another note: specialized search tools offer features that can’t be replicated in a generic search tool. Similarly, if we substitute specialized library databases for a web site search engine, and federated search for Google, we see the same dilemma. If we rely on federated search, we forsake the unique features and capabilities of the databases we purchased.

Breaking old habits

Though federated search is not the best tool for most scholarship, it does serve one purpose that makes it a sound addition to any researcher’s kit of tools—it helps people discover new sources of information. Indeed, much of the purpose of federated search is to expose users to purchased content they might not otherwise even know existed. When patrons are confronted with a library web site chock-full of jargon, sometimes even locating the databases can be confusing. And then patrons get to the elusive list of databases only to be confronted with names like ABSEES Online, ACP’s PIER, AdisOnline, and so on. It’s no wonder searchers long for the simplicity of the single Google search box.

Federated search can be a great starting place to help searchers leverage these resources. For example, when the Intel Library implemented Deep Web Technologies’ Explorit federated searching tool, databases with formerly heavy usage started losing ground to other, previously less-used databases as habit and limited database knowledge became less important. Searchers were no longer tied to the familiar and instead could use content based on their information needs.

No simple solution

The promise of federated search is increased patron engagement with library tools, whether through exposure to new tools or as a slightly improved alternative to Google-only research. In the future, we can expect to see better natural language processing, smarter searching with semantic search, advanced spell-checking and term-mapping, and improved relevancy algorithms. Yet, even with all of this, we are still selling our patrons short by hobbling powerful databases solely because it might be easier. Simplicity has its benefits, but it is not a panacea, nor is it truly easy. Oversimplifying can actually make the complex hard.

When we presume tools are too difficult for patrons and begin removing advanced features and capabilities, we are doing a disservice to our researchers. What’s right depends on the needs of the seeker, but for scholarship and research it’s not going to be found in the simplicity promised by federated search. Most of all, it’s up to us as librarians to remember that what is easy is not always right.


Author Information
Melissa L. Rethlefsen (mlrethlefsen@gmail.com) is an Education Technology Librarian at the Mayo Clinic College of Medicine, Rochester, MN.

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.