Library Journal Mobile

Discovering Linked Data

Fiona Bradley takes a tour of Linked Data endeavors and explains how they can help us make library data easier for everyone to use

By Fiona Bradley -- netConnect, 4/15/2009

Discovery is essential to the future of library services, and Linked Data can get us there. To set themselves apart as more than just static warehouses of resources, libraries are focusing on discovery to provide more meaningful and helpful search results and to give users more information about the materials they find. Linked Data is a building block toward semantic-aware search that will give users these kinds of meaningful results. While that end goal is still largely in the future, there are many possibilities for using Linked Data now to make the most of the resources we already have and to give users more information to make choices about what they want to use.

Linked Data gives libraries an exciting opportunity to make new connections between our collections and the world. It can extend discovery platforms to explore library data, broaden benchmark services, and support the role of libraries as creators and publishers. From bibliographic records to digital library collections, library data is also valuable and interesting to communities outside of our own. Opening up and sharing it will lead to greater innovation as others mash up and extend it in ways we never thought possible.

Below you'll find some great examples of what's already being done with Linked Data and a number of places in which library data might well fit.

The semantic link

Linked Data is associated with the Semantic Web, which aims to make the web like a giant machine-readable database that gives structure and meaning to data. It uses Uniform Resource Identifiers (URIs) to create connections. A URI identifies a resource on the Internet; the form we usually see is a URL. So, how does it work? Just as a database assigns unique keys to distinguish pieces of data, Linked Data assigns HTTP URIs to things and describes them in RDF (Resource Description Framework, a W3C specification).

Many data sets have already been published as Linked Data using URIs, including photos, biographical information, and bibliographic records from library catalogs. According to Tim Berners-Lee, there are just four simple principles for creating Linked Data:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names
  3. When someone looks up a URI, provide useful information
  4. Include links to other URIs, so that they can discover more things
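
As a rough illustration of the four principles, the sketch below names a book and its author with HTTP URIs, links the author out to another data set, and writes the statements as N-Triples, one of the plain-text RDF serializations. The example.org URIs are hypothetical placeholders, and the DBpedia URI simply follows DBpedia's naming pattern; none of this is taken from a real catalog.

```python
# Each statement is a (subject, predicate, object) triple of HTTP URIs.
# The example.org URIs are hypothetical placeholders for a library's own data.
BOOK = "http://example.org/catalog/book/123"
AUTHOR = "http://example.org/catalog/person/45"

triples = [
    # Principles 1 and 2: things are named with HTTP URIs.
    (BOOK, "http://purl.org/dc/terms/creator", AUTHOR),
    # Principle 4: link out to another data set so readers can discover more.
    (AUTHOR, "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Astrid_Lindgren"),
]


def to_ntriples(statements):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


print(to_ntriples(triples))
```

Principle 3 is the part a script like this cannot show: a server has to answer requests for those URIs with useful information.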

To unify the data sets that adhere to these four rules, developers have created a number of ontologies. These provide a framework to represent concepts and their relationships to one another and are not entirely dissimilar to thesauri, which librarians know well. Quite a few ontologies are already in use, such as Friend of a Friend (FOAF), which describes people and their relationships. Under FOAF, each person is uniquely identified by a URI, which might simply be a link to a personal homepage, email address, or identity profile on a service like OpenID. To demonstrate how simple linking data using ontologies can be, a group of librarians attending the 2009 Code4Lib conference created a neat example of FOAF data with links to their contact details and interests. (For more on identities and ontologies, see “Making Connections” by Karen Coyle, p. 44.)
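
To give a feel for what FOAF data looks like, here is a minimal sketch that describes one person with a few FOAF terms (foaf:Person, foaf:name, foaf:homepage, foaf:knows) and emits N-Triples. The people and URIs are invented for illustration, the helper name foaf_person is hypothetical, and this is not the Code4Lib attendees' actual data or method.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def foaf_person(uri, name, homepage=None, knows=()):
    """Return N-Triples lines describing one person with FOAF terms.

    Simplification: `name` is assumed to contain no quotes, backslashes,
    or line breaks, so it is written out without escaping.
    """
    lines = [
        f"<{uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f'<{uri}> <{FOAF}name> "{name}" .',
    ]
    if homepage:
        lines.append(f"<{uri}> <{FOAF}homepage> <{homepage}> .")
    for other in knows:
        # Each foaf:knows link points at another person's URI.
        lines.append(f"<{uri}> <{FOAF}knows> <{other}> .")
    return lines


# Hypothetical people, for illustration only.
alice = foaf_person(
    "http://example.org/people/alice",
    "Alice Example",
    homepage="http://example.org/~alice",
    knows=["http://example.org/people/bob"],
)
print("\n".join(alice))
```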

Extending the discovery layer

Many libraries are extending their catalog interfaces by implementing presentation layer services such as Innovative Interfaces' Encore, Ex Libris's Primo, AquaBrowser, and Endeca. Features include the ability to browse, dynamically refine results, and get more information about an item at the record level. Yet there is a limit to the depth of browsing and refining that can be achieved as it is only possible to expose the subject headings, links, and other data in the MARC records. How might Linked Data bring catalogs new ways to discover and access collections? Would this lead to a more dynamic and flexible catalog?

Over the past few years, catalogs have been enhanced with subscription-based services such as those from Syndetic Solutions and LibraryThing for Libraries that provide links to tables of contents, book covers, and reviews. But this approach to presenting information is still largely at item level. The data is imported into the catalog or linked to individual records. Records don't yet provide topic-based information (for example, more information about “cheese”) or details about a place or author. There may be plenty of additional information available on a topic or author, but if it's not linked to that particular holding, it is not visible to the user.

VuFind, an open source presentation layer, has begun to build this type of information into the catalog. In a good first step, the software creates a page for every author represented in the catalog and integrates content from Wikipedia. In the future, perhaps such information would be findable beyond the author field and from the topic and subject of a work, giving a broader picture of the contents.

The next step is to explode the catalog beyond the library. Linked Data sources can allow libraries to link out to a much wider range of information, allowing a better sense of “aboutness”—the places, people, and information that a resource describes, not just information about the resource itself. Increasing the range of information available not only helps people decide if they want to use the book, DVD, or journal but also helps them explore beyond the library, using your resources as their starting point. If it seems like this is taking users away from library content, it's not—libraries should provide their own resources as Linked Data so that links come back to the library. If the basic principle of Linked Data is that every instance of RDF should include a link, why shouldn't that go back to a library resource?

National-scale Linked Data

The developers of LIBRIS, the Swedish union catalog, at the Kungliga Biblioteket (the National Library of Sweden, also known as the Royal Library) note that many catalogs are providing links to more information about items but not in a machine-readable way. LIBRIS uses RDF and URIs to include links to both its own resources and external information. Content negotiation returns human-readable pages to users' browsers and machine-readable data to bots and spiders. Some libraries have manually added links to their resources and digital collections to individual entries in Wikipedia. The Royal Library of Sweden has taken this to the next level by creating links to DBpedia, the Linked Data representation of Wikipedia.
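
Content negotiation can be sketched in a few lines: the server reads the HTTP Accept header that a client sends and picks the representation the client prefers. The sketch below is a simplified illustration and not LIBRIS's implementation; the function names and the two available formats are assumptions, and partial wildcards such as text/* are ignored.

```python
# Representations this hypothetical server can return for one record.
AVAILABLE = ("text/html", "application/rdf+xml")


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, media_type))
    # sorted() is stable, so types with equal weight keep their header order.
    ranked = sorted(prefs, key=lambda pref: -pref[0])
    return [media_type for q, media_type in ranked if q > 0]


def negotiate(accept_header, default="text/html"):
    """Choose a representation for the client, falling back to HTML."""
    for media_type in parse_accept(accept_header):
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return default
    return default


# A crawler asking for RDF gets RDF; a typical browser header gets HTML.
print(negotiate("application/rdf+xml"))
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```

The same URI therefore serves both audiences, which is what lets a catalog record be a web page and a data source at once.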

The Library of Congress (LC) has said it will soon make available authority data starting with Library of Congress Subject Headings (LCSH) as Linked Data following an earlier prototype by Ed Summers at lcsh.info in 2008. In addition to being able to create links (as LIBRIS successfully linked to lcsh.info), it will be possible to download vocabularies for further reuse. LC sees several potential benefits from using Linked Data, such as decreasing server loads and serving as an example of a best practice for other libraries.

Some libraries are debating whether or not to include links to web sites in their catalogs, since the inclusion of free resources that a library neither purchased nor subscribes to can complicate resource management. Increasingly, however, to stay relevant to our users, we need to provide access to all high-quality resources, not just those to which we subscribe. Just as we can and should provide static links to digital cultural and memory resources from other institutions in our catalogs, we should also extend the use of this data by linking to contextual information about these resources.

Standards and protocols

Some of the protocols and standards we use are limited by being designed for and used almost solely in the library community. We can share and exchange data among ourselves easily but not so easily with other cultural organizations, memory organizations, researchers, and the wider world. Yet, there are some very good protocols we have developed that should be used more broadly. The concept of Linked Data is derived from basic principles that are not specifically tied to any one community and is flexible enough to allow building up from this foundation with the best protocols, ontologies, and vocabularies used in libraries and other domains. OAI-PMH, OAI-ORE, and Dublin Core can and should be used in creating Linked Data, and indeed there are recommendations on expressing Dublin Core as RDF.
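
As one possible illustration of expressing Dublin Core as RDF, the sketch below turns a simple record (element names mapped to lists of values) into N-Triples using the Dublin Core element namespace. The record, its URI, and the helper names are hypothetical, and the sketch covers only plain string values, not the full Dublin Core recommendations.

```python
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def dc_to_ntriples(record_uri, record):
    """Turn {element: [values]} into one N-Triples line per value."""
    lines = []
    for element, values in record.items():
        for value in values:
            literal = escape_literal(value)
            lines.append(f'<{record_uri}> <{DC}{element}> "{literal}" .')
    return lines


# Hypothetical record, for illustration only.
record = {
    "title": ['The "Cheese" Book'],
    "creator": ["A. Author"],
    "subject": ["Cheese", "Cooking"],
}
for line in dc_to_ntriples("http://example.org/record/1", record):
    print(line)
```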

Software that a growing number of libraries use is becoming Linked Data–ready. Drupal, an open source content management system (CMS), is working to make its software more Semantic Web–friendly by enabling it to create and expose contents as RDF. As CMS use extends beyond pure content management to powering catalogs (including SOPAC; see Link List) and creating links between repositories and digital libraries, RDF capabilities will become increasingly important and powerful.

Data curation

The growing interest in exchange and reuse of data, from extending experiments in science to mashups of geographic information, means we need to be able to identify and exchange data with others more easily than ever. Data can be reused and repurposed in ways never before thought possible as computing power increases.

With more academic and research libraries moving into data curation, librarians are starting to take on roles in helping researchers manage their data when it is created. As stewards, we curate this data and make it available to the community. Such data is then indexed to make discoverability easier. For example, the Digital Enterprise Research Institute created Sindice, which indexes more than 20 million documents that are Semantic Web–friendly (using RDF and/or microformats) in subjects from people and places to scientific data sets such as bibliographic data for computer science journals and protein sequences. [For more on microformats see “Microformats: Inline Context” by Karen Coombs, p. 64.]

Europeana, an online portal to digital collections from Europe's national libraries, emphasizes interoperability and uses SKOS (Simple Knowledge Organization System, a W3C standard for using RDF to represent controlled vocabularies such as LCSH and MeSH) along with several other metadata schemas. It is currently developing a semantic search version of the digital collection portal that matches search terms to location, name, titles, and concepts to present more meaningful results.
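
To suggest how a SKOS vocabulary can support this kind of matching, the sketch below stores a few concepts with a preferred label (skos:prefLabel) and a link to a broader concept (skos:broader), then walks upward from a concept to collect related labels. The vocabulary and URIs are invented, and this is not Europeana's implementation.

```python
# Hypothetical vocabulary: concept URI -> preferred label and broader concept.
# In SKOS these would be skos:prefLabel and skos:broader statements.
concepts = {
    "http://example.org/subject/dairy": {
        "prefLabel": "Dairy products",
        "broader": None,
    },
    "http://example.org/subject/cheese": {
        "prefLabel": "Cheese",
        "broader": "http://example.org/subject/dairy",
    },
    "http://example.org/subject/cheddar": {
        "prefLabel": "Cheddar cheese",
        "broader": "http://example.org/subject/cheese",
    },
}


def broader_chain(uri):
    """Follow broader links upward and return the labels encountered."""
    labels = []
    seen = set()  # guards against accidental cycles in the vocabulary
    while uri is not None and uri not in seen:
        seen.add(uri)
        labels.append(concepts[uri]["prefLabel"])
        uri = concepts[uri]["broader"]
    return labels


print(broader_chain("http://example.org/subject/cheddar"))
```

A search for the narrowest concept could then also surface items indexed under its broader terms.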

As convergence increases and libraries take on larger curatorial roles, libraries are working more closely with museums and memory institutions. An example of the use of Linked Data by a museum is CultureSampo, a Semantic Web portal for Finnish cultural memory.

Research and scholarship

Academic libraries are increasingly involved with the publishing process and scholarship at a number of different levels. For instance, liaison librarians help researchers find citations of their work to determine impact. Librarians are also developing repositories to store, manage, and enable reuse of published and unpublished research materials. Repositories, like other digital library collections, need to be interoperable not only with other repositories but, more importantly, with other types of services.

Discovering research that is highly relevant has always been difficult, but the growth of full-text databases has in some cases made finding the best results even more complex. Scholars need to be able to follow the path of a research publication back to its origins, to inspect the original data and find out more about the author.

Moving forward, they need to see how a work has subsequently been cited and reused. Oxford University is building a service to share research, make discovery easier, and improve the accuracy of its data. Projects such as Oxford's that use Linked Data make it possible to connect disparate pieces of research in order to find out more about who wrote the studies and how they were funded and to allow users to download the original research data.

Deep dashboards

Linked Data may help to extend benchmarking within and across all types of libraries in ways too time-consuming to consider fully now. The University of Huddersfield in the UK took the bold step of sharing its patron data last year under an Open Data Commons license, giving other libraries the ability to download an anonymized set of information about who borrowed what and when. Comparing this type of data across libraries could be a powerful way to view reading trends in different regions, assess collection development strategies, and identify opportunities for resource sharing beyond existing networks.

Linked Data also helps to enable the principle of write once, reuse many. Libraries already reuse catalog data for myriad purposes—to help generate listings of new content, to showcase materials in subject guides, to create topic listings, and to recommend resources for a topic or course. Repurposing this data manually can be tedious. Linked Data is structured, enabling more rapid reuse.
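
A small sketch of "write once, reuse many": one set of structured statements feeds both a new-titles list and a subject listing without any retyping. The records, property names, and helper functions are hypothetical and much simpler than real catalog data.

```python
from datetime import date

# Hypothetical catalog statements: (record URI, property, value).
statements = [
    ("http://example.org/book/1", "title", "Cheese Making at Home"),
    ("http://example.org/book/1", "subject", "Cheese"),
    ("http://example.org/book/1", "added", date(2009, 4, 1)),
    ("http://example.org/book/2", "title", "A History of Sweden"),
    ("http://example.org/book/2", "subject", "Sweden"),
    ("http://example.org/book/2", "added", date(2008, 11, 3)),
]


def values(record, prop):
    """Return every value of one property for one record."""
    return [v for s, p, v in statements if s == record and p == prop]


def new_titles(since):
    """Titles of records added on or after the given date."""
    records = [s for s, p, v in statements if p == "added" and v >= since]
    return [values(r, "title")[0] for r in records]


def titles_on(subject):
    """Titles of records with the given subject, e.g. for a subject guide."""
    records = [s for s, p, v in statements if p == "subject" and v == subject]
    return [values(r, "title")[0] for r in records]


print(new_titles(date(2009, 1, 1)))
print(titles_on("Sweden"))
```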

The next step

Linked Data offers libraries an opportunity to benchmark data, broaden the information available to users for discovery, and share resources. Libraries have a great deal of unique and valuable information that can be shared, from authority records to digital collections on local history to major research and scholarly publications. Making this data available for linking increases the number of pathways back to the library. All it takes is working together.


Link List
Code4Lib FOAF inkdroid.org/c4l2009/attendees
CultureSampo kulttuurisampo.fi
DBpedia dbpedia.org
Dublin Core as RDF dublincore.org/documents/dc-rdf
Europeana europeana.eu
Library of Congress Subject Headings as Linked Data id.loc.gov
Linked Open Data sets esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
National Library of Sweden libris.kb.se
Oxford University Research Information Infrastructure brii.ouls.ox.ac.uk
Sindice sindice.com
SOPAC 2.0 thesocialopac.net
University of Huddersfield data library.hud.ac.uk/data/usagedata
W3C: Cool URIs Don't Change w3.org/Provider/Style/URI
W3C: Four Principles for Creating Linked Data w3.org/DesignIssues/LinkedData.html


Author Information
Fiona Bradley (fiona@semanticlibrary.net) is Research and Policy Officer, University of Technology Sydney Library, Australia

 

The fine print: preservation

Do URIs need to be guaranteed? Do they need to be real, HTTP-accessible resources? While large cultural organizations have an interest in ensuring their resources continue to be available, not everyone who publishes Linked Data will necessarily be mindful of preservation. Who will take on the role of providing persistent URIs for library data—will this task be left to individual libraries, or will a larger body take on this role?

Dan Chudnov, at a 2009 preconference at Code4Lib on Linked Data, noted that less stable data sets could disappear tomorrow, breaking all the links and frustrating users. He argues that it is important to cache Linked Data by identifying when one resource is the same as another to build redundancy into links so that breaks are less likely to happen. Libraries should use the best of preservation metadata standards and naming guidelines to create URIs that are sensible and sustainable. As the W3C notes, Cool URIs don't change:
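
One way to picture the redundancy idea, as a sketch and not necessarily the approach Chudnov described: record which URIs name the same thing (for example with owl:sameAs links), and when one source stops answering, try its equivalents. The URIs are hypothetical, and the fetch function is passed in by the caller so the logic can be tried without a network connection.

```python
from collections import defaultdict


def equivalents(uri, same_as_pairs):
    """Return every URI reachable from `uri` through sameAs links."""
    graph = defaultdict(set)
    for a, b in same_as_pairs:
        graph[a].add(b)
        graph[b].add(a)  # sameAs is symmetric
    found, queue = [uri], [uri]
    while queue:
        for neighbour in sorted(graph[queue.pop(0)]):
            if neighbour not in found:
                found.append(neighbour)
                queue.append(neighbour)
    return found


def resolve(uri, same_as_pairs, fetch):
    """Try the URI, then its equivalents; `fetch` returns data or None."""
    for candidate in equivalents(uri, same_as_pairs):
        data = fetch(candidate)
        if data is not None:
            return candidate, data
    return None, None


# Hypothetical links and a stand-in fetcher where the first source is gone.
links = [
    ("http://example.org/a/cheese", "http://example.net/b/cheese"),
    ("http://example.net/b/cheese", "http://example.com/c/cheese"),
]
still_online = {"http://example.com/c/cheese": "description of cheese"}
print(resolve("http://example.org/a/cheese", links, still_online.get))
```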

“What makes a cool URI? A cool URI is one that does not change. What sorts of URIs change? URIs don't change: people change them.”

©2009 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.