Linked Data is only as useful as the metadata on which it depends, and poor quality metadata ultimately causes the challenges many librarians hope to address with Linked Data.
Used successfully for many years in industry, Linked Data appeals to librarians for its potential to improve services. It allows libraries to describe resources more richly than before, leverage expertise and data across the Web, expose local resources, and add new capabilities to the discovery process. It’s therefore not surprising that librarians have increasingly been demanding support for Linked Data in integrated library systems, repository software, and library standards.
However, Linked Data is only as useful as the metadata on which it depends, and poor quality metadata ultimately causes the challenges many librarians hope to address with Linked Data. Given that the resources necessary to create and maintain the access points, vocabularies, and relationships Linked Data needs to function are unlikely to emerge, the potential for Linked Data to benefit library services is limited.
Needlessly abstruse jargon makes Linked Data appear more complicated than it is. According to linkeddata.org, Linked Data is "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF." This definition is virtually useless because Linked Data, the Semantic Web, URIs, and RDF are so interrelated that understanding any of the terms requires understanding the other three. Other authoritative sites also offer nebulous definitions laden with technical jargon that can only be understood by those who already know what Linked Data is.
At its core, Linked Data is a way to describe things and relationships between things using Web addresses that serve as identifiers. This means that instead of storing a name, subject heading, location, or other data point, a Linked Data based system stores a Web address where information about that thing can be retrieved. Different data points can be maintained by different entities. For example, names and geographic locations might be maintained by different organizations, and the data retrieved could itself contain other identifiers. For instance, the place where a person works can be stored as a Web-based identifier that leads to information about that place.
Linked Data’s system of distributed identifiers allows exploration and expression of much more complex relationships than can be achieved using other methods. It also simplifies certain maintenance and user functions. Whenever information associated with an identifier changes, systems that store that identifier are automatically updated. Likewise, all attributes and relationships associated with an identifier are immediately accessible.
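For readers who find a concrete case easier to follow, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular library system works: the example.org addresses, the names, and the property labels (worksAt, locatedIn) are all hypothetical, and a plain dictionary stands in for the Web so the sketch runs without a network connection.

```python
# Hypothetical identifiers on example.org; a real system would retrieve
# the descriptions over HTTP instead of from a local dictionary.
PERSON = "https://example.org/people/p1"
EMPLOYER = "https://example.org/orgs/o1"
PLACE = "https://example.org/places/g1"

# Each description could be maintained by a different organization.
# Note that descriptions point to other things by identifier, not by name.
web = {
    PERSON: {"name": "Doe, Jane", "worksAt": EMPLOYER},
    EMPLOYER: {"name": "Example University Library", "locatedIn": PLACE},
    PLACE: {"name": "Portland, Oregon"},
}

# The local catalog record stores only the identifier, not the name string.
record = {"title": "An Example Title", "creator": PERSON}


def resolve(identifier):
    """Return the description currently published for an identifier."""
    return web[identifier]


def display_creator(rec):
    """Look up the creator's current name at display time."""
    return resolve(rec["creator"])["name"]


def follow(identifier, *properties):
    """Follow a chain of identifier-valued properties, one lookup per hop."""
    for prop in properties:
        identifier = resolve(identifier)[prop]
    return resolve(identifier)


before = display_creator(record)
workplace_city = follow(PERSON, "worksAt", "locatedIn")["name"]

# The maintainer of the person description revises the name.
# The catalog record itself is never edited.
web[PERSON]["name"] = "Doe, Jane, 1970-"
after = display_creator(record)

print(before)
print(workplace_city)
print(after)
```

The record never changes, yet what it displays does, because the name lives with whoever maintains the identifier. The same sketch also shows the dependency: if nobody maintains the descriptions behind the identifiers, the record has nothing useful to retrieve.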
Linked Data is a powerful tool, but only for problems that have technical origins. As the term implies, Linked Data depends on data. Metadata needs consistent and complete access points. Ontologies and vocabularies need to be comprehensive and well-maintained. Systems need to know what to do with the data they retrieve.
None of those requirements is met for general library use, nor is there reason to expect they will be. For years, libraries have reduced the number of staff dedicated to creating metadata while increasing their dependence on metadata supplied by publishers or downloaded from bibliographic utilities. The resources available to maintain the necessary vocabularies and ontologies are a small fraction of what Linked Data needs. No system can interpret the meaning of all the MARC fields, and the trend has been continuing normalization (i.e. simplification) of MARC data because patrons and staff alike demand simplicity and configuration is already too complex. Linked Data is orders of magnitude more complex than MARC.
Linked Data is appropriate for limited domains that can be described using well-maintained vocabularies and ontologies. For example, drugs, interactions, and evidence supporting observations can all be classed and related in multiple dimensions, and the Micromedex drug interaction database uses Linked Data to identify potential interactions and side effects. Without Linked Data, this would not be possible.
At its core, we don’t know what we want to do with Linked Data, and our vision of success often revolves more around the mere act of using it than around doing anything useful with it. The excitement surrounding Linked Data is reminiscent of what often happens when new technologies are introduced, namely people redefine their needs to accommodate a tool. When microwave ovens first became mainstream, people took classes where they learned to bake cakes and whole turkeys in microwaves. When new pharmaceuticals are announced, many people pressure their doctors for prescriptions even if existing drugs meet their needs.
Linked Data is often presented as a general solution for metadata problems, but it’s only truly useful in certain situations. Like an antibiotic, it’s a powerful tool when used appropriately but ineffective or even detrimental when misused. And just as patients who pressure doctors for prescriptions without understanding the implications often receive harmful or useless treatments, the pressure to incorporate Linked Data in systems where it provides questionable benefits works against core library objectives such as supporting discovery and preservation of materials.
Kyle Banerjee is Collections and Services Technology Librarian and Associate Professor, Oregon Health and Science University.
Right on and to the point. I couldn't agree more.
Thank you Kyle. I've been making the same case for years, most notably in three widely shared papers from 2017 ("BIBFRAME as Empty Vessel," "Roadmap to Nowhere: BIBFLOW, BIBFRAME, and Linked Data for Libraries," and "Zombrary Apocalypse! : RDA, LRM, and the Death of Cataloging"). See also my video, "Life after MARC?" (https://youtu.be/CqmlSRSGDdo).
A good introduction to an article on The Myth of Linked Data. When might we see the body?
This was meant as a short blip aimed at a broad audience to stimulate conversation.
A longer article needs to be tuned to the readers and their background -- what managers, public services, technical services, and systems staff relate to is very different. Also, it needs to examine a much more specific problem such as implementing it in shared environments where records come from many sources, digital collections, etc.
One thing I didn't say in this one because I didn't want it to sound like a technical services article is that libraries have used Linked Data for a long time. At its essence, Linked Data is authority control. Someone has to figure out which identifier to use for an access point (i.e. authorized heading), and the records that identifiers represent need to be maintained and related to each other (i.e. vocabulary). Authorized fields in MARC already contain unique identifiers/entries as well as indicators showing which vocabulary is used.
Unfortunately, libraries that don't have time/resources to verify access points and maintain vocabularies necessary for authority control still won't when we rebrand that process as Linked Data.
I agree with many of your points. But Linked Data is not totally useless for libraries. It notably allows, or stimulates, the distributed creation of metadata, for example by sharing a common authority file for all German regions. In that sense, imagine all German-speaking librarians contributing to a single record describing a person: there are significant mutualisation benefits, and the descriptions are going to be more complete and detailed. Connecting them through VIAF, you can then have a multilingual service offering authority data once in German, once in French, once in English...
This use case might correspond more to Europe than to the USA, but still it is an end user use case.
Dear Sir,
I liked the article very much. I was looking for something simple, in plain English for a change, that focuses on my work (I am Head of Cataloguing and Metadata at the Lebanese American University). I wrote to the Journal because I could not find an email address for you. I would like to ask your permission to translate your article into Arabic. Thank you.
Thank you Jeff. I read all of those. I will reread them with another perspective again. Bughdana
Nicolas Prongué: you are of course correct, and I think your characterization of linked data as “not totally useless” is apt. After all, the Web *is* linked data; it’s simply not linked as rigorously as some advocates of Linked Data for libraries (now with capital letters) would like. The example you cite is telling, because it is an extension of what libraries have been doing for decades: cooperatively building and maintaining authority files (the Library of Congress Authority File being the prime example in the United States), which often include multilingual aspects (e.g. “see” references pointing from “Набоков, Владимир Владимирович, 1899-1977” and “納布可夫, 1899-1977” to the “authorized” form “Nabokov, Vladimir Vladimirovich, 1899-1977”). Where arguments for linked data break down is in their feasibility as applied to bibliographic (rather than authority) metadata. The Library of Congress’s BIBFRAME project is a case in point. BIBFRAME has not replaced MARC (and in my view will not replace it any time soon), for a number of infrastructure reasons too complicated to expand upon here.