Open Data, Open Minds Highlight SPARC Digital Repository Meeting

By David Rapp

The 2010 SPARC (Scholarly Publishing & Academic Resources Coalition) Digital Repositories Meeting this week managed to combine serious talk about the challenges of open data, examining both successes and failures, with an efficient and informative showcase, the Innovation Fair, featuring rapid-fire presentations of tools and services from institutional repositories (IR).

Held in Baltimore, MD, November 8 and 9, the meeting brought speakers and attendees from around the world to discuss topics facing digital archives, ranging from repository-based publishing strategies, to open data initiatives in the United States, Canada, and the UK, to ideas about global repository networks and financial sustainability for archives.

Collaborative openness
Keynoter Michael Nielsen, a quantum computation pioneer and science blogger, has recently turned his full attention to issues surrounding scientific collaboration, with a strong focus on open data. He began by comparing a blog post by mathematician Tim Gowers on the concept of crowdsourced collaboration to solve mathematics problems with similar calls for collaboration by Linux developer Linus Torvalds and Wikipedia developer Larry Sanger.

The Gowers post led to a collaborative effort called the Polymath Project, Nielsen noted, as well as a published paper in Nature and a series of mathematical papers. Although the results were impressive, Nielsen was quick to point out that blogging "is not the future of scientific publishing," but said that open, online collaboration could lead to more powerful tools for research.

But there are obstacles: he ran down a list of failed collaborative projects over the last few years, specifically noting Nature's 2006 open peer review experiment, which drew relatively little participation from researchers. The reluctance, Nielsen said, likely stems from the idea that many researchers see no upside to the deal: why share research that's not complete, which might help competitors in the same field, and which will yield no academic credit or reputational reward? The academic culture seems geared against openness; no one wants to "drive on the other side of the road," as Nielsen put it. But, he noted, when all researchers in a given field agree to share data, as with human-genome research, that bias can change. Nielsen's focus on collaboration set the tone for the remainder of the conference.

Repository-based publishing and open data worldwide
In a panel on repository-based publishing, Wendy Robertson, digital resources librarian at the University of Iowa, and Mark Newton, digital collections librarian at Purdue University Libraries, discussed their respective experiences using their digital repositories as publishing platforms for academic work. Ventura Pérez, assistant professor of bioarcheology at the University of Massachusetts Amherst, spoke about the challenges of starting up an open-access journal in his very specific field of study. All three spoke of the difficulties of trying to do new things in a system unused to change, a situation that many tech-minded librarians have surely faced in their own careers.

But the next speaker brought a different perspective, showing that success can also come from keeping a project's staff as small as possible. Nathan MacBrien, the publications director at the Institute of International Studies at the University of California, Berkeley, talked about his unusual initiative: open-access book publishing done via library, university press, and campus collaboration. The academic ebook monograph market is small, with only five to eight ebooks per year put out through the Global, Area, and International Archive (GAIA) publications program, but the overhead is small as well, as MacBrien is the only person on staff.

A panel on open data had a distinct international flavor, featuring Kevin Ashley, director of the UK-based Digital Curation Centre, as well as Charles Humphrey, head of data research services at the University of Alberta.

Ashley said that while many UK institutions are willing to tackle research data management, they are faced with a "critical skills gap" that requires retraining of existing staff. He also talked about the recent interest in exploring more cloud-based data options. Humphrey noted that in Canada, research data is handled by a variety of different, mostly unconnected entities, and described a possible collaborative framework in which different entities could share responsibility for data stewardship functions.

Gail Steinhart, research data and environmental sciences librarian at the Albert R. Mann Library at Cornell University, discussed the challenges of small research data sets, and specifically DataStaR, a data staging repository which allows researchers to create metadata for such data sets and share them.

Two-minute warnings
In what might have been the highlight of the meeting, to close the first day, 21 researchers gave two-minute, one-PowerPoint-slide informative presentations about services and tech tools they were developing for their IRs. The presenters were chosen via a competitive selection process by the meeting's program committee. Two examples give a sense of the range of initiatives highlighted at the game-show-like event:

Sue Kunda, the digital production librarian at Oregon State University, described a partnership between their archive and the university advancement department that made it easier for the latter to provide access to original research for university news stories. The library contacts publishers to find out which version of a scholarly paper, if any, can be placed in the IR. The library then contacts the author to obtain that version, places it in the IR, and provides a direct URL that university advancement can use in a news story; the page where the archived paper resides can include a link to the news story, as well.

Kirsta Stapelfeldt of the University of Prince Edward Island provided a tech-based example, reporting on the open source Islandora project, a framework that uses Drupal and Fedora applications to create a customizable digital asset management system.

"Hell hath no fury"
George Strawn, the director of the national coordination office of the U.S.-based Federal Networking and Information Technology Research and Development Program, gave an entertaining overview of the history of disruptive technologies and attitudes toward openness and copyright. When discussing publishers' use of copyright as their "weapon of choice," Strawn quoted a friend in what was the biggest laugh-line of the conference: "Hell hath no fury like a vested interest masquerading as a moral principle."

Global repository networks, financial sustainability
The international tone continued with the second day's opening panel. Neil Jacobs, acting program director for the UK's Joint Information Systems Committee (JISC), leading a panel entitled "Global Repository Network?", asked a very basic question: why bother connecting repositories worldwide? One key reason, all the panelists seemed to agree: linked, openly available information allows researchers to make new inferences about disparate data, and inference, Jacobs pointed out, is what scientific research is at its core.

Jun Adachi, director of cyberscience infrastructure development at Japan's National Institute of Informatics, revealed the debut of a new nationwide digital library consortium in Japan, though he acknowledged that technical issues from country to country abound when speaking of a global repository. Clifford Lynch, the director of the Coalition for Networked Information, sounded a similar caution, also noting that political difficulties could arise in different countries when dealing with information about human subjects.

Martha Giraldo Jaramillo, executive director of RENATA in Colombia, reported on wide support for the growing number of Latin American initiatives for open access repositories; one project, RedClara, aims to link research and education networks in different countries and connect them to European and North American networks as well.

As with any extended discussion of ambitious IR projects, there remains a key question that must be explored: how much is this going to cost? In a panel on financial sustainability, three professionals showed the different ways they manage to make their projects more cost-effective. Sue Kriegsman, archivist and librarian at Harvard University Libraries, said that her Digital Access to Scholarship at Harvard (DASH) team spent relatively little time on its interface and "front-end": "What I care about is making sure the content is available," she said, adding that 60 percent of DASH's traffic was from offsite, via Google Scholar.

Oya Rieger, associate university librarian for digital scholarship at Cornell University, addressed issues surrounding arXiv, Cornell's influential archive of more than 600,000 electronic preprints of scientific papers. In early 2010, arXiv asked for voluntary monetary contributions from institutions that made the most use of the archive (as Cornell students and faculty are only a tiny fraction of its users). She reported that 95 institutions have since contributed $319,000, a sign of the archive's popularity. Although that amount isn't quite enough to fully sustain arXiv (according to Rieger, its annual budget is about $380,000), it unquestionably takes a lot of the cost burden off of Cornell's shoulders.

David Palmer, the scholarly communications team leader at Hong Kong University Libraries, discussed the successes of the university's grant-funded HKU Scholars Hub, and the library's ongoing project to get its researchers to clean up bibliographic data on resources such as Google Scholar and Scopus, in order to use such freely available information for its own project. The Hub, he explained, then helps generate exposure for the university by, for example, serving as a useful "expert finder" for the media on a wide range of topics.

