Ithaka Big Data Report Enlists Librarian Cohorts, Provides Professional Development

The latest report from Ithaka S+R, “Big Data Infrastructure at the Crossroads,” released December 1, offers critical findings and recommendations on the ways higher ed researchers, scholars, and technicians can partner with university and college librarians to support data research. The report was built from quantitative results and interview transcripts produced by a cohort of librarians at each participating institution.


The report looks at the growing use of big data—large, complex data sets—in research across disciplines and fields, concentrating primarily on how academic libraries experience methodologies, workflows, and outputs and the challenges they currently face. Ithaka partnered with librarians from 23 U.S. colleges and universities who conducted more than 200 interviews with faculty and other stakeholders. The interviews provided Ithaka with insights into the process of data curation, enabling the report’s authors to define several areas to explore. As each university prepared to assemble its cohort and interview colleagues, and during the survey process, participating librarians received multiple rounds of training that not only enabled them to conduct interviews and assemble the material they collected, but will prove useful going forward.

Despite the time-intensive commitment, they were enthusiastic about participating. “Interviews were each approximately an hour long, and the process [included] recruiting interview subjects, arranging interviews, conducting interviews, seeing the transcripts created and proofed, and then coding the transcripts and writing a report,” noted Dylan Ruediger, qualitative analyst with Ithaka S+R’s Libraries, Scholarly Communication, and Museums program and the report’s lead author. However, the information provided by Ithaka generated a good deal of interest. “My sense is that it’s part of why a lot of university librarians are interested in pursuing these projects with us, partially because of the topics that we’re exploring are ones that are of great interest and importance to them, but also because they represent important professional development opportunities for staff.”

Ithaka S+R has been studying big data use in higher education for a couple of years, beginning with a review of data services offered at campuses around the country, which helped boost awareness of how data-intensive research is expanding into all fields across academia.

Working with big data is a growing concern in academic publishing and research, and is often managed from within the library, or with the library as an integral partner. Libraries and librarians are increasingly being called on to help researchers access data, or work with and manage the data that they generate in the course of research.

As the use and sharing of big data increases, the infrastructure necessary to support it has struggled to keep up. The report noted several areas of focus: the tension and interplay between disciplinary and interdisciplinary perspectives, managing complex data, structures for collaboration, sharing knowledge, ethical challenges, and support and training.

The most important findings fall into several areas, said Ruediger. The first is that big data is extremely resource-intensive. “In some ways this is an obvious one, but it’s a really important one,” he noted. Building an infrastructure that will support personnel, technology, and research at scale requires sizeable budget allocations, and costs will probably increase as data research and machine learning methods become more widespread. “That’s likely to be a considerable challenge for institutions to be able to sustain going forward,” he said.

Researchers often work with quantities of data so large that they can’t maintain it themselves, a key opportunity for libraries to step in. “Over and over again, researchers identified managing their data—whether that’s version control, effective metadata, or naming conventions—as challenges that they really struggled with,” Ruediger said. In recent years academic libraries have invested heavily in staff to help support researchers, and the report’s findings suggest that there is still more work to be done.

Findings also demonstrate that the legacy of campus decentralization, and the increased siloing of academic departments, continues to hamper information sharing and knowledge circulation. Different disciplines often had widely disparate access to the resources needed to conduct necessary research.

“Big data research, almost by definition, is interdisciplinary and collaborative,” said Ruediger. “A lot of researchers pointed to the library as a potential place where either physical spaces, or programming and other kinds of networking opportunities that would allow researchers from different disciplinary backgrounds to engage with each other, might be possible, because the library’s a shared overlapping point that crosses a lot of those disciplinary boundaries.”

The report included recommendations for various stakeholders across campus in addition to libraries, such as research offices, academic departments, funders, and scholarly societies. Recommendations for libraries included developing expertise in metadata creation, data curation, and data management, as well as data analytics and visualization; creating and updating curated guides to datasets and purchasing subscription datasets will help save researchers time and money. Networking and marketing the libraries’ data management and storage capacities—hosting forums, seminars, symposiums, and other opportunities for researchers to share across disciplines—will help increase visibility. Workshops can be tailored to students working in big data–focused labs and researchers from fields that are less experienced in technical, programming, or quantitative skills. When feasible, libraries can expand one-on-one consultation services or offer on-demand workshops for specific research groups.



Ithaka originally put out a call for libraries interested in exploring the question of how best to support data research. Each participating library then assembled a team of librarians to speak with researchers on its campus. Each team picked its own interview subjects, mainly librarians of different types, said Ruediger, but some brought in staff working at their institutions’ high-performance computing centers or other colleagues involved in the research process.

Each cohort gathered for an initial kickoff meeting to discuss the project and its scope, as well as questions they had about the topic. Participants were given input into a shared semi-structured interview guide put together by Ithaka S+R, and trained in how to effectively conduct interviews to get the best results. Ithaka also provided advice on how to recruit interview subjects and move through the Institutional Review Board (IRB) process to gain approval. After the interviews wrapped, librarians were given instruction on coding the qualitative aspects for analysis. Ithaka offered guidance and feedback once they submitted their draft reports.

In addition, once Ithaka’s report was published, cohorts were engaged again to discuss the project and its findings. Participating librarians were given pointers on how to engage with stakeholders on campus and ideas for how to frame the report’s recommendations to help drive policy changes on their campuses. “We’re hoping that these projects, in addition to producing new knowledge, will also produce action on important issues,” said Ruediger.

Although the process was relatively smooth, said Ruediger, the methodologies had been created before the pandemic hit, which necessitated some changes. “This project was initially designed with the expectation that much of the research would take place in person, and like most things over the last year and a half, have had to be switched to different modalities,” he told LJ. “That caused some challenges. On top of that, everybody’s experiencing new kinds of stress, and new demands on their time.”

Involving librarians so closely in the research process, and coaching them on how to advocate for change on their campuses, is an important component of Ithaka’s research, said Ruediger. “The interviews suggest that many researchers are not aware of how much the library can offer in this regard. Many researchers thought of libraries as places to get a book or an article, and not as a place where they could turn to for help with things like data management or curation,” he explained. “Libraries still have some work to do in getting the word out about the investments they’ve made and are likely to be well served by continuing to reup on those investments.”

The 23 U.S. college and university libraries that Ithaka S+R partnered with on this report include Atlanta University Center Consortium (Clark Atlanta University, Morehouse College, and Spelman College); Boston University; Carnegie Mellon University; Case Western Reserve University; Georgia State University; New York University; North Carolina A&T State University; North Carolina State University; Northeastern University; Pennsylvania State University; Temple University; Texas A&M University, College Station; University of California, Berkeley; University of California, San Diego; University of Colorado Boulder; University of Illinois, Urbana-Champaign; University of Massachusetts, Amherst; University of Oklahoma; University of Rochester; University of Virginia; and University of Wisconsin, Madison.

Lisa Peet

Lisa Peet is Executive Editor for Library Journal.
