Advances in HTR technology make handwritten documents even more accessible and discoverable

Scanning historical documents and making them available to scholars in digital format holds great promise for increasing the accessibility of primary-source materials. But until now, researchers have faced key limitations in accessing handwritten letters, manuscripts, and other materials online.

Handwritten Text Recognition (HTR) with Transcription could transform the research process

While optical character recognition (OCR) technology allows librarians and archivists to scan and search text that is printed or typed, cursive writing and other handwritten texts have proven to be more of a challenge. Although these materials can be digitally scanned, researchers have had to rely on the accompanying metadata when searching for specific information, or else painstakingly read through the documents page by page to find what they’re looking for if no typewritten transcript exists.

Recent advances in technology are changing that. Handwritten text recognition (HTR) technology is becoming more accurate at identifying handwritten characters, and Quartex, a cloud-based digital collections solution from Adam Matthew, has integrated HTR into its platform. This groundbreaking development makes digitized letters, manuscripts, and other handwritten materials fully keyword searchable, saving researchers an enormous amount of time.

Now, Quartex has extended this functionality with a new automated transcription feature. With HTR with Transcription, librarians and archivists can generate a fully editable and searchable transcript of each handwritten asset with a single click. This makes primary sources even more discoverable, and more accessible for researchers who use screen readers, which depend on transcriptions.

“It’s really an invaluable resource to be able to search the full text of handwritten materials, rather than spending hours poring over page after page of manuscripts,” says Jacquelyn Sundberg, who oversees outreach and special projects for the Rare and Special Collections, Osler, Art, and Archives (ROAAr) group at McGill University in Montreal.



Opening new avenues for research

At McGill, Sundberg and her colleagues are using the Quartex platform to digitize and promote the university’s extensive collection of documents related to the fur trade business in and around Montreal. These documents include handwritten letters, invoices, and accounting ledgers from leaders of the fur trade business, including the university’s founder, James McGill.

“Before now, you could look at these documents if you came in and consulted our archives,” Sundberg says. There is a finding aid for each of the folders that make up the collection. However, researchers would still have to search through pages of documents in each folder to find what they’re looking for.

“What we have now is a giant leap forward,” she says. All of the files have been digitized and are available in one place online. All of the documents have been scanned using OCR for typewritten text and HTR for handwritten text.

Although the character recognition rate is not quite 100 percent, “it’s an impressive number,” Sundberg says. “What we have now is a workable search tool. If you search for a person or place name, you’re going to get a very good rate of return on those keyword phrases across the whole collection.”

If someone types a search term into the platform, such as “Northwest Company,” all of the documents that contain that phrase appear in the results. Researchers are no longer limited to the collection’s descriptors or metadata when searching for information.

The technology gives researchers “a more comprehensive look into the documents,” she observes. “This creates new avenues for research that might not have been discovered using traditional techniques. It opens up a lot of doors.”



A ‘game changer’ for researchers

Baylor University in Texas is using HTR with Transcription to enhance its Armstrong Browning Collection, which includes extensive correspondence to and from the Victorian poets Elizabeth Barrett and Robert Browning. Darryl Stuhr, director of digitization and digital preservation services at Baylor, calls the technology a “game changer” for researchers.

The Armstrong Browning Collection consists of several separate collections, including the Browning Letters, which have already been transcribed by a Browning scholar; the Victorian Letters, a collection of more than 3,300 other letters from the Victorian era, some of which have been transcribed by graduate students; and the Browning Manuscripts, which contains handwritten manuscripts of the Brownings’ works.

Baylor is using HTR with Transcription to generate automatic transcripts for the items that still lack them. “We’re hoping the system can also correct some of the mistakes our graduate students might have made in transcribing,” Stuhr says.

Assembling the Armstrong Browning Collection has been a collaborative process. Baylor initially embarked on the project with Wellesley College, which owned 500 love letters the Brownings wrote to each other. Since then, Baylor has also digitized letters and manuscripts owned by the University of Texas, Texas A&M, the Ohio State University, and other institutions—and it’s hoping to add materials from the Bodleian Library at Oxford and Eton College as well.

“Nobody’s going to give up the letters in their archives,” Stuhr says. “Really, the only way for all these items to come together is in a digital collections management system like Quartex, which gives us the ability to present a more comprehensive collection. Twenty years ago, scholars had to travel from library to library to access these materials. This platform makes them much more widely available. Using HTR with Transcription to provide a complete set of transcriptions of every handwritten document across the collection should also make their contents more discoverable, to the benefit of our research community around the world.”




