Digital Curation
Brumfields version 1.0

Sara and Ben Brumfield, Combining human and artificial intelligence in today's archives

Introduction

00:00 Jon Ippolito introduces the program

02:29 The IMPACT RISK framework of AI downsides

03:18 Why Sara and Ben got into AI

"I looked at Ben and said, we've got to get on top of this AI stuff or we're going to be out of business in ten years."

Ten Ways AI Will Change Archives

05:20 1. Improving Accessibility

"ChatGPT did as well as the staff at the Folger [Shakespeare Library]"

08:22 2. Extracting Entities

Identifying proper names and places.

09:10 3. Matching Entities

"We think this is a responsible and useful use of AI."

11:19 AI benchmarks

How can we defeat the inbred nature of benchmarks?

12:20 Recognizing the same name

How do you know the John Bell on page 2 is the same John Bell on page 49?

18:43 4. Describing Items

"Managing AI-generated derivatives is going to be a massive challenge for anyone in digital libraries and archives."

20:19 5. Improving Discovery

"We use the metaphor of a metal detector. You're not going to dig up the whole beach looking for gold, and the metal detector is not going to only find a golden ring. If you get lucky you'll find some cool stuff...[regardless] you don't have to dig up the whole beach."

27:53 Tricks for avoiding false positives?

30:03 Multimodal discovery

Can you query images with text prompts?

31:05 6. Interacting with Archives

LLMs generate errors that are "seductively plausible" while custom-trained transcription software makes obvious errors. Can combining the methods make plain the questionable transcriptions?
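One way to picture that combination is a word-level comparison of the two transcriptions, with every disagreement flagged for a human reviewer. The following is a minimal sketch under stated assumptions, not the Brumfields' method: the function name `flag_disagreements` and the sample lines are invented for illustration, and a real pipeline would also need to align line breaks and normalize punctuation.

```python
import difflib


def flag_disagreements(htr_text, llm_text):
    """Return the word spans where two transcriptions of the same page differ.

    htr_text: output of a custom-trained handwriting model (hypothetical input)
    llm_text: output of a general-purpose LLM (hypothetical input)
    """
    htr_words = htr_text.split()
    llm_words = llm_text.split()
    # autojunk=False keeps frequent words from being ignored on long pages.
    matcher = difflib.SequenceMatcher(a=htr_words, b=llm_words, autojunk=False)
    flags = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flags.append({
                "htr": " ".join(htr_words[i1:i2]),
                "llm": " ".join(llm_words[j1:j2]),
            })
    return flags


# Invented sample: the two systems disagree on one phrase.
htr = "news on the cotton market"
llm = "news on the farm"
for flag in flag_disagreements(htr, llm):
    print(flag)
```

Agreement between the two systems does not prove a reading is correct, but a disagreement is a cheap signal for where a human should look first.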

34:06 7/8. Creating Text from Handwriting + Enhancing "Dirty" Text

41:42 AI interfaces

"A lot of the challenge here isn't AI research, it's the interface research. How do we devise these interfaces that can work for humans?"

42:21 Flagging potential mistakes

Can you build a system aware of its own uncertainties?

43:48 Training your own model

45:16 Leveraging chain-of-thought

When the model's guardrails change "news on the cotton market" to "news on the farm" to avoid potentially racist connotations, "you have a strong tendency to whitewash history if you pass it through these things."

49:53 9. Transcribing AV Material

AI and human video transcription had comparable error rates, but students finished in days while AI finished in minutes.

52:26 10. Changing Archival Workflows

Bonus provocations

53:25 Visualizing historical persons from descriptions

56:19 How to manage AI derivatives

The danger of AI generations replacing real data

A data provenance ontology: PROV-O
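As a rough illustration of how a provenance ontology can keep AI derivatives distinguishable from source material, the sketch below builds a JSON-LD-style record using terms from the W3C PROV-O vocabulary (`prov:wasDerivedFrom`, `prov:wasAttributedTo`, `prov:generatedAtTime`). The function name `describe_derivative`, the `urn:example:` identifiers, and the tool label are hypothetical, and the talk does not prescribe this record shape.

```python
import json
from datetime import datetime, timezone


def describe_derivative(derivative_id, source_id, tool_name, generated_at=None):
    """Build a record that links an AI-generated derivative to its source item."""
    if generated_at is None:
        generated_at = datetime.now(timezone.utc)
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@id": derivative_id,
        "@type": "prov:Entity",
        # The original scan or recording the derivative was produced from.
        "prov:wasDerivedFrom": {"@id": source_id},
        # The software that produced it, so it is never mistaken for human work.
        "prov:wasAttributedTo": {
            "@type": "prov:SoftwareAgent",
            "rdfs:label": tool_name,
        },
        "prov:generatedAtTime": generated_at.isoformat(),
    }


# Hypothetical identifiers, for illustration only.
record = describe_derivative(
    "urn:example:transcript-1",
    "urn:example:scan-1",
    "hypothetical-transcription-model",
    datetime(2025, 4, 2, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Storing a record like this alongside each derivative lets a later user trace any transcript or description back to the original item and to the tool that generated it.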

This teleconference is a project of the University of Maine's Digital Curation program. For more information, contact jippolito@maine.edu.

Timecodes are in minutes:seconds.

How can archivists and others who manage digital collections leverage AI ethically and effectively?

On 2 April 2025, UMaine's Digital Curation graduate program organized a conversation with Sara and Ben Brumfield, a pair of researchers on the cutting edge of applying generative AI to solve problems in digital curation.

Every year, UMaine's Digital Curation program holds a teleconference with leaders in the field. This time, guests Sara and Ben Brumfield shared pragmatic insights about applying generative AI to archival tasks. From recognizing handwriting and adding metadata to photos to interacting with and improving access to archives, Sara and Ben shared the good, the bad, and the ugly results of their experiments. They also discussed the role of human-in-the-loop crowdsourcing in getting the most out of AI while minimizing its failure points.

FromThePage, the Brumfields' crowdsourcing platform, has helped institutions like Harvard, Stanford, and the National Archives to crowdsource transcription of historical documents. Backgrounds spanning computer science, gender studies, and linguistics have enabled the Brumfields to explore a diverse range of groundbreaking projects, from transcribing 20th-century American archives to translating Aztec codices. Despite being stewards of a commercial product, Sara and Ben have contributed a number of open resources to the archival community, including webinars on applying AI to cultural heritage.

Watch the entire video or choose an excerpt from the menu on this page.

Or view more teleconferences from the Digital Curation program.