Digital Curation
Version 1.3

Teleconference with Jason Scott


Part I


00:00 Introduction to the program and Jason Scott (Jon Ippolito)

Scott's backstory

02:47 "I literally stumbled backwards into becoming a librarian and archivist"

12:30 Dealing with entropy

16:04 Working at the Internet Archive (and fan-made subtitles)

"When they asked me for my job title, I said Free Range Archivist."

21:11 Being proactive with the press and preservation

Crowdsourcing preservation with Archive Team

22:34 Archive Team: "We think of ourselves as EMTs."

Why don't users expect the same property rights for deleted data that they would for a towed car?

"Every day the Archive Team is needed represents a failure on the part of organizations."

26:34 Rescuing content from dying platforms

Internet Archive currently acquires 35-75 TB of new data daily, with total holdings of 60-80 petabytes.

"The record [for shortest warning about a shutdown] was 'We closed seven days ago.'"

"The longest record is Apple MobileMe's one-year announcement, which turned out to be kind of bad because if you tell somebody they've got to be out of their house in one year, and then never talk to them about it again, that is a very surprising day one year from now."

"We are floating in a miasma of made-up, ad hoc rules, and the question is, Is it better to have some agreed-upon data-retention/data-curation rules in this world than what we're doing right now?"

Becoming a "real" archivist

32:41 At what point did you start to consider yourself an archivist? (Beth Sanders)

When Scott was sued in 2000, he asked, "Why are they going after me for this, I'm a librarian for God's sake."

"Metadata is a love note to the future."

"Between when I'm 30 and when I'm 40, I'm thinking of myself as a keeper of content, to keep it for the real people. And from 40 to now, I think of myself as the real people that I was keeping it for."

35:34 Taxonomic structure of the Internet Archive


"Librarians are dealing with living people, and archivists are dealing with people who are not born yet."

37:59 Working with Brewster Kahle

Kahle didn't want to bureaucratize Scott.

Who decides what to do with 100,000 Wojak memes?

After the Archive launched a collection of emulated arcade games, 5 million users tried it over the weekend.

"I have certainly crashed the Internet Archive on multiple occasions."

How the Internet Archive works

41:43 What is the long-term strategy for the Internet Archive's existence? (Lorraine Scott)

"I have a shipping container of disks."

45:29 How does the Archive handle intellectual property rights of uploaded material? (Beth Sanders)

"Ask for probation, not permission."

The Internet Archive is currently banned in Russia and China.

51:32 When material arrives, do you see matchmaking opportunities for documentaries? (John Bell)

Demo of command-line tools for searching the Internet Archive

5,000 Hungarian supermarket circulars

"I have personally uploaded over a million items to the archive and I have moved three or four million around."

Part II

Managing crowdsourced contributions

00:00 Is material acquired by Archive staff more important, or what's uploaded by the "little guys"? (Renee DesRoberts)

"A scary amount of these open places are closing. It's getting harder and harder to upload to Dropbox, Mega, Google Drive....Everyone starts falling back to what's the easiest place and we are rapidly becoming the easiest place. They will abuse it, they will destroy the Commons and we will react....We'll keep it open as long as we can."

04:25 Managing oversharers with "Stacks"

Running obsolete software with emulation

10:21 How to play vintage games without downloading anything

"These connections to software history are as real and as valid as photographic film."

Emulated platforms include MS-DOS, Apple ][, ZX Spectrum, Atari, and Windows 3.1 (including 28 versions of Solitaire).

17:58 Emulating Flash animations and games

Peanut Butter Jelly Time, Lobster Magnet, Drum Machine, Homestar Runner!

Emularity matches media types to an emulator, if it exists. Ruffle is the emulator for Flash.

25:31 Do you have to create a new emulator when you acquire a new format? (Renee DesRoberts)

"MAME emulates 32,000 things, of which maybe 9,000 are unique."

Did you know the Internet Archive emulates handheld devices (like the Speak & Spell)?

"The Internet Archive does not make the emulators, does not make the compiler, and doesn't make the browser. We just pushed them all together."

"Schools use [emulated content]....Every 8-12 seconds, somebody boots up Oregon Trail on the Archive."

35:40 How can we support the creation of more emulators?

"You put out these deposits of goodwill and interaction with people...that's all people stuff, there's no computer in there."

48:20 Is there ever a time when the Archive is not the best repository for something? (Beth Sanders)

"I often define us as the last step before trash."

The Digital Preservation Bogeyman

55:06 How long does it take for the Archive to digitize analog material? (Blair Mueller)

1:02:37 Who does the digitization? (Beth Sanders)

"If you really honestly want to do an at-scale process, you should, and some would argue, you have to, think about compensation....So I treat volunteers like solid gold, and I respect that they have real lives."

1:10:25 Do you and the Archive work with groups in other countries? (Thomas Walskaar)


1:12:55 How to contact Jason

Coda 1: "The greatest show in New York City."

Coda 2: "The stupidest thing I own."

Coda 3: Jason's Zoom setup.

This teleconference is a project of the University of Maine's Digital Curation program. For more information, contact jippolito@maine.edu.

Timecodes are in (hours:)minutes:seconds.

In this interactive discussion hosted by the University of Maine's Digital Curation program on May 5th, 2021, the Internet Archive's Jason Scott explains his unconventional but remarkably successful approach to archiving the world's digital culture.

Jason Scott has been called "the figurehead of the digital archiving world." Apart from his day job at the Internet Archive, Scott's fingerprints can be found on some of the most promising initiatives in digital preservation today, from spearheading the effort to emulate vintage platforms in the browser to his boldly proactive approach to rescuing vast swathes of digital culture from oblivion. Although he’s a leader in emulation and open access, he may be best known as the charismatic frontman for Archive Team, a crowdsourcing initiative that has saved more of digital culture for posterity than most of the world's museums and libraries combined.

Watch the entire video or choose an excerpt from the menu on this page.