IIPC Technical Speaker Series: Archiving Twitter

I was invited by the International Internet Preservation Consortium (IIPC) to give a webinar on the topic of “Archiving Twitter” on January 12.

During this talk, I presented what we’ve learned building thread-keeper, the experimental open-source software behind social.perma.cc, which allows for making high-fidelity captures of twitter.com URLs as “sealed” PDFs.

Read more of "IIPC Technical Speaker Series: Archiving Twitter"

Towards “deep fake” web archives? Trying to forge WARC files using ChatGPT.

Chatbots such as OpenAI’s ChatGPT are becoming impressively good at understanding complex requests in “natural” language and at generating convincing blocks of text in response, drawing on the vast quantities of information their underlying models were trained on.
Garnering massive amounts of mainstream attention and rapidly making its way through the second phase of the Gartner Hype Cycle, ChatGPT amazes and fascinates as much as it bewilders and worries. In particular, more and more people seem concerned by its propensity to make “cheating” both easier to do and harder to detect.

My work at LIL focuses on web archiving technology, and the tool we’ve created, perma.cc, is relied on to maintain the integrity of web-based citations in court opinions, news articles, and other trusted documents.
Since web archives are sometimes used as proof that a website looked a certain way at a certain time, I started to wonder what AI-assisted “cheating” would look like in the context of web archiving. After all, WARC files are mostly made of text: are ChatGPT and the like able to generate convincing “fake” web archives? Do they know enough about the history of web technologies and the WARC format to generate credible artifacts?
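To make that question concrete: a WARC record really is mostly plain text. Below is a minimal, hand-rolled sketch of a single WARC/1.0 “response” record, for illustration only; a real capture tool would also emit request and metadata records, block digests, and other fields the format expects.

```python
import uuid
from datetime import datetime, timezone

def make_warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Build a minimal WARC/1.0 'response' record by hand.

    Illustrative sketch only: required niceties such as WARC-Block-Digest
    and a matching 'request' record are deliberately omitted.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",  # length of the block that follows
    ]
    # A record is: header lines, a blank line, the block, then two blank lines.
    return ("\r\n".join(headers) + "\r\n\r\n").encode() + http_payload + b"\r\n\r\n"

payload = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n\r\n"
    b"<html><body>Hello, archive!</body></html>"
)
record = make_warc_response_record("http://example.com/", payload)
```

Everything above is human-readable text, which is precisely what makes the forgery question worth asking.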

Let’s ask ChatGPT to find out.

Read more of "Towards ‘deep fake’ web archives? Trying to forge WARC files using ChatGPT."

ChatGPT: Poems and Secrets

I’ve been asking ChatGPT to write some poems. I’m doing this because it’s a great way to ask ChatGPT how it feels about stuff — and doing that is a great way to understand all the secret layers that go into a ChatGPT output. After looking at where ChatGPT’s opinions come from, I’ll argue that secrecy is a problem for this kind of model, because it weighs the risk that we’ll misuse this tool more heavily than the risk that we won’t understand what we’re doing in the first place.

Read more of "ChatGPT: Poems and Secrets"

Ethical Collaborative Storytelling

I started with the idea of a computer-generated story in which audience participation creates copies of the narrative from each person’s point of view. The story would evolve in real time as new users joined and each person’s copy would update accordingly. I called the story Forks. After some initial trials, I decided not to launch it because I did not believe the project was securable against harm.

The framing plot was this: A person begins a journey to a new home. They forge a trail through the landscape that subsequent travelers can follow. Each person has a set of randomly assigned traits: a profession, a place where they live, a time of day when they join the procession, and a type of item they leave behind for others to follow their path. They may directly follow the original traveler or a later traveler. At the end of the story, the full procession of travelers is described as having arrived at their new home.

A generated story would look something like this:

Read more of "Ethical Collaborative Storytelling"

Welcome Molly White: Library Innovation Lab Fellow

The Harvard Library Innovation Lab is delighted to welcome Molly White as an academic fellow.

Molly has become a leading critic on cryptocurrency and web3 issues through her blog Web3 is Going Just Great as well as her long-form research and essays, talks, guest lectures, social media, and interviews. Her rigorous, independent, and accessible work has had a substantial impact on legislators, media, and the public debate.

Read more of "Welcome Molly White: Library Innovation Lab Fellow"

Web Archiving: Opportunities and Challenges of Client-Side Playback

Historically, the complexities of the backend infrastructures needed to play back and embed rich web archives on a website have limited how we explore, showcase, and tell stories with the archived web. Client-side playback is an exciting emerging technology that lifts many of those restraints.

The replayweb.page software suite developed by our long-time partner Webrecorder is, to date, the most advanced client-side playback technology available, allowing for the easy embedding of rich web archive playbacks on a website without the need for a complex backend infrastructure. However, delegating the downloading, parsing, and rendering of web archives entirely to the browser also transfers new responsibilities to the client, which comes with its own set of challenges.

In this post, we’ll reflect on our experience deploying replayweb.page on Perma.cc and provide general security, performance, and practical recommendations on how to embed web archives on a website using client-side playback.
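For readers unfamiliar with the mechanics, replayweb.page exposes a `<replay-web-page>` web component that handles playback entirely in the browser. The sketch below generates a minimal embed snippet; the CDN URL, the `replayBase` value, and the need to host the `sw.js` service worker at that path follow replayweb.page’s embedding documentation, and the archive and page URLs are hypothetical placeholders.

```python
def replay_embed_html(wacz_url: str, archived_url: str) -> str:
    """Return minimal HTML embedding a client-side playback via
    replayweb.page's <replay-web-page> web component.

    `source` points at the archive file (e.g. a WACZ) and `url` selects
    which archived page to display. A copy of replayweb.page's sw.js
    must also be served from the replayBase path for playback to work.
    """
    return (
        '<script src="https://cdn.jsdelivr.net/npm/replaywebpage/ui.js"></script>\n'
        f'<replay-web-page source="{wacz_url}" url="{archived_url}" '
        'replayBase="/replay/"></replay-web-page>'
    )

# Hypothetical example archive and target page:
snippet = replay_embed_html(
    "https://example.org/archives/capture.wacz",
    "https://example.com/",
)
```

Because all of this runs client-side, the recommendations in the post (sandboxing, caching, file-size considerations) apply to however this snippet ends up on your page.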

Read more of "Web Archiving: Opportunities and Challenges of Client-Side Playback"

2021 Research Associates

Like most things at LIL, our visiting researcher program has taken many forms over the years. This year, despite our team being spread across the East and Midwest Coasts (shout out to Lake Michigan) we were thrilled to welcome five research associates to the virtual LILsphere, to explore their interests through the lens of our projects and mission.

Read more of "2021 Research Associates"