Prompt:
“Write a single sentence summarizing why cryptographically-signed provenance information matters in the context of AI-generated content.”
ChatGPT:
“Cryptographically-signed provenance information matters in the context of AI-generated content to establish trust, transparency, and authenticity by providing a verifiable record of the content’s origin, authorship, and creation process.”
That’s a great point, but did ChatGPT really say that, or did I make that up to trick you into reading this article? I could show you a screenshot of that exchange to try to convince you …
… but what good would it do, when it’s so easy to produce convincing fakes?
There is no one-size-fits-all when it comes to web archiving techniques, and the variety of tools and services available to capture web content illustrates the wide, ever-growing set of needs in the web archiving community. As these needs evolve, so do the web and the underlying challenges and opportunities that capturing it presents. Our decade of experience running Perma.cc has given our team a vantage point to identify emerging challenges in witnessing the web that we believe extend well beyond our core mission of preserving citations in the legal record. In an effort to expand the utility of our own service and contribute to the wider array of core tools in the web archiving community, we’ve been working on a handful of Perma Tools.
In this blog post, we’ll go over the driving principles and architectural decisions we’ve made while designing the first major release from this series: Scoop, a high-fidelity, browser-based, single-page web archiving capture engine for witnessing the web. As with many of these tools, Scoop is built for general use but represents our particular stance, cultivated while working with legal scholars, US courts, and journalists to preserve their citations. Namely, we prioritize their needs for specificity, accuracy, and security. These are qualities we believe are important to a wide range of people interested in standing up their own web archiving system. As such, Scoop is an open-source project which can be deployed as a standalone building block, hopefully lowering a barrier to entry for web archiving.
We are excited to announce the release of our “reading mode”, a new casebook view that offers students a cohesive digital format to facilitate deep reading.
We think better design of digital reading environments can capture the benefits of dynamic online books while orienting readers to an experience that encourages deeper analysis. Pairing that vision with our finding that more students are seeking digital reading options, we identified an opportunity to develop a digital reading experience that is streamlined, centralized, and most likely to encourage deep reading.
During this talk, I presented what we’ve learned building thread-keeper, the experimental open-source software behind social.perma.cc which allows for making high-fidelity captures of twitter.com URLs as “sealed” PDFs.
Chatbots such as OpenAI’s ChatGPT are becoming impressively
good at understanding complex requests in “natural” language and generating convincing blocks of
text in response, drawing on the vast quantity of information their underlying models were trained on.
Garnering massive amounts of mainstream attention and rapidly making its way through the second
phase of the Gartner Hype Cycle, ChatGPT, along with its
potential, amazes and fascinates as much as it bewilders and worries.
In particular, more and more people seem concerned by its propensity to make “cheating” both easier to do and harder to detect.
My work at LIL focuses on web archiving technology, and the tool we’ve created,
perma.cc, is relied on to maintain the integrity of web-based
citations in court opinions, news articles, and other trusted documents.
Since web archives are sometimes used as proof that a website looked a certain way at a certain time,
I started to wonder what AI-assisted “cheating” would look like in the context of web archiving.
After all, WARC files are mostly made of text:
are ChatGPT and the like able to generate convincing “fake” web archives?
Do they know enough about the history of web technologies and the WARC format to generate credible artifacts?
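To make the "mostly made of text" point concrete, here is a minimal sketch of what a single WARC `response` record looks like when assembled by hand. It assumes the WARC/1.1 header layout, and the function name, URL, and payload are hypothetical, chosen only for illustration. It is not how perma.cc or any particular capture tool writes its archives.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Assemble one WARC/1.1 'response' record as raw bytes (illustrative only)."""
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs.
    return head + http_payload + b"\r\n\r\n"


# Hypothetical captured HTTP response, stored verbatim as the record block.
body = b"<html><body>Hello</body></html>"
http_payload = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body
)

record = build_warc_response_record("https://example.com/", http_payload)
print(record.decode("utf-8"))
```

Everything in that record is plain text that anyone, or any text-generating model, could type out, which is why the question of convincing fakes is worth asking.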
I’ve been asking
ChatGPT to write some
poems. I’m doing this because it’s a great way to ask ChatGPT how it
feels about stuff — and doing that is a great way to understand
all the secret layers that go into a ChatGPT output. After looking
at where ChatGPT’s opinions come from, I’ll argue that
secrecy is a problem for this kind of model, because it weighs the
risk that we’ll misuse this tool more heavily than the risk that we won’t understand
what we’re doing in the first place.
I started with the idea of a computer-generated story in which audience participation creates copies of the narrative from each person’s point of view. The story would evolve in real time as new users joined and each person’s copy would update accordingly. I called the story Forks. After some initial trials, I decided not to launch it because I did not believe the project was securable against harm.
The framing plot was this: A person begins a journey to a new home. They forge a trail through the landscape that subsequent travelers can follow. Each person has a set of randomly assigned traits: a profession, a place where they live, a time of day when they join the procession, and a type of item they leave behind for others to follow their path. They may directly follow the original traveler or a later traveler. At the end of the story, the total number of travelers is described as having arrived at their new home.
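The structure of that framing plot can be sketched as a small data model. This is only an illustration under assumptions: the trait pools, the `Traveler` class, and the helper functions are all hypothetical and are not taken from the Forks project itself.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical trait pools; the real story's vocabulary is not known here.
PROFESSIONS = ["baker", "cartographer", "weaver"]
PLACES = ["a river town", "a hillside village", "a port city"]
TIMES_OF_DAY = ["dawn", "noon", "dusk"]
ITEMS = ["a ribbon", "a painted stone", "a lantern"]


@dataclass
class Traveler:
    index: int
    profession: str
    place: str
    time_of_day: str
    item: str
    follows: Optional[int]  # None for the original traveler


def add_traveler(procession: List[Traveler], rng: random.Random) -> Traveler:
    """Append a traveler who follows the original or any later traveler."""
    follows = rng.randrange(len(procession)) if procession else None
    traveler = Traveler(
        index=len(procession),
        profession=rng.choice(PROFESSIONS),
        place=rng.choice(PLACES),
        time_of_day=rng.choice(TIMES_OF_DAY),
        item=rng.choice(ITEMS),
        follows=follows,
    )
    procession.append(traveler)
    return traveler


def arrival_line(procession: List[Traveler]) -> str:
    """Closing sentence that reports the total number of travelers."""
    return f"{len(procession)} travelers arrived at their new home."


rng = random.Random(0)  # seeded so the sketch is repeatable
procession: List[Traveler] = []
for _ in range(5):
    add_traveler(procession, rng)
print(arrival_line(procession))
```

Because each traveler only stores the index of the person they follow, every participant's copy of the story could be rebuilt by walking that chain back to the original traveler.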
The Harvard Library Innovation Lab is delighted to welcome Molly White as an academic fellow.
Molly has become a leading critic of cryptocurrency and web3 through her blog Web3 is Going Just Great as well as her long-form research and essays, talks, guest lectures, social media, and interviews. Her rigorous, independent, and accessible work has had a substantial impact on legislators, media, and the public debate.
This summer, one of our research assistants, Seonghee Lee, ran a study among current law students that is helping us reconsider some longstanding assumptions about student reading preferences and informing future development of the H2O Open Casebook platform.
Historically, the complexities of the backend infrastructures needed to play back and embed rich web archives on a website have limited how we explore, showcase and tell stories with the archived web. Client-side playback is an exciting emerging technology that lifts a lot of those constraints.
The replayweb.page software suite developed by our long-time partner Webrecorder is, to date, the most advanced client-side playback technology available, allowing for the easy embedding of rich web archive playbacks on a website without the need for a complex backend infrastructure. However, entirely delegating to the browser the responsibility of downloading, parsing, and rendering web archives also means transferring new responsibilities to the client, which comes with its own set of challenges.
In this post, we’ll reflect on our experience deploying replayweb.page on Perma.cc and provide general security, performance and practical recommendations on how to embed web archives on a website using client-side playback.