Guest Post: Do Elected and Appointed Judges Write Opinions Differently?

The United States is unique in the world: most of its judges today are elected. But it hasn’t always been this way. Over the past two centuries, the American states have taken a variety of paths, alternating among elective and appointive methods. Opponents of judicial elections charge that these institutions detract from judicial independence, harm the legitimacy of the judiciary, and put unqualified jurists on the bench; supporters counter that, by publicly involving the American people in judicial selection, elections can enhance judicial legitimacy. To say this debate has attracted intense academic, political, and popular interest is an understatement.

Surprisingly little attention has been paid by scholars and policymakers to how these institutions affect legal development. Using the enormous dataset of state supreme court opinions CAP provides, we examined one small piece of this puzzle: whether opinions written by elected judges tend to be better grounded in law than those written by judges who will not stand for election. This is an important topic. Given the central role that the norm of stare decisis plays in the American legal system, opinions that cite many existing precedents are likely to be perceived as persuasive due to their extensive legal reasoning. More persuasive precedents, in turn, are more likely to be cited and to increase a court’s policymaking influence among its sister courts.

State Courts’ Use of Citations Over American History

The CAP dataset provides a particularly rich opportunity to examine state courts’ usage of citations because we can see how citation practices vary as the United States slowly builds its own independent body of caselaw.

We study the 52 existing state courts of last resort, as well as their predecessor courts. For example, our dataset includes cases from the Tennessee Supreme Court as well as the Tennessee Supreme Court of Errors and Appeals, a court that was previously Tennessee’s court of last resort. We exclude the decisions of the colonial and territorial courts, as well as decisions from early courts that were populated by legislators, rather than judges.

The resulting dataset contains 1,702,404 cases from 77 courts of last resort. The three states with the greatest number of cases in the dataset are Louisiana (86,031), Pennsylvania (70,804), and Georgia (64,534). Generally, courts in highly populous states, such as Florida and Texas, tend to carry a higher caseload than those serving less populous states, such as North and South Dakota.

To examine citation practices in state supreme courts, we first needed to extract citations from each state supreme court opinion. For this purpose, we use the LexNLP Python package released by LexPredict, a data-driven consulting and technology firm. In addition to parsing the citation itself (e.g., 1 Ill. 19), we also extract the reporter the opinion is published in and the court of the case cited (e.g., the Illinois Supreme Court). Most state supreme court cases—about 68.7% of majority opinions longer than 100 words—cite another case. About one-third of cases cite between 1 and 5 other cases, while about 5% of cases cite 25 or more other cases. The number of citations in an opinion trends upward over time, as Figure 1 shows.
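LexNLP does the heavy lifting in our pipeline; purely as an illustration of the idea (not LexNLP’s actual API), a naive regex-based extractor might look like this:

```python
import re

# Naive stand-in for LexNLP's citation extraction -- illustration only.
# Matches patterns like "1 Ill. 19" or "347 U.S. 483".
CITE_RE = re.compile(r"\b(\d+)\s+([A-Z][A-Za-z.\s]*?\.(?:\s?\dd)?)\s+(\d+)\b")

def extract_citations(text):
    """Return (volume, reporter, page) tuples found in opinion text."""
    return [(int(v), rep.strip(), int(p)) for v, rep, p in CITE_RE.findall(text)]

sample = "See 1 Ill. 19 and Brown v. Board of Education, 347 U.S. 483 (1954)."
print(extract_citations(sample))  # [(1, 'Ill.', 19), (347, 'U.S.', 483)]
```

A production extractor must also handle reporter abbreviation variants, parallel citations, and short-form cites, which is why we rely on LexNLP rather than hand-rolled patterns.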

plot of the average number of citations between the late 1700s and early 2000s, increasing exponentially from about 0 to about 15
Figure 1: The average number of citations in a state supreme court opinion since the American founding.

The number of citations in a case varies by state, as well. Some state courts tend to write opinions with more citations than others. Figure 2 presents the proportion of opinions (with at least 100 words) in each state with at least three citations since 1950. States like Florida, New York, Louisiana, Oregon, and Michigan produce the greatest proportion of opinions with fewer than three citations. It may be no coincidence that Louisiana and New York are two of the highest-caseload state courts in the country; judges with many cases on their dockets may be forced to publish opinions more quickly, with less research and legal writing allocated to citing precedent. Conversely, low-caseload courts like those of Montana and Wyoming produce the greatest proportion of opinions with at least three citations. When judges have more time to craft an opinion, they produce opinions that are better grounded in existing precedent.

choropleth map of the United States
Figure 2: The proportion of state supreme court opinions citing at least three cases by state since 1950 (the two Texas and Oklahoma high courts are aggregated).

Explaining Differences in State Supreme Court Citation

We expected that the number of citations included in a state supreme court opinion would vary with the method through which the court’s justices are retained. We use linear regression to model the median number of citations in a state-year as a function of selection method, caseload, partisan control of the state legislature, and general state expenditures. We restrict this analysis to the period 1942–2010.
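As a toy sketch (with made-up numbers standing in for the per-opinion counts extracted from CAP), the dependent variable, the median citation count per state-year, can be computed like so:

```python
import statistics
from collections import defaultdict

def median_cites_by_state_year(opinions):
    """opinions: iterable of (state, year, n_citations) tuples -- toy
    stand-ins for the per-opinion citation counts extracted from CAP."""
    groups = defaultdict(list)
    for state, year, n in opinions:
        groups[(state, year)].append(n)
    return {key: statistics.median(vals) for key, vals in groups.items()}

toy = [("TN", 1950, 2), ("TN", 1950, 8), ("TN", 1950, 4), ("GA", 1950, 1)]
print(median_cites_by_state_year(toy))  # {('TN', 1950): 4, ('GA', 1950): 1}
```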

regression results with confidence intervals and coefficient estimates
Figure 3: Linear Regression results of the effects of judicial retention method on the average number of citations in a state supreme court opinion, including state and year fixed effects.

The results are shown in Figure 3. Compared to judges who face nonpartisan elections, judges who are appointed, who face retention elections, or who face partisan elections all include more citations in their opinions. In appointed systems, the median opinion contains about 3 more citations (roughly a three-fifths standard deviation shift) than in nonpartisan election systems. In retention election systems, the median opinion contains almost 5 more citations (roughly a full standard deviation shift). Even in partisan election systems, the median opinion contains a little under 3 more citations than in nonpartisan election systems.

Some Conclusions

These differences suggest that a state’s judicial selection method can have drastic consequences for implementation and broader legal development. Because opinions with more citations tend, in turn, to be more likely to be cited in the future, the relationship we have uncovered between selection method and opinion quality suggests that judicial selection and retention methods have important downstream consequences for the relative influence of state supreme courts in American legal development. These consequences are important for policymakers to weigh as they consider altering the methods by which their judges reach the bench.

CAP Code Share: Get Opinion Author

This month we're sharing new ways to start working with data from the Caselaw Access Project. This CAP code share from Anastasia Aizman shows us how to get opinion authors from cases with the CAP API and CourtListener: Get opinion author!

There are millions of court opinions that make up our legal history. With data, we can learn new things about individual opinions, who authored them, and how that activity influences the larger landscape. This code share reviews how to get started with the names of opinion authors.

This code finds opinion authors from cases using the CAP API and CourtListener. It forms a query to the CAP API, returns the cases from that query, and then uses the CourtListener API to match those cases to individual opinion authors. The final output is a data frame of those authors and related data from CourtListener. Nice 🙌

Have you created or adapted code for working with data from the Caselaw Access Project? Send it our way or add it to our shared repository.

We want to share new ways to start working with data from the Caselaw Access Project. Looking for code to start your next project? Try our examples repository and get started today.

Tutorial: Retrieve Cases by Citation with the CAP Case Browser

In this tutorial we’re going to learn how to retrieve a case by citation using the Caselaw Access Project's case browser.

The CAP case browser is a way to browse 6.7 million cases digitized from the collections of the Harvard Law School Library.

Retrieve Case by Citation: Brown v. Board of Education

  1. Find the citation of a case you want to retrieve. Let’s start with Brown v. Board of Education: Brown v. Board of Education, 347 U.S. 483 (1954).

  2. In the citation, find the case reporter, volume, and page: Brown v. Board of Education, 347 U.S. 483 (1954).

  3. We’re going to create our URL using this template: <reporter>/<volume>/<page>

  4. In the reporter, volume, and page fields, add the information for the case you want to retrieve. Your URL for Brown v. Board of Education, 347 U.S. 483 (1954) should look like this:

  5. Let’s try it out! Add the URL you’ve just created to your browser’s search bar, and press Enter.
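The citation-to-URL steps above can be sketched in Python. The helper below is hypothetical (the case browser’s base URL and exact reporter slug rules are not spelled out in this post), but it shows the reporter/volume/page decomposition from steps 2–4:

```python
import re

def citation_to_path(citation):
    """Split a citation like '347 U.S. 483 (1954)' into the
    reporter/volume/page path segments used in the URL template."""
    m = re.match(r"(\d+)\s+(.+?)\s+(\d+)", citation)
    if m is None:
        raise ValueError("unrecognized citation: " + citation)
    volume, reporter, page = m.groups()
    # simplistic reporter slug, e.g. 'U.S.' -> 'us' (real slug rules may differ)
    slug = reporter.replace(".", "").replace(" ", "-").lower()
    return "/".join([slug, volume, page])

print(citation_to_path("347 U.S. 483 (1954)"))  # us/347/483
```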

You just retrieved a case by citation using the CAP case browser! Nice job. You can now read and share this case at this address:

This tutorial shares one way to retrieve a case by citation in the CAP case browser. Find and share your first case today!

Computational Support for Statutory Interpretation with Caselaw Access Project Data

This post is about a research paper (preprint) on sentence retrieval for statutory interpretation that we presented at the International Conference on Artificial Intelligence and Law (ICAIL 2019) held in June at Montreal, Canada. The paper describes some of our recent work on computational methods for statutory interpretation carried out at the University of Pittsburgh. The idea is to focus on vague statutory concepts and enable a program to retrieve sentences that explain the meaning of such concepts. The Library Innovation Lab's Caselaw Access Project (CAP) provides an ideal corpus of case law that is needed for such work.

Abstract rules in statutory provisions must account for diverse situations, even those not yet encountered. That is one reason why legislators use vague, open-textured terms, abstract standards, principles, and values. When there are doubts about a provision’s meaning, interpretation may help to remove them. Interpretation involves an investigation of how the term has been referred to, explained, recharacterized, or applied in the past. While court decisions are an ideal source of sentences interpreting statutory terms, manually reviewing those sentences is labor-intensive, and many sentences are useless or redundant.

In our work we automate this process. Specifically, given a statutory provision, a user’s interest in the meaning of a concept from the provision, and a list of sentences, we rank more highly the sentences that elaborate upon the meaning of the concept, such as:

  • definitional sentences (e.g., a sentence that provides a test for when the concept applies).
  • sentences that state explicitly in a different way what the concept means or state what it does not mean.
  • sentences that provide an example, instance, or counterexample of the concept.
  • sentences that show how a court determines whether something is such an example, instance, or counterexample.

We downloaded the complete bulk data from the Caselaw Access Project. Altogether the data set comprises more than 6.7 million unique cases. We ingested the data set into an Elasticsearch instance. For the analysis of the textual fields we used the LemmaGen Analysis plugin which is a wrapper around a Java implementation of the LemmaGen project.

To support our experiments we indexed the documents at multiple levels of granularity. Specifically, the documents were indexed at the level of full cases, as well as segmented into the head matter and individual opinions (e.g., majority opinion, dissent, concurrence). This segmentation was performed by the Caselaw Access Project using a combination of human labor and automatic tools. We also used our U.S. case law sentence segmenter to segment each case into individual sentences and indexed those as well. Finally, we used the sentences to create paragraphs. We considered a line-break between two sentences as an indication of a paragraph boundary.
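As a rough sketch of that paragraph heuristic (with a naive period-based split standing in for our actual sentence segmenter):

```python
def segment_paragraphs(case_text):
    """Group sentences into paragraphs, treating a line break between
    sentences as a paragraph boundary. The naive period split below is a
    placeholder for a real sentence segmenter."""
    paragraphs = []
    for block in case_text.split("\n"):
        sentences = [s.strip() + "." for s in block.split(".") if s.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs

text = "First sentence. Second sentence.\nNew paragraph here."
print(segment_paragraphs(text))
# [['First sentence.', 'Second sentence.'], ['New paragraph here.']]
```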

For our corpus we initially selected three terms from different provisions of the United States Code:

  1. independent economic value (18 U.S. Code § 1839(3)(B))
  2. identifying particular (5 U.S. Code § 552a(a)(4))
  3. common business purpose (29 U.S. Code § 203(r)(1))

For each term we collected a set of sentences by extracting all the sentences mentioning the term from the court decisions retrieved from the Caselaw Access Project data. In total we assembled a small corpus of 4,635 sentences. Three human annotators classified the sentences into four categories according to their usefulness for interpretation:

  1. high value - sentence intended to define or elaborate upon the meaning of the concept
  2. certain value - sentence that provides grounds to elaborate on the concept’s meaning
  3. potential value - sentence that provides additional information beyond what is known from the provision the concept comes from
  4. no value - no additional information over what is known from the provision

The complete data set including the annotation guidelines has been made publicly available.

We performed a detailed study of a number of retrieval methods. We confirmed that retrieving sentences directly, by measuring similarity between the query and a sentence, yields mediocre results. Taking into account the contexts of sentences turned out to be the crucial step in improving the performance of the ranking. We observed that query expansion and novelty detection techniques are also able to capture information that could be used as an additional layer in a ranker’s decision. Based on the detailed error analysis, we integrated the context-aware ranking methods with components based on query expansion and novelty detection into a specialized framework for retrieving case-law sentences for statutory interpretation. Evaluation of different implementations of the framework shows promising results (an NDCG of .725 at 10 and .662 at 100; Normalized Discounted Cumulative Gain is a measure of ranking quality).
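For readers unfamiliar with the metric, a minimal NDCG implementation looks like this (the 3/2/1/0 grades below are a hypothetical mapping of the four annotation categories):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG for one ranked list. `relevances` are graded relevance labels
    in ranked order, e.g. annotation categories mapped to 3/2/1/0."""
    def dcg(rels):
        # log2-discounted gain over the top-k positions
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 0], k=3))  # 1.0 -- a perfect ranking
print(round(ndcg_at_k([0, 2, 3], k=3), 3))  # 0.648 -- imperfect ranking
```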

To provide an intuitive understanding of the performance of the best model, we list below the top five sentences retrieved for each of the three terms. Finally, it is worth noting that in the future we plan to significantly increase the size of the data set and the number of statutory terms.

Independent economic value

  1. [. . . ] testimony also supports the independent economic value element in that a manufacturer could [. . . ] be the first on the market [. . . ]
  2. [. . . ] the information about vendors and certification has independent economic value because it would be of use to a competitor [. . . ] as well as a manufacturer
  3. [. . . ] the designs had independent economic value [. . . ] because they would be of value to a competitor who could have used them to help secure the contract
  4. Plaintiffs have produced enough evidence to allow a jury to conclude that their alleged trade secrets have independent economic value.
  5. Defendants argue that the trade secrets have no independent economic value because Plaintiffs’ technology has not been "tested or proven."

Identifying particular

  1. In circumstances where duty titles pertain to one and only one individual [. . . ], duty titles may indeed be "identifying particulars" [. . . ]
  2. Appellant first relies on the plain language of the Privacy Act which states that a "record" is "any item . . . that contains [. . . ] identifying particular [. . . ]
  3. Here, the district court found that the duty titles were not numbers, symbols, or other identifying particulars.
  4. [. . . ] the Privacy Act [. . . ] does not protect documents that do not include identifying particulars.
  5. [. . . ] the duty titles in this case are not "identifying particulars" because they do not pertain to one and only one individual.

Common business purpose

  1. [. . . ] the fact of common ownership of the two businesses clearly is not sufficient to establish a common business purpose.
  2. Because the activities of the two businesses are not related and there is no common business purpose, the question of common control is not determinative.
  3. It is settled law that a profit motive alone will not justify the conclusion that even related activities are performed for a common business purpose.
  4. It is not believed that the simple objective of making a profit for stockholders can constitute a common business purpose [. . . ]
  5. [. . . ] factors such as unified operation, related activity, interdependency, and a centralization of ownership or control can all indicate a common business purpose.

In conclusion, we have conducted a systematic study of sentence retrieval from case law with the goal of supporting statutory interpretation. Based on a detailed error analysis of traditional methods, we proposed a specialized framework that mitigates some of the challenges we identified. As evidenced above, the results of applying the framework are promising.

Tutorial: Return Cases from 100 Years Ago Today with the CAP API

The Caselaw Access Project API offers a way to view the corpus of U.S. case law. This tutorial will review how to run a CAP API call to return all cases decided 100 years ago today in your command line.

The Caselaw Access Project API makes 40 million pages of U.S. case law available in machine-readable format, digitized from the collections of the Harvard Law School Library.

Create Your API Call

Let’s start by building our call to the CAP API using the parameters decision_date_min and decision_date_max. Adding these parameters will only return data for cases decided between these two dates.

  • Open a text editor and paste:
curl ""
  • Update (year)-(month)-(day) with today’s date in this format and update 2019 to 1919. Once you’re set, it should look something like this:
curl ""
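The same call can be assembled programmatically in Python. The endpoint below is an assumption (the curl URLs in this post are elided); api.case.law/v1/cases/ is CAP’s documented cases endpoint:

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint -- the post's own URLs are elided; this is CAP's
# documented cases endpoint.
BASE = "https://api.case.law/v1/cases/"

def hundred_years_ago_call(today=None):
    """Build the API URL for cases decided exactly 100 years ago today."""
    today = today or date.today()
    then = today.replace(year=today.year - 100).isoformat()
    return BASE + "?" + urlencode(
        {"decision_date_min": then, "decision_date_max": then})

print(hundred_years_ago_call(date(2019, 8, 29)))
# https://api.case.law/v1/cases/?decision_date_min=1919-08-29&decision_date_max=1919-08-29
```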

Use Your API Call

Next, we’ll continue this tutorial on macOS using Terminal.

  • Open Applications and select Terminal

  • In the Terminal command line, copy and paste the API call from your text editor and press Enter.

You did it! The CAP API should return metadata for all cases decided one hundred years ago today.

Now, what does the content of those cases look like? Time to add a new piece to the mix.

  • To view the full text of all cases returned, add &full_case=true to the end of your original API call. It should look like this:
curl ""
  • Run your new API call in Terminal.

You’ve finished this tutorial and run a CAP API call using decision_date_min and decision_date_max. Well done!

More Ways to View Data

Before closing, let’s look at more ways to view this same data:

Let’s run that same CAP API call in your browser (this time, without the curl and quotation marks). It should look like this:

Now you can view the same data that was returned by your original API call in your browser. Learn new ways to refine and expand your CAP API call with our API Docs. We can also retrieve this data for a more human readable experience with CAP Search.

With the CAP API, we can retrieve cases from across 360 years of U.S. legal history and develop new interfaces to do that. This tutorial shared just one place to start.

Caselaw Access Project: Summer 2019 Data Release

Today we’re announcing a new data release for the Caselaw Access Project. This update includes:

  • In-text figures and illustrations in cases. An example, from Sussman v. Cooper (1976), is below.
  • Inline page numbers. You can provide a pin cite to a specific page in a case by adding #p123 to the URL, or just by clicking the page number.
  • Italic formatting in case text, as detected by OCR.

See what this all looks like in practice with an example.

All of this additional data is available programmatically as well, by downloading our bulk data releases or requesting body_format=html from our API.

This data release improves how we view and share the published U.S. caselaw made available by the Caselaw Access Project. Let us know how you’re creating new ways to see this data!

Browse the Bookshelf of U.S. Case Law: Announcing the CAP Case Browser

Today we’re announcing the CAP case browser! Browse published U.S. case law from 1658 to 2018—all 40 million pages of it.

The CAP case browser is one way to browse and cite cases made available via the Caselaw Access Project API. The Caselaw Access Project shares cases digitized from the collections of the Harvard Law School Library.

Let’s take a quick tour, starting with the CAP case browser.

Teaching Data Science for Lawyers with Caselaw Access Project Data

In the Spring of 2019, at the University of Iowa, I taught an experimental course called Introduction to Quantitative & Computational Legal Reasoning. The idea of the class was beginning "data science" in the legal context. The course is taught in Python, and focuses on introductory coding and statistics, with focused applications in the law (such as statistical evidence of discrimination).

Of course, for students with no prior technical background, it's unrealistic to expect a law school course to produce "data scientists" in the sense used in industry. But my observations of the growth in student skills by the end of the course suggest that it is realistic to produce young lawyers with the skills to solve simple problems with coding, understand data, avoid getting led astray by dubious scientific claims (especially with probability and statistics in litigation), and learn about potential pathways for further learning and career development in legal technology and analytics.

The Library Innovation Lab's Caselaw Access Project (CAP) is particularly well-suited for assignments and projects in such a course. I believe that much of the low-hanging fruit in legal technology is in wrangling the vast amounts of unstructured text that lawyers and courts produce—as is evidenced by the numerous commercial efforts focused on document production in discovery, contract assembly and interpretation, and similar textual problems faced by millions of lawyers daily. CAP offers a sizable trove of legal text accessible through a relatively simple and well-documented API (unlike other legal data APIs currently available). Moreover, the texts available through CAP are obviously familiar to every law student after their first semester, and their comfort with the format and style of such texts enables students to handle assignments that require them to combine their understanding of how law works with their developing technology skills.

To leverage these advantages, I included a CAP-based assignment in the first problem set for the course, due at the end of the programming intensive that occupies the initial few weeks of the semester. The problem, which is reproduced at the end of this post along with a simple example of code to successfully complete it, requires students to write a function that can call into the CAP API, retrieve an Illinois Supreme Court case (selected due to the lack of access restrictions) by citation, and return a sorted list of each unique case in the U.S. Reporter cited in the case they have retrieved.

While the task is superficially simple, students found it fairly complex, for it requires the use of a number of programming concepts, such as functions and control flow, that they had only recently learned. It also exposes students to common beginner’s mistakes in Python programming, such as missing the difference between sorting a list in place with list.sort() and returning a new list with sorted(list). In my observation, the results of the problem set accurately distinguished the students who were taking to programming quickly and easily from those who required more focused assistance.
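That in-place versus copy distinction, in miniature:

```python
cites = ["419 U.S. 102", "387 U.S. 136"]

# sorted() returns a new, sorted list and leaves the original untouched
assert sorted(cites) == ["387 U.S. 136", "419 U.S. 102"]
assert cites == ["419 U.S. 102", "387 U.S. 136"]

# list.sort() sorts in place and returns None -- the classic beginner trap
assert cites.sort() is None
assert cites == ["387 U.S. 136", "419 U.S. 102"]
```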

In addition to such standard programming skills, this assignment requires students to practice slightly more advanced skills such as:

  • Reading and understanding API documentation;
  • Making network requests;
  • Processing text with regular expressions;
  • Using third-party libraries;
  • Parsing JSON data; and
  • Handling empty responses from external data sources.

With luck, this problem can encourage broader thinking about legal text as something that can be treated as data, and the structure inherent in legal forms. With even more luck, some students may begin to think about more intellectual questions prompted by the exercise, such as: can we learn anything about the different citation practices in majority versus dissent opinions, or across different justices?

I plan to teach the class again in Spring 2020; one recurrent theme in student feedback for the first iteration was the need for more practice in basic programming. As such, I expect that the next version of the course will include more assignments using CAP data. Projects that I'm considering include:

  • Write wrapper functions in Python for the CAP API (which the class as a whole could work on releasing as a library as an advanced project);
  • Come to some conclusions about the workload of courts over time or of judges within a court by applying data analysis skills to metadata produced by the API; or
  • Discover citation networks and identify influential cases and/or judges.

Appendix: A CAP-Based Law Student Programming Assignment

Write a function, named cite_finder, that takes one parameter, case, a string with a citation to an Illinois Supreme Court case, and returns the following:

A. None, if the citation does not correspond to an actual case.

B. An empty list, if the citation corresponds to an actual case, but the text of that case does not include any citations to the U.S. Supreme Court.

C. A Python list of unique U.S. Supreme Court citations that appear in the text of the case, if the citation corresponds to an actual case and the case contains any U.S. Supreme Court citation.

Rules and definitions for this problem:

  • "Unique" means a citation to a specific case from a specific reporter.

  • "Citation to an Illinois Supreme Court case" means a string reflecting a citation to the official reporter of the Illinois Supreme Court, in the form 12 Ill. 345 or 12 Ill.2d 345.

  • "U.S. Supreme Court citation" means any full citation (not supra, id, etc.) from the official U.S. Supreme Court reporter as abbreviated U.S.. Party names, years, and page numbers need not be included. Archaic citations (like to Cranch), S.Ct. citations, and L.Ed. citations should not be included. Subsequent cites/pin cites to a case of the form 123 U.S. at 456 should not be included.

  • "Text" of a case includes all opinions (majority, concurrence, dissent, etc.) but does not include syllabus or any other content.

  • Your function must use the Caselaw Access Project API.

  • The list must be sorted using Python’s built-in list sorting functionality with default options.

  • Each citation must appear only once.

Example correct input and output:

  • cite_finder("231 Ill.2d 474") should return ['387 U.S. 136', '419 U.S. 102', '424 U.S. 1', '429 U.S. 252', '508 U.S. 520', '509 U.S. 43']

  • cite_finder("231 Ill.2d 475") should return None

  • cite_finder("215 Ill.2d 219") should return ['339 U.S. 594', '387 U.S. 136', '467 U.S. 837', '538 U.S. 803']

Sample Code to Complete Assignment

import requests, re

# The endpoint URL was elided in the original post; this is the documented
# CAP cases endpoint.
endpoint = "https://api.case.law/v1/cases/"
pattern = r"\d+ U\.S\. \d+"
# no warranties are made as to the correctness of this somewhat lazy regex

def get_opinion_texts(api_response):
    try:
        ops = api_response["results"][0]["casebody"]["data"]["opinions"]
    except (KeyError, IndexError):
        return None
    return [x["text"] for x in ops]

def cite_finder(cite):
    resp = requests.get(endpoint, params={"cite": cite, "full_case": "true"}).json()
    opinions = get_opinion_texts(resp)
    if opinions is None:
        return None
    allcites = []
    for opinion in opinions:
        allcites.extend(re.findall(pattern, opinion))
    # deduplicate, then sort with Python's default ordering per the rules above
    return sorted(set(allcites))