2021 Research Associates

Like most things at LIL, our visiting researcher program has taken many forms over the years. This year, despite our team being spread across the East and Midwest Coasts (shout out to Lake Michigan), we were thrilled to welcome five research associates to the virtual LILsphere to explore their interests through the lens of our projects and mission.

In addition to joining us for our daily morning standups, RAs attended project meetings and brainstorming sessions, and had access to all of the resources the Harvard Library system has to offer. Their individual research grew out of questions they had or ideas they wanted to explore within our three tentpole projects: the Caselaw Access Project, H2O, and Perma.cc.

Each of our visitors tackled an exceptionally interesting corner of our work; some helped propel us forward in terms of platform functionality, while others prompted us to reconsider some of our base assumptions about our users. They produced everything from new software features to teaching materials, design briefs, and research documentation. Below are brief descriptions of their work and links to their individual outputs.

Rachel Auslander

Using technology to empower research and information access is a central tenet of the LIL mission. Another value we have as a group is that of collaboration. This summer, Rachel explored what it would mean to be able to fuse external datasets into CAP via metadata in a way that would bring context and texture to caselaw.

Her design brief, which will guide future LILers as they integrate these ideas into the CAP interface, can be viewed here.

Ashley Fan

We got double the fun from Ashley this summer! Initially, she was interested in working on collections of caselaw that would empower journalists on various beats to apply a legal lens to their writing. Using a new feature available from CAP Labs, Ashley put together a series of Chronolawgic timelines for three different beats: education, health, and environment.

You can read her post about all of these timelines and find links to them here.

Then, in true LIL fashion, Ashley found herself swept up in an interesting problem that happened to come up during her time with the team. The power of the CAP dataset is that it makes caselaw far easier to access, but caselaw, by nature, can contain sensitive content about individuals involved in specific cases. This tension often shows up as requests from those individuals to remove their information from our database of cases, and Ashley jumped in alongside our team to research and formalize a process for decision-making and action.

Follow this link to learn more about this question, and Ashley's research.

Andy Gu

The scope of possibilities surrounding the Caselaw Access Project is so vast that we're only beginning to see how it can change the way scholars look at and study the law. This summer, Andy worked to make our built-in visualization features more flexible and to expand users' ability to explore trends, particularly in relation to an extremely important aspect of the law: inter-case citation.

In a series of blog posts, Andy describes how he extended the Trends tool using the Cases endpoint of the API, a powerful new feature for flexible citation queries, and the design work that was done to integrate these upgrades into CAP's general search interface.

Adaeze Ibeanu

Undergraduate curricula were the focus of Adaeze's summer. Where and how is the law taught to students who aren't explicitly attending law school? Via a thorough survey of undergraduate curricula and conversations with students, Adaeze presented our team with a summary of legal teaching in an undergraduate setting, and took a deeper dive into legal teaching in the social and natural science fields. Her research explored the potential impact of legal texts and open educational resources in completely new settings.

Aadi Kulkarni

Since 2018, our team has been integrating primary legal documents, including caselaw and the U.S. Code, directly into H2O, our open casebook platform, to make the creation of legal teaching materials even more seamless and powerful. This summer, Aadi continued that work by exploring ways in which H2O could include state code in a casebook—extending content capabilities for all of our users. Along the way, Aadi learned a lot about open-source communities and the process of integrating public materials into our platform.

If you're interested in our visiting research opportunities, make sure to follow us on Twitter. You should also feel free to reach out to us at lil@law.harvard.edu!

Interface Upgrade | Integrating Queries into Search and Case View

With expanded feature capabilities, users may find Trends queries more difficult to write, especially as researchers increase the complexity of their investigations. To make usage easier, we have integrated the Trends query language into the Search and Case View features. From a search, users can click the Trends button, and our servers will automatically convert the existing search query into a Trends timeline.

Gif showing search results converted into a Trend timeline.

Additionally, users can now view the citation history of a particular case from that case's page by clicking the "View citation history in trends" button.

Gif showing ability to display citation history on a Trend timeline from an individual case

Our exploration of timeline generation for empirical legal scholarship has inspired us to reimagine how people reason about CAP's corpus of American caselaw. In the future, we hope to restructure the search page further and empower people to quickly ask complex questions about American caselaw over time.

We believe that citation-based analysis can significantly enrich our understanding of American caselaw, and we are excited to see how these tools can expose insights both in the law itself and in quantitative techniques for its exploration. If you have any ideas for how we can further expand on these features, please do not hesitate to reach out to us at info@case.law.

This is part of a series of posts by Andy Gu, a visiting researcher who joined the LIL team in summer 2021. We were inspired to build these features after recognizing the power of the Caselaw Access Project's case and citation data to analyze and explore caselaw. We hope that these features will make empirical study of caselaw both faster and more accessible for researchers.

New Feature | Flexible Citation Queries

Expand your ability to visualize citation practices with the latest addition to our Trends tool. Trends now supports flexible queries about how cases cite other cases, in addition to the other ways in which cases can be filtered. By appending the name of any supported filter parameter to the prefix cites_to__ (that is, cites_to__{parameter name here}), users can retrieve all cases that cite cases matching that filter. As before, the parameter name can be any parameter accepted by the Cases API.

For instance, the following query graphs the number of cases that cite a case in which Justice Cardozo wrote the majority opinion against the number of cases that cite a case in which Justice Brandeis wrote the majority opinion.

comparison of majority opinion authors over time displayed on a graph
Figure 1 query: api(cites_to__author_type=cardozo:majority), api(cites_to__author_type=brandeis:majority)

The cites_to__ feature provides users the power to flexibly reason about case citation patterns. For instance, if a user were interested in how the Supreme Court of California cited authority from its own jurisdiction in comparison to authority from other jurisdictions, they could write the following query:

comparison of citations within jurisdiction versus outside displayed on a graph
Figure 2 query: api(court=cal-1&cites_to__jurisdiction__exclude=cal), api(court=cal-1&cites_to__jurisdiction=cal)

This set of parameters can be combined with any other parameters compatible with the Cases API. For instance, we can restrict the above timeline to citations of cases that mention the term 'technology':

comparison of citations within jurisdiction versus outside filtered by topic displayed on a graph
Figure 3 query: api(court=cal-1&cites_to__jurisdiction__exclude=cal&cites_to__search=technology), api(court=cal-1&cites_to__jurisdiction=cal&cites_to__search=technology)
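As a rough illustration of the prefix convention, the hypothetical helper below (trends_term is our own name, not part of CAP) builds a Trends api() term from two sets of filters: one applied to the citing cases, and one whose keys get the cites_to__ prefix and are applied to the cases being cited. It only assembles text in the same form as the example queries above; it does not contact CAP.

```python
def trends_term(citing=None, cited=None):
    """Build a Trends api(...) term (hypothetical helper).

    citing: filters applied to the citing cases, e.g. {"court": "cal-1"}.
    cited:  filters applied to the cited cases; each key is prefixed
            with "cites_to__", e.g. {"jurisdiction": "cal"}.
    """
    pairs = list((citing or {}).items())
    pairs += [("cites_to__" + name, value) for name, value in (cited or {}).items()]
    # Trends terms are written unencoded, as in the example queries.
    return "api(" + "&".join(f"{name}={value}" for name, value in pairs) + ")"


print(trends_term(cited={"author_type": "cardozo:majority"}))
# api(cites_to__author_type=cardozo:majority)
print(trends_term({"court": "cal-1"}, {"jurisdiction__exclude": "cal", "search": "technology"}))
# api(court=cal-1&cites_to__jurisdiction__exclude=cal&cites_to__search=technology)
```

Keeping the citing and cited filters in separate arguments makes it harder to forget the prefix on one parameter when several are combined, as in Figure 3.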

Users may also use the parameters within the api() tag to query the Cases API directly. One caveat to the cites_to__ feature: if more than 20,000 cases fulfill a cites_to__ condition, our system will randomly select 20,000 of those cases to match against. For more information about all the parameters we support, please consult our Cases API documentation here.
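To make the 20,000-case caveat concrete, here is a minimal in-memory sketch of how a cites_to__ condition could be evaluated under such a cap. It assumes a toy data model (each case is a dict with an id and a list of cited case ids) and is not CAP's implementation; only the cap and the random selection come from the description above.

```python
import random

MAX_CITED = 20_000  # cap described in this post


def cases_citing(cases, cited_filter, max_cited=MAX_CITED, rng=random):
    """Return the cases that cite at least one case matching cited_filter.

    cases: list of dicts like {"id": 1, "cites": [2, 3], ...} (toy model).
    cited_filter: predicate applied to candidate cited cases.
    """
    cited_ids = [case["id"] for case in cases if cited_filter(case)]
    # If too many cases satisfy the condition, match against a random subset.
    if len(cited_ids) > max_cited:
        cited_ids = rng.sample(cited_ids, max_cited)
    cited_ids = set(cited_ids)
    return [case for case in cases if cited_ids.intersection(case["cites"])]


cases = [
    {"id": 1, "jurisdiction": "cal", "cites": []},
    {"id": 2, "jurisdiction": "ny", "cites": [1]},
    {"id": 3, "jurisdiction": "cal", "cites": [2]},
]
print([c["id"] for c in cases_citing(cases, lambda c: c["jurisdiction"] == "cal")])
# [2]
```

One consequence visible in this sketch: when the cap applies, a citing case can be missed if every case it cites falls outside the random sample, so counts near the cap are best read as estimates.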

If you're interested in exploring this data in a different way, make sure you've checked out Cite Grid.


Feature Update | Extension of Trend Search Capability

Today, we are announcing an update to the Caselaw Access Project (CAP) API and Trends tool to help users better investigate changes in the law over time. These new features enable users to easily generate timelines of cases and explore patterns in case citations. We hope that they can help researchers uncover new insights about American caselaw.

Previously, the project's Historical Trends tool let users graph word and phrase frequencies in cases over time. For instance, the following graph displays the frequency of the term 'lobster' in Maine cases and the term 'gold' in California cases over time.

historical trends results displayed on graph
Figure 1 query: me: lobster, cal: gold

We have extended the Trends tool so that users can generate timelines of cases for any parameter accepted by the Cases API endpoint. As a result, users can ask broad questions about the Caselaw Access Project's dataset and quickly retrieve timelines of cases that follow the queried pattern.

For instance, the following query presents timelines, split by jurisdiction, of cases that have cited Mapp v. Ohio since it was decided in 1961.

query results displayed on graph
Figure 2 query: *: api(cites_to=367 U.S. 643)
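At its core, a timeline of cases is a count of matching cases per year. The sketch below shows that aggregation over a list of case records, with an optional split similar to the one the `*:` prefix produces in the Mapp v. Ohio query. The decision_date field name and ISO date format are assumptions about the records, not a description of CAP's internals.

```python
from collections import Counter


def case_timeline(cases, split_by=None):
    """Count cases per year, optionally split by another field.

    cases: iterable of dicts with an ISO "decision_date" such as "1961-06-19".
    split_by: optional field name (e.g. "jurisdiction") to group by first.
    Returns a Counter keyed by year, or by (group, year) when split_by is set.
    """
    counts = Counter()
    for case in cases:
        year = int(case["decision_date"][:4])
        key = (case[split_by], year) if split_by else year
        counts[key] += 1
    return counts


cases = [
    {"decision_date": "1961-06-19", "jurisdiction": "us"},
    {"decision_date": "1962-03-01", "jurisdiction": "cal"},
    {"decision_date": "1962-11-20", "jurisdiction": "cal"},
]
print(sorted(case_timeline(cases).items()))
# [(1961, 1), (1962, 2)]
print(sorted(case_timeline(cases, split_by="jurisdiction").items()))
# [(('cal', 1962), 2), (('us', 1961), 1)]
```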

The breadth of available filters drastically increases the number of possibilities for a researcher to explore case data. For example, we can use the author_type parameter of the Cases API to graph the number of cases in which Justice Scalia wrote a dissenting opinion against the number of cases in which he wrote the majority opinion. By clicking into the timeline, users can retrieve granular information about the qualifying cases.

results filtered by author, displayed on a graph
Figure 3 query: api(author_type=scalia:dissent), api(author_type=scalia:majority)

The power of this flexible query language increases with each parameter supplied to the Trends query. If a user wanted to compare the frequency of Supreme Court cases where Justice Scalia dissented and Justice Breyer wrote the majority opinion with cases where Justice Breyer dissented and Justice Scalia wrote the majority opinion, they could draft the following search:

graphed results of specific opinion author queries
Figure 4 query: api(author_type=scalia:dissent&author_type=breyer:majority&court=us), api(author_type=scalia:majority&author_type=breyer:dissent&court=us)
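The same parameters can be sent to the Cases API directly. Because the Figure 4 query repeats author_type, a plain dictionary cannot hold it, but a list of key/value pairs can. The sketch below only builds the URL and makes no request; the CASES_ENDPOINT value is our assumption about the 2021-era CAP API address and may no longer be live.

```python
from urllib.parse import urlencode

# Assumed 2021-era Cases API endpoint; check current CAP documentation.
CASES_ENDPOINT = "https://api.case.law/v1/cases/"


def cases_url(params):
    """Build a Cases API URL from a list of (name, value) pairs.

    A list (rather than a dict) lets a parameter such as author_type
    appear more than once, as in the Figure 4 query.
    """
    return CASES_ENDPOINT + "?" + urlencode(params)


url = cases_url([
    ("author_type", "scalia:dissent"),
    ("author_type", "breyer:majority"),
    ("court", "us"),
])
print(url)
# https://api.case.law/v1/cases/?author_type=scalia%3Adissent&author_type=breyer%3Amajority&court=us
```

urlencode percent-encodes the colon in values like scalia:dissent; the server decodes it back before filtering.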

We have also updated our underlying database to allow users to reason over the citation patterns of individual opinions, in addition to the case as a whole. For example, to see how many times Justice Scalia specifically cited Mapp v. Ohio in his own opinions, compared with Justice Breyer, a user could write the following query:

number of times a case was cited by a specific author over time, displayed on a graph
Figure 5 query: api(author__cites_to_id=1785580&author=scalia), api(author__cites_to_id=1785580&author=breyer)
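The difference between case-level and opinion-level citation can be sketched with a toy model in which a case holds several opinions, each with an author and its own list of cited case IDs. In this model, a case cites another if any of its opinions does, while an author-specific query looks only at that author's opinions. The field names are illustrative assumptions; 1785580 is the ID the Figure 5 query uses for Mapp v. Ohio.

```python
def case_cites(case, target_id):
    """Case-level: true if any opinion in the case cites target_id."""
    return any(target_id in op["cites"] for op in case["opinions"])


def author_cites(case, target_id, author):
    """Opinion-level: true only if an opinion by `author` cites target_id."""
    return any(
        op["author"] == author and target_id in op["cites"]
        for op in case["opinions"]
    )


MAPP = 1785580  # ID used for Mapp v. Ohio in the Figure 5 query

case = {
    "opinions": [
        {"author": "breyer", "type": "majority", "cites": [MAPP]},
        {"author": "scalia", "type": "dissent", "cites": []},
    ]
}
print(case_cites(case, MAPP))              # True
print(author_cites(case, MAPP, "scalia"))  # False
print(author_cites(case, MAPP, "breyer"))  # True
```

A case-level filter would count this case for both justices, which is exactly the ambiguity opinion-level data removes.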

We believe that these features will empower researchers to quickly conduct rich explorations of American caselaw, and we are excited to see how they can expose new insights about our corpus of cases. If you have any ideas for how we can further expand on these features, please do not hesitate to reach out to us at info@case.law.


Download PDFs of Cases by Citation with CAP

Today we're announcing Fetch PDFs, a simple tool to find case citations in text and give you links to scanned PDFs of those cases from CAP.

Why is this helpful? Courts and law reviews often use print case reporters to confirm exact quotes from legal citations. For people who don't have print reporters — or don't have easy access to them from home — doing this kind of cite checking can be a challenge.

Fetch PDFs lets you extract case citations from your text and read scanned PDFs of those cases or download them all as a zip file. Our PDFs come from print case reporters from the collections of Harvard Law School Library.

Here's how it works! You can start by adding your own text or list of citations. We'll use a snippet from Miranda v. Arizona:

Screenshot of Fetch PDFs showing text box containing excerpt from Miranda v. Arizona.

Select "Find Citations" to show all cases cited in your text:

Screenshot of Fetch PDFs showing the list of cases cited in the excerpt, and the option to download PDFs of those cases as a zip file.

Click the case name of any case to read it as HTML, or click "PDF" to go right to the PDF. Click "Download Zip" to download all of the selected cases.
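This post doesn't describe how Fetch PDFs finds citations internally. As a rough sketch of the idea only, the regular expression below picks out simple "volume reporter page" citations such as 367 U.S. 643. It is deliberately naive: it recognizes only the illustrative reporter abbreviations listed in the code, and a real extractor also has to handle short forms, "id." citations, and many more reporters.

```python
import re

# Illustrative subset of reporter abbreviations; a real tool needs many more.
REPORTERS = ["U.S.", "S. Ct.", "L. Ed. 2d", "L. Ed.", "F.2d", "F.3d"]

# Try longer abbreviations first so "L. Ed. 2d" wins over "L. Ed.".
_reporter_alt = "|".join(re.escape(r) for r in sorted(REPORTERS, key=len, reverse=True))
CITATION_RE = re.compile(r"\b(\d+)\s+(" + _reporter_alt + r")\s+(\d+)\b")


def find_citations(text):
    """Return "volume reporter page" strings found in text, in order, without duplicates."""
    found = []
    for volume, reporter, page in CITATION_RE.findall(text):
        cite = f"{volume} {reporter} {page}"
        if cite not in found:
            found.append(cite)
    return found


text = ("See Mapp v. Ohio, 367 U.S. 643 (1961), and "
        "Escobedo v. Illinois, 378 U.S. 478 (1964); cf. 367 U.S. 643.")
print(find_citations(text))
# ['367 U.S. 643', '378 U.S. 478']
```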

We want to hear from you! Do you have ideas, stories, or feedback about using CAP for cite checking, access to print reporters, and more? We're looking forward to your message.

Updates to Case Display - Headmatter

The Caselaw Access Project offers free, public access to over 6.5 million decisions published by state and federal courts throughout American history. Because our mission is to provide access to legal information, we make these decisions available in a variety of formats and through many different access methods.

One of our most important ways of sharing cases is through the basic case display. If you come across a case on Google or anywhere else on the web and click on the link, you're likely to land on the case display.

We're constantly thinking about better ways to present the range of information we have about each case. One of our latest improvements has been to update how headmatter is shown so that information about a case (such as the list of attorneys or, for older cases, public domain headnotes or other supplemental content) is distinct from the actual text of the case.

Here's an example of what that looks like:

Case view of "Whaling v. Shales" (1839) showing headnotes and attorneys highlighted above the text of the opinion.

Our goal is to share as much information as we possibly can about each case. But we want to make sure that the information is clear and readers can easily navigate all the distinct elements of the case.

If you have other ideas for how we can improve case display, please reach out to us anytime at info@case.law.

Search Update: Download Search Results as Dataset

The Caselaw Access Project offers free, public access to over 6.5 million decisions published by state and federal courts throughout American history. We make these decisions available in a variety of formats and through many different access methods.

Court decisions obviously are documents that can be read and interpreted by people, but they're also data that can be processed and analyzed by machines. We try to reflect this principle by designing interfaces that are useful for people (such as our search interface and case viewer) and for programs (our API).

Connecting Human Interfaces with Machine Interfaces

One of our favorite things is connecting these two types of interfaces together so that people who may be accustomed to searching for and reading cases can also begin to understand the cases as structured data that can be processed by programs. So, for example, our human search interface has a "SHOW API CALL" link that will display and explain the URL that is used by our API to execute your search:

Search for the word "computer", with an arrow pointing to "show API call".

Search for the word "computer", showing the API call used to complete the search as a link, highlighted with a box.

If you put that URL into your browser, you'll see the search results that are returned by our API. Likewise, when we display a case for reading, we also give you a button to view the case as structured data using our API:

Viewing the case, "Apple Computer, Inc. v. Franklin Computer Corp.", with an arrow pointing to the format option "API", highlighted with a box.

Here's what that structured data looks like:

Viewing the case "Apple Computer, Inc. v. Franklin Computer Corp." as structured data using the CAP API.

We do this to help demystify the tech that powers legal information services, so that we can all demand more of the providers we rely on and experiment with building our own tools. Eventually, we expect others will make their own interfaces to the data that we make available through the CAP API. So if you don't like any of the commercial interfaces, and you don't like our search interface or case view, we want you to be able to build and experiment with your own. At a minimum, we hope that people will expect more from their information service providers, especially those who charge for access to public information.
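If you want to start experimenting from a script, the sketch below fetches an API search URL (such as the one shown by "SHOW API CALL") and follows pagination. It assumes the response shape we understand the 2021-era CAP API to have used, a JSON object with a results list and a next URL that is null on the last page; treat that shape, and whether the API is still live, as assumptions. The fetch function can be swapped out, so the paging logic can be tried offline.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_results(url, fetch=fetch_json):
    """Yield every result across pages, following each page's "next" link.

    Assumes pages look like {"results": [...], "next": <url or None>}.
    """
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")


# Offline example with a fake two-page response.
pages = {
    "page1": {"results": [{"id": 1}, {"id": 2}], "next": "page2"},
    "page2": {"results": [{"id": 3}], "next": None},
}
print([r["id"] for r in iter_results("page1", fetch=pages.get)])
# [1, 2, 3]
```

Against a live endpoint you would call iter_results(url) with the URL copied from the search page.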

Creating Datasets Out of Search Results

A new way we're connecting human interfaces to court decisions "as data" is to make it easy to download search results as a stand-alone dataset. We've heard many requests for this feature from our research community, and we're excited to announce it today.

Search for the word "computer", for cases decided in Arkansas between January 1, 2000 and December 31, 2002, with an arrow pointing to the "download" option, highlighted with a box.

When you click this button, you can download your search results as a custom dataset in JSON or CSV.

Search for the word "computer", for cases decided in Arkansas between January 1, 2000 and December 31, 2002, showing download options.

Once you've downloaded the dataset, you can work with the cases in your own environment using your own tools and methods. Creating custom datasets is something that most legal information providers do not support at all, which is part of the reason that empirical analysis of law has been so difficult and time-consuming in the past. Law professors and others were forced to spend months (or longer!) compiling collections of cases. We hope to make that process much easier with this feature.
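As one example of working with a download in your own environment, the sketch below counts cases per court and year using only Python's csv module. The column names (court, decision_date) are assumptions, so check the header row of your own file and adjust; the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_by_court_and_year(csv_file):
    """Count rows per (court, year) in a downloaded search-results CSV.

    Assumes "court" and ISO "decision_date" columns (hypothetical names).
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["court"], row["decision_date"][:4])] += 1
    return counts


# Stand-in for open("results.csv", newline="") with made-up rows.
sample = io.StringIO(
    "name,court,decision_date\n"
    "Case A,Ark. Sup. Ct.,2000-05-01\n"
    "Case B,Ark. Sup. Ct.,2000-09-14\n"
    "Case C,Ark. Ct. App.,2001-02-02\n"
)
print(count_by_court_and_year(sample))
```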

Please let us know how it goes!

New Updates to Search: Advanced Filters

The Caselaw Access Project offers free, public access to over 6.5 million decisions published by state and federal courts throughout American history. Because our mission is to provide access to legal information, we make these decisions available in a variety of formats and through many different access methods.

One type of access we've been working hard on recently is our search interface, which you can get to at case.law/search. We've had basic search working for a while, and we're pleased to share our new advanced search filters.

Advanced filters work exactly as you'd expect. Start your search with keywords or phrases, and then use the filters to narrow down jurisdictions, courts, and dates. Say you're looking for Massachusetts cases from 1820 to 1840 that contain the word "whaling."

Search for cases that include the word "whaling" decided from 1820 to 1840 in Massachusetts, showing advanced filters for full-text search, date from, date to, case name abbreviation, docket number, reporter, jurisdiction, citation, and court.

You can also access the advanced filters from the search results screen, so that you can fine-tune your search if you're not happy with the initial results. Delete or modify any of the filters as you go, and sort the results chronologically or by relevance.

Search results for cases that include the word "whaling" decided from 1820 to 1840 in Massachusetts, showing filters on left.
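Conceptually, the advanced filters narrow the result set and the sort option orders it. The in-memory sketch below mirrors the whaling example; the record fields and the relevance score (a plain count of keyword occurrences) are stand-ins of our own, not how CAP's search engine ranks results.

```python
def advanced_search(cases, keyword, jurisdiction=None, year_from=None, year_to=None,
                    sort="relevance"):
    """Filter cases by keyword, jurisdiction, and year range, then sort.

    cases: list of dicts with "name", "jurisdiction", "year", and "text".
    sort: "relevance" (keyword count, highest first) or "chronological".
    """
    keyword = keyword.lower()
    hits = [
        case for case in cases
        if keyword in case["text"].lower()
        and (jurisdiction is None or case["jurisdiction"] == jurisdiction)
        and (year_from is None or case["year"] >= year_from)
        and (year_to is None or case["year"] <= year_to)
    ]
    if sort == "chronological":
        return sorted(hits, key=lambda case: case["year"])
    return sorted(hits, key=lambda case: case["text"].lower().count(keyword), reverse=True)


cases = [
    {"name": "A", "jurisdiction": "mass", "year": 1825, "text": "whaling voyage"},
    {"name": "B", "jurisdiction": "mass", "year": 1835, "text": "whaling ship, whaling crew"},
    {"name": "C", "jurisdiction": "me", "year": 1830, "text": "whaling"},
    {"name": "D", "jurisdiction": "mass", "year": 1850, "text": "whaling"},
]
print([c["name"] for c in advanced_search(cases, "whaling", "mass", 1820, 1840)])
# ['B', 'A']
print([c["name"] for c in advanced_search(cases, "whaling", "mass", 1820, 1840, sort="chronological")])
# ['A', 'B']
```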

There is a lot more we want to do with search, but we hope you enjoy this improvement. If you have ideas of your own, please share them with us at info@case.law.

CAP is a project of the Library Innovation Lab at Harvard Law School Library. We make open source software that helps people access legal information, preserve web sources with Perma.cc, and create open educational resources with H2O.

Exploring Caselaw Interfaces

Courts and the legal publishers that serve them, by necessity, are creatures of habit. A case's fundamental structure hasn't changed much, whether published early in the 19th century or during the COVID pandemic. Even when publishers started taking their wares online, they didn't stray far from their well-worn model. In many ways, that's a good thing. I imagine legal research and writing would be much more arduous if fundamental case elements were as inconsistent over the years as citation formats have been.

But we think these cases have undiscovered uses beyond informing legal arguments. We know that NLP (Natural Language Processing) folks have already made use of the API and bulk download tools we built at http://case.law. Still, the most frequently accessed pages on our website are individual case pages reached by visitors from Google. What are those visitors' needs? Historical research? Family history? ... leisure? Even if the fundamental structure of a case is necessarily immutable, are there opportunities for novel interfaces to bring these works to new audiences?

Process

The first step I took was to assemble a list of actions that people perform on collections of things.

a hand-scribbled list of verbs

Among these ideas, I was most interested in enhancing people's ability to cut through the endless walls of text we serve up to find what they're looking for. This is a more cut-and-dried topic for an interface exploration, so I spent most of my time there.

I am also interested in humanizing the stories behind these cases through narrative. Too often, the technical analysis of these legal documents overshadows the fact that they describe real events in real people's lives. Not only have the subjects of these cases often endured gruesome, traumatic events, but the trials themselves are often traumatic. While I only lightly touched on this direction here, I'd very much like to explore it in the future.

The Results

Topic Explorer

Topic Explorer is a simple idea based on data, or a data interface, that doesn't exist yet. What if you could find the number of cases that contain a specific word and then get a list of the most frequently used important words in those cases?

an inverted triangle cut into sections each with search terms and results

At that point, you could add that word to your search.

an inverted triangle cut into sections each with search terms and results

Or hide it to expose more words.

an inverted triangle cut into sections each with search terms and results

Exclude it from your search to go in a different direction.

an inverted triangle cut into sections each with search terms and results
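Topic Explorer depends on data we don't currently serve, but the interaction itself is easy to model. The sketch below keeps the documents that contain every included word and none of the excluded ones, then ranks the remaining words by how many of those documents they appear in, skipping hidden words and a small stopword list. Document-frequency ranking is a stand-in chosen for brevity; a real version would likely want a better measure of "important" words, such as TF-IDF.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "that", "is", "by"}


def word_set(text):
    """Lowercased set of alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def explore(docs, include, exclude=(), hidden=(), top_n=5):
    """Return (number of matching docs, top co-occurring words).

    include/exclude: words a document must / must not contain.
    hidden: words left out of the ranking without filtering on them.
    """
    include, exclude, hidden = set(include), set(exclude), set(hidden)
    matching = [ws for ws in map(word_set, docs)
                if include <= ws and not (exclude & ws)]
    skip = STOPWORDS | include | exclude | hidden
    # Rank words by how many matching documents contain them.
    counts = Counter(word for ws in matching for word in ws - skip)
    return len(matching), [word for word, _ in counts.most_common(top_n)]


docs = [
    "The vessel was seized by the collector of customs.",
    "The whaling vessel returned to port with oil.",
    "Customs duties on oil imported by the vessel.",
    "A contract for the sale of land.",
]
print(explore(docs, ["vessel"]))                     # 3 docs; "customs" and "oil" rank first
print(explore(docs, ["vessel"], hidden=["customs"]))  # hide a word to surface others
print(explore(docs, ["vessel"], exclude=["oil"]))     # exclude it to change direction
```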

Trace Topic

Though based on the same interest in exploring a topic, this approach is a bit different. Within a case, you could highlight a word and then see how frequently that word appears in cases that cite the case you're reading, and in cases that cite those cases. From there, you could drill down from that topic into its different usages within related cases.

a picture of a document with one word highlighted, and a number of documents around it.

The color of the case represents the relevance of the term in that search, or whatever else you want it to be, really.
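The Trace Topic mockup implies a breadth-first walk outward along citations: first the cases citing the current case, then the cases citing those. A minimal sketch, assuming an in-memory map from each case to the cases that cite it and plain-text case bodies (both hypothetical data shapes), counts the highlighted word at each step. Those counts are one possible input for the color scale in the mockup.

```python
import re
from collections import deque


def trace_topic(start, cited_by, texts, word, max_depth=2):
    """Count occurrences of `word` in cases citing `start`, level by level.

    cited_by: dict mapping a case id to ids of cases that cite it.
    texts: dict mapping a case id to its text.
    Returns {case_id: (depth, count)} for every case reached within max_depth.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    results, seen = {}, {start}
    queue = deque([(start, 0)])
    while queue:
        case_id, depth = queue.popleft()
        if depth == max_depth:
            continue
        for citing in cited_by.get(case_id, []):
            if citing not in seen:
                seen.add(citing)
                results[citing] = (depth + 1, len(pattern.findall(texts[citing])))
                queue.append((citing, depth + 1))
    return results


cited_by = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
texts = {"B": "Privacy and privacy.", "C": "Contracts.", "D": "privacy", "E": "privacy"}
print(trace_topic("A", cited_by, texts, "privacy"))
# {'B': (1, 2), 'C': (1, 0), 'D': (2, 1)}
```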

Clandestine Conversation

This completely different approach to digging into a specific topic involves trying to facilitate conversation among readers. Maybe someone could annotate a highlighted passage with an invitation to discuss it.

Enter the text: a screenshot of a portion of text with a bit highlighted, and a small box pointing to it in which some text is entered in an input field

Users see a symbol: a screenshot of a portion of text with a bit highlighted, and a small "i" icon next to it

They click on it and get the invitation: a screenshot of a portion of text with a bit highlighted, and a small box pointing to it in which some text invites a user to converse about the highlighted text

Ratings and Reviews

Maybe people have feelings about cases best expressed through star ratings and reviews? Frankly, they probably don't, but it seemed like too familiar an idiom to ignore.

a screenshot of a caselaw viewing toolbar interface with a "Ratings and review" section added, like on an ecommerce site.

If you haven't had a chance to check out our trends viewer, I highly recommend you drop what you're doing and play for a little while. Like Google's Ngram viewer, it will tell you the frequency with which a word appears in cases over time. You can even split it up by jurisdiction! However, if you want to see how something trends in ALL jurisdictions, it's a little tough to read.

Rather than having all years and jurisdictions visible, I represented jurisdictions on a map and added a year scrubber control. You can get the precise numbers for that year from the list on the right.

a map of the United States on which the states vary in opacity based on some data, a timeline above it, and a data table to the right
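Behind a map like this, each frame only needs one number per jurisdiction for the selected year, scaled to an opacity. The sketch below shows one way to do that scaling, assuming per-year term frequencies are already available for each jurisdiction (a hypothetical data shape). Scaling against the maximum across all years, rather than within the selected year, keeps the shading comparable as the scrubber moves.

```python
def opacities_for_year(freqs, year, min_opacity=0.1):
    """Map each jurisdiction's value for `year` to an opacity in [min_opacity, 1].

    freqs: {jurisdiction: {year: value}}. Values are scaled against the
    maximum over all jurisdictions and years, so shading stays comparable
    while scrubbing through time. Missing years count as 0.
    """
    peak = max((v for series in freqs.values() for v in series.values()), default=0)
    if peak == 0:
        return {j: min_opacity for j in freqs}
    return {
        j: min_opacity + (1 - min_opacity) * series.get(year, 0) / peak
        for j, series in freqs.items()
    }


freqs = {"cal": {1900: 2.0, 1950: 8.0}, "me": {1900: 4.0}}
print(opacities_for_year(freqs, 1900))
```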

3D Timeline Explorer

Our developer Anastasia is working on a very cool legally focused storytelling interface we call Timeline. Its users can create timelines that include cases, important dates and events, and narrative. Inspired by some of the new proximity conferencing tools, such as gather.town, I designed an interface with which someone could explore one of these timelines in a 3D environment.

Users access different bits of media when moving their sprite over different hot spots on the timeline.

a 3d cartoon depiction of a hallway with a timeline on the floor, marked with various hot spot symbols for sounds, movies or articles

Since we are primarily a caselaw database, court cases would probably get special treatment. Each case could have a virtual courtroom with different hot spots for different participants in the process.

a 3d cartoon depiction of a courtroom marked with various hot spot symbols for sounds, movies or articles

Sound of an Opinion

Like Topic Explorer, Sound of an Opinion would require data we don't yet have. Using pre-made or algorithmically-created sound clips, we would convey the emotional tone and other measurable facets of an opinion based on text sentiment analysis. In my simplistic demo, I correlate positivity/negativity with instrumentation and scale, verb density with the drumline volume, and adjective density with the drumline complexity. The sound clips were created in Logic Pro X using Apple Loops and their algorithmic drum beat creator.

a screenshot of a sound tile board

Check out this live ProtoPie demo (note that it will not work in Safari).
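The mapping in the demo can be written down as a small function. The sketch below turns three precomputed measurements (a sentiment score from -1 to 1, verbs per word, and adjectives per word) into an instrument, a scale, and drumline settings. The thresholds, instrument names, and the 0.25 density ceiling are invented for illustration; producing the measurements themselves would require the sentiment analysis and part-of-speech data described above.

```python
from dataclasses import dataclass


@dataclass
class SoundSettings:
    instrument: str
    scale: str
    drum_volume: float    # 0.0 (silent) to 1.0 (full)
    drum_complexity: int  # 1 (sparse) to 5 (busy)


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def sound_of_opinion(sentiment, verb_density, adjective_density, density_ceiling=0.25):
    """Map text measurements to sound settings (illustrative thresholds).

    sentiment: -1 (negative) to 1 (positive) -> instrumentation and scale.
    verb_density: verbs per word -> drumline volume.
    adjective_density: adjectives per word -> drumline complexity.
    """
    if sentiment >= 0:
        instrument, scale = "piano", "major"
    else:
        instrument, scale = "cello", "minor"
    volume = clamp(verb_density / density_ceiling)
    complexity = 1 + round(4 * clamp(adjective_density / density_ceiling))
    return SoundSettings(instrument, scale, volume, complexity)


print(sound_of_opinion(sentiment=-0.4, verb_density=0.15, adjective_density=0.05))
```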

Next Steps

While few, if any, of these ideas will be fully realized, unencumbered blue-sky thinking is time well spent around here. We've already started investigating the feasibility of generating and serving sentiment analysis data through our API. Do any of these ideas excite you? Do you have any ideas of your own you think belong here? Reach out and let us know!

This Is Just Amazing

The other day, I noticed this on the side of the house.

Category 5 cable with broken jacket

That is near the bottom of the run of Cat 5 Ethernet cable I installed over twenty years ago, from the cable modem and router in the basement through a window frame, up the side of the house and into the third floor through another hole in a window frame. What I found amazing was not so much that the cable, neither shielded nor rated for the out-of-doors, had lasted so long in such an amateurish installation, but that all of our Zoom meetings for the last eight months had passed through these little wires.

The really amazing part, beyond the near-magic of all that audio and video flying through little twists of copper, is the depth of dependency: at each end of that cable is hardware that changes voltages on the wires, operating system drivers for interacting with the hardware, the networking stacks of the operating systems that offer network interfaces to software, the software itself, the systems of authentication and authorization that the software uses to permit or deny access—a cascade of protocols, standards, devices, programming languages, and codebases that become the (mostly) seamless experience of the discussion we have at ten each morning. Or, a moment later, the experience of confirming that the city has accepted the ballot I mailed.

Starry-eyed delight in an amazing machine is clearly not sufficient, with as good a view as we now have of the broken dream of a liberatory Internet. We have to have an acute awareness of the system accidents implicit in our tools and the societal technologies that are connected to them. I believe the delight is necessary, though—without it, I don't see how we can ever learn to treat computers as anything other than an apparatus of control. There's hope, if a grimy cable with a broken jacket can carry joy.