Guest Post: Do Elected and Appointed Judges Write Opinions Differently?

The United States is unique in the world: most of its judges today are elected. But it hasn’t always been this way. Over the past two centuries, the American states have taken a variety of paths, alternating among elective and appointive methods. Opponents of judicial elections charge that these institutions detract from judicial independence, harm the legitimacy of the judiciary, and put unqualified jurists on the bench; supporters counter that, by publicly involving the American people in judicial selection, elections can enhance judicial legitimacy. To say this debate has attracted intense academic, political, and popular interest is an understatement.

Surprisingly little attention has been paid by scholars and policymakers to how these institutions affect legal development. Using the enormous dataset of state supreme court opinions CAP provides, we examined one small piece of this puzzle: whether opinions written by elected judges tend to be better grounded in law than those written by judges who will not stand for election. This is an important topic. Given the central role that the norm of stare decisis plays in the American legal system, opinions that cite many existing precedents are likely to be perceived as persuasive due to their extensive legal reasoning. More persuasive precedents, in turn, are more likely to be cited and to increase a court’s policymaking influence among its sister courts.

State Courts’ Use of Citations Over American History

The CAP dataset provides a particularly rich opportunity to examine state courts’ usage of citations because we can see how citation practices vary as the United States slowly builds its own independent body of caselaw.

We study the 52 existing state courts of last resort, as well as their predecessor courts. For example, our dataset includes cases from the Tennessee Supreme Court as well as the Tennessee Supreme Court of Errors and Appeals, a court that was previously Tennessee’s court of last resort. We exclude the decisions of the colonial and territorial courts, as well as decisions from early courts that were populated by legislators, rather than judges.

The resulting dataset contains 1,702,404 cases from 77 courts of last resort. The three states with the greatest number of cases in the dataset are Louisiana (86,031), Pennsylvania (70,804), and Georgia (64,534). Generally, courts in highly populous states, such as Florida and Texas, tend to carry a higher caseload than those serving less populous states, such as North and South Dakota.

To examine citation practices in state supreme courts, we first needed to extract citations from each state supreme court opinion. For this purpose, we use the LexNLP Python package released by LexPredict, a data-driven consulting and technology firm. In addition to parsing the citation itself (e.g., 1 Ill. 19), we also extract the reporter the opinion is published in and the court of the case cited (e.g., the Illinois Supreme Court). Most state supreme court cases—about 68.7% of majority opinions longer than 100 words—cite another case. About one-third of cases cite between 1 and 5 other cases, while about 5% of cases cite 25 or more other cases. The number of citations in an opinion trends upward over time, as Figure 1 shows.
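LexNLP does the heavy lifting in our pipeline; purely as an illustration of the idea (not LexNLP’s actual API), a naive regex-based extractor might look like this:

```python
import re

# Naive stand-in for LexNLP's citation extraction -- illustration only.
# Matches patterns like "1 Ill. 19" or "347 U.S. 483".
CITE_RE = re.compile(r"\b(\d+)\s+([A-Z][A-Za-z.\s]*?\.(?:\s?\dd)?)\s+(\d+)\b")

def extract_citations(text):
    """Return (volume, reporter, page) tuples found in opinion text."""
    return [(int(v), rep.strip(), int(p)) for v, rep, p in CITE_RE.findall(text)]

sample = "See 1 Ill. 19 and Brown v. Board of Education, 347 U.S. 483 (1954)."
print(extract_citations(sample))  # [(1, 'Ill.', 19), (347, 'U.S.', 483)]
```

A production extractor must also handle reporter abbreviation variants, parallel citations, and short-form cites, which is why we rely on LexNLP rather than hand-rolled patterns.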

plot of the average number of citations between the late 1700s and early 2000s, increasing exponentially from about 0 to about 15
Figure 1: The average number of citations in a state supreme court opinion since the American founding.

The number of citations in a case varies by state, as well. Some state courts tend to write opinions with more citations than others. Figure 2 presents the proportion of opinions (with at least 100 words) in each state with at least three citations since 1950. States like Florida, New York, Louisiana, Oregon, and Michigan produce the greatest proportion of opinions with fewer than three citations. It may be no coincidence that Louisiana and New York are two of the highest-caseload state courts in the country; judges with many cases on their dockets may be forced to publish opinions more quickly, with less research and legal writing allocated to citing precedent. Conversely, low-caseload courts like those of Montana and Wyoming produce the greatest proportion of opinions with at least three citations. When judges have more time to craft an opinion, they produce opinions that are better grounded in existing precedent.

choropleth map of the United States
Figure 2: The proportion of state supreme court opinions citing at least three cases by state since 1950 (the two Texas and Oklahoma high courts are aggregated).

Explaining Differences in State Supreme Court Citation

We expected that the number of citations included in a state supreme court opinion would vary with the method through which the court’s justices are retained. We use linear regression to model the median number of citations in a state-year as a function of selection method, caseload, partisan control of the state legislature, and general state expenditures. We restrict this analysis to the period 1942–2010.
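As a toy sketch (with made-up numbers standing in for the per-opinion counts extracted from CAP), the dependent variable, the median citation count per state-year, can be computed like so:

```python
import statistics
from collections import defaultdict

def median_cites_by_state_year(opinions):
    """opinions: iterable of (state, year, n_citations) tuples -- toy
    stand-ins for the per-opinion citation counts extracted from CAP."""
    groups = defaultdict(list)
    for state, year, n in opinions:
        groups[(state, year)].append(n)
    return {key: statistics.median(vals) for key, vals in groups.items()}

toy = [("TN", 1950, 2), ("TN", 1950, 8), ("TN", 1950, 4), ("GA", 1950, 1)]
print(median_cites_by_state_year(toy))  # {('TN', 1950): 4, ('GA', 1950): 1}
```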

regression results with confidence intervals and coefficient estimates
Figure 3: Linear Regression results of the effects of judicial retention method on the average number of citations in a state supreme court opinion, including state and year fixed effects.

The results are shown in Figure 3. Compared to judges who face nonpartisan elections, judges who are appointed, who face retention elections, or who face partisan elections all include more citations in their opinions. In appointed systems, the median opinion contains about 3 more citations (roughly a three-fifths standard deviation shift) than in nonpartisan election systems. In retention election systems, the median opinion contains almost 5 more citations (roughly a full standard deviation shift). Even in partisan election systems, the median opinion contains a little under 3 more citations than in nonpartisan election systems.

Some Conclusions

These differences suggest that a state’s judicial selection method can have drastic consequences for implementation and broader legal development. Because opinions with more citations tend, in turn, to be more likely to be cited in the future, the relationship we have uncovered between selection method and opinion quality suggests that judicial selection and retention methods have important downstream consequences for the relative influence of state supreme courts in American legal development. These consequences are important for policymakers to weigh as they consider altering the methods by which their judges reach the bench.

CAP Code Share: Get Opinion Author

This month we're sharing new ways to start working with data from the Caselaw Access Project. This CAP code share from Anastasia Aizman shows us how to get opinion authors from cases with the CAP API and CourtListener: Get opinion author!

There are millions of court opinions that make up our legal history. With data, we can learn new things about individual opinions, who authored them, and how that activity influences the larger landscape. This code share reviews how to get started with the names of opinion authors.

This code finds opinion authors from cases using the CAP API and CourtListener. It forms a query to the CAP API, returns the cases from that query, and then uses the CourtListener API to match those cases to individual opinion authors. The final output is a data frame of those authors and related data from CourtListener. Nice 🙌

Have you created or adapted code for working with data from the Caselaw Access Project? Send it our way or add it to our shared repository.

We want to share new ways to start working with data from the Caselaw Access Project. Looking for code to start your next project? Try our examples repository and get started today.

Tutorial: Retrieve Cases by Citation with the CAP Case Browser

In this tutorial we’re going to learn how to retrieve a case by citation using the Caselaw Access Project's case browser.

The CAP case browser is a way to browse 6.7 million cases digitized from the collections of the Harvard Law School Library.

Retrieve Case by Citation: Brown v. Board of Education

  1. Find the citation of a case you want to retrieve. Let’s start with Brown v. Board of Education: Brown v. Board of Education, 347 U.S. 483 (1954).

  2. In the citation, find the case reporter, volume, and page: Brown v. Board of Education, 347 U.S. 483 (1954).

  3. We’re going to create our URL using this template: <reporter>/<volume>/<page>

  4. In the reporter, volume, and page fields, add the information for the case you want to retrieve. Your URL for Brown v. Board of Education, 347 U.S. 483 (1954) should look like this:

  5. Let’s try it out! Add the URL you’ve just created to your browser’s search bar, and press Enter.
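The citation-to-URL steps above can be sketched in Python. The helper below is hypothetical (the case browser’s base URL and exact reporter slug rules are not spelled out in this post), but it shows the reporter/volume/page decomposition from steps 2–4:

```python
import re

def citation_to_path(citation):
    """Split a citation like '347 U.S. 483 (1954)' into the
    reporter/volume/page path segments used in the URL template."""
    m = re.match(r"(\d+)\s+(.+?)\s+(\d+)", citation)
    if m is None:
        raise ValueError("unrecognized citation: " + citation)
    volume, reporter, page = m.groups()
    # simplistic reporter slug, e.g. 'U.S.' -> 'us' (real slug rules may differ)
    slug = reporter.replace(".", "").replace(" ", "-").lower()
    return "/".join([slug, volume, page])

print(citation_to_path("347 U.S. 483 (1954)"))  # us/347/483
```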

You just retrieved a case by citation using the CAP case browser! Nice job. You can now read and share this case at this address:

This tutorial shares one way to retrieve a case by citation in the CAP case browser. Find and share your first case today!

Computational Support for Statutory Interpretation with Caselaw Access Project Data

This post is about a research paper (preprint) on sentence retrieval for statutory interpretation that we presented at the International Conference on Artificial Intelligence and Law (ICAIL 2019) held in June at Montreal, Canada. The paper describes some of our recent work on computational methods for statutory interpretation carried out at the University of Pittsburgh. The idea is to focus on vague statutory concepts and enable a program to retrieve sentences that explain the meaning of such concepts. The Library Innovation Lab's Caselaw Access Project (CAP) provides an ideal corpus of case law that is needed for such work.

Abstract rules in statutory provisions must account for diverse situations, even those not yet encountered. That is one reason why legislators use vague, open-textured terms, abstract standards, principles, and values. When there are doubts about a provision’s meaning, interpretation may help to remove them. Interpretation involves an investigation of how the term has been referred to, explained, recharacterized, or applied in the past. While court decisions are an ideal source of sentences interpreting statutory terms, manually reviewing those sentences is labor-intensive, and many sentences are useless or redundant.

In our work we automate this process. Specifically, given a statutory provision, a user’s interest in the meaning of a concept from the provision, and a list of sentences, we rank more highly the sentences that elaborate upon the meaning of the concept, such as:

  • definitional sentences (e.g., a sentence that provides a test for when the concept applies).
  • sentences that state explicitly in a different way what the concept means or state what it does not mean.
  • sentences that provide an example, instance, or counterexample of the concept.
  • sentences that show how a court determines whether something is such an example, instance, or counterexample.

We downloaded the complete bulk data from the Caselaw Access Project. Altogether the data set comprises more than 6.7 million unique cases. We ingested the data set into an Elasticsearch instance. For the analysis of the textual fields we used the LemmaGen Analysis plugin which is a wrapper around a Java implementation of the LemmaGen project.

To support our experiments we indexed the documents at multiple levels of granularity. Specifically, the documents were indexed at the level of full cases, as well as segmented into the head matter and individual opinions (e.g., majority opinion, dissent, concurrence). This segmentation was performed by the Caselaw Access Project using a combination of human labor and automatic tools. We also used our U.S. case law sentence segmenter to segment each case into individual sentences and indexed those as well. Finally, we used the sentences to create paragraphs. We considered a line-break between two sentences as an indication of a paragraph boundary.
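As a rough sketch of that paragraph heuristic (with a naive period-based split standing in for our actual sentence segmenter):

```python
def segment_paragraphs(case_text):
    """Group sentences into paragraphs, treating a line break between
    sentences as a paragraph boundary. The naive period split below is a
    placeholder for a real sentence segmenter."""
    paragraphs = []
    for block in case_text.split("\n"):
        sentences = [s.strip() + "." for s in block.split(".") if s.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs

text = "First sentence. Second sentence.\nNew paragraph here."
print(segment_paragraphs(text))
# [['First sentence.', 'Second sentence.'], ['New paragraph here.']]
```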

For our corpus we initially selected three terms from different provisions of the United States Code:

  1. independent economic value (18 U.S. Code § 1839(3)(B))
  2. identifying particular (5 U.S. Code § 552a(a)(4))
  3. common business purpose (29 U.S. Code § 203(r)(1))

For each term we collected a set of sentences by extracting all the sentences mentioning the term from the court decisions retrieved from the Caselaw Access Project data. In total we assembled a small corpus of 4,635 sentences. Three human annotators classified the sentences into four categories according to their usefulness for interpretation:

  1. high value - sentence intended to define or elaborate upon the meaning of the concept
  2. certain value - sentence that provides grounds to elaborate on the concept’s meaning
  3. potential value - sentence that provides additional information beyond what is known from the provision the concept comes from
  4. no value - no additional information over what is known from the provision

The complete data set including the annotation guidelines has been made publicly available.

We performed a detailed study of a number of retrieval methods. We confirmed that retrieving sentences directly, by measuring similarity between the query and a sentence, yields mediocre results. Taking into account the contexts of sentences turned out to be the crucial step in improving the performance of the ranking. We observed that query expansion and novelty detection techniques are also able to capture information that could be used as an additional layer in a ranker’s decision. Based on the detailed error analysis, we integrated the context-aware ranking methods with components based on query expansion and novelty detection into a specialized framework for retrieving case-law sentences for statutory interpretation. Evaluation of different implementations of the framework shows promising results (an NDCG of .725 at 10 and .662 at 100; Normalized Discounted Cumulative Gain is a measure of ranking quality).
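For readers unfamiliar with the metric, a minimal NDCG implementation looks like this (the 3/2/1/0 grades below are a hypothetical mapping of the four annotation categories):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG for one ranked list. `relevances` are graded relevance labels
    in ranked order, e.g. annotation categories mapped to 3/2/1/0."""
    def dcg(rels):
        # log2-discounted gain over the top-k positions
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 0], k=3))  # 1.0 -- a perfect ranking
print(round(ndcg_at_k([0, 2, 3], k=3), 3))  # 0.648 -- imperfect ranking
```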

To provide an intuitive understanding of the performance of the best model, we list below the top five sentences retrieved for each of the three terms. Finally, it is worth noting that in the future we plan to significantly increase the size of the data set and the number of statutory terms.

Independent economic value

  1. [. . . ] testimony also supports the independent economic value element in that a manufacturer could [. . . ] be the first on the market [. . . ]
  2. [. . . ] the information about vendors and certification has independent economic value because it would be of use to a competitor [. . . ] as well as a manufacturer
  3. [. . . ] the designs had independent economic value [. . . ] because they would be of value to a competitor who could have used them to help secure the contract
  4. Plaintiffs have produced enough evidence to allow a jury to conclude that their alleged trade secrets have independent economic value.
  5. Defendants argue that the trade secrets have no independent economic value because Plaintiffs’ technology has not been "tested or proven."

Identifying particular

  1. In circumstances where duty titles pertain to one and only one individual [. . . ], duty titles may indeed be "identifying particulars" [. . . ]
  2. Appellant first relies on the plain language of the Privacy Act which states that a "record" is "any item . . . that contains [. . . ] identifying particular [. . . ]
  3. Here, the district court found that the duty titles were not numbers, symbols, or other identifying particulars.
  4. [. . . ] the Privacy Act [. . . ] does not protect documents that do not include identifying particulars.
  5. [. . . ] the duty titles in this case are not "identifying particulars" because they do not pertain to one and only one individual.

Common business purpose

  1. [. . . ] the fact of common ownership of the two businesses clearly is not sufficient to establish a common business purpose.
  2. Because the activities of the two businesses are not related and there is no common business purpose, the question of common control is not determinative.
  3. It is settled law that a profit motive alone will not justify the conclusion that even related activities are performed for a common business purpose.
  4. It is not believed that the simple objective of making a profit for stockholders can constitute a common business purpose [. . . ]
  5. [. . . ] factors such as unified operation, related activity, interdependency, and a centralization of ownership or control can all indicate a common business purpose.

In conclusion, we have conducted a systematic study of sentence retrieval from case law with the goal of supporting statutory interpretation. Based on a detailed error analysis of traditional methods, we proposed a specialized framework that mitigates some of the challenges we identified. As evidenced above, the results of applying the framework are promising.

Tutorial: Return Cases from 100 Years Ago Today with the CAP API

The Caselaw Access Project API offers a way to view the corpus of U.S. case law. This tutorial will review how to run a CAP API call to return all cases decided 100 years ago today in your command line.

The Caselaw Access Project API makes 40 million pages of U.S. case law available in machine-readable format, digitized from the collections of the Harvard Law School Library.

Create Your API Call

Let’s start by building our call to the CAP API using the parameters decision_date_min and decision_date_max. Adding these parameters will only return data for cases decided between these two dates.

  • Open a text editor and paste:
curl ""
  • Update (year)-(month)-(day) with today’s date in this format and update 2019 to 1919. Once you’re set, it should look something like this:
curl ""
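The same call can be assembled programmatically in Python. The endpoint below is an assumption (the curl URLs in this post are elided); api.case.law/v1/cases/ is CAP’s documented cases endpoint:

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint -- the post's own URLs are elided; this is CAP's
# documented cases endpoint.
BASE = "https://api.case.law/v1/cases/"

def hundred_years_ago_call(today=None):
    """Build the API URL for cases decided exactly 100 years ago today."""
    today = today or date.today()
    then = today.replace(year=today.year - 100).isoformat()
    return BASE + "?" + urlencode(
        {"decision_date_min": then, "decision_date_max": then})

print(hundred_years_ago_call(date(2019, 8, 29)))
# https://api.case.law/v1/cases/?decision_date_min=1919-08-29&decision_date_max=1919-08-29
```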

Use Your API Call

Next, we’ll continue this tutorial on macOS using Terminal.

  • Open Applications and select Terminal

  • In the Terminal command line, copy and paste the API call from your text editor and press Enter.

You did it! The CAP API should return metadata for all cases decided one hundred years ago today.

Now, what does the content of those cases look like? Time to add a new piece to the mix.

  • To view the full text of all cases returned, add &full_case=true to the end of your original API call. It should look like this:
curl ""
  • Run your new API call in Terminal.

You’ve finished this tutorial and run a CAP API call using decision_date_min and decision_date_max. Well done!

More Ways to View Data

Before closing, let’s look at more ways to view this same data:

Let’s run that same CAP API call in your browser (this time, without the curl and quotation marks). It should look like this:

Now you can view the same data that was returned by your original API call in your browser. Learn new ways to refine and expand your CAP API call with our API Docs. We can also retrieve this data for a more human readable experience with CAP Search.

With the CAP API, we can retrieve cases from across 360 years of U.S. legal history and develop new interfaces to do that. This tutorial shared just one place to start.

Caselaw Access Project: Summer 2019 Data Release

Today we’re announcing a new data release for the Caselaw Access Project. This update includes:

  • In-text figures and illustrations in cases. An example, from Sussman v. Cooper (1976), is below.
  • Inline page numbers. You can provide a pin cite to a specific page in a case by adding #p123 to the URL, or just by clicking the page number.
  • Italic formatting in case text, as detected by OCR.

See what this all looks like in practice with an example.

All of this additional data is available programmatically as well, by downloading our bulk data releases or requesting body_format=html from our API.

This data release improves how we view and share the published U.S. caselaw made available by the Caselaw Access Project. Let us know how you’re creating new ways to see this data!

Browse the Bookshelf of U.S. Case Law: Announcing the CAP Case Browser

Today we’re announcing the CAP case browser! Browse published U.S. case law from 1658 to 2018—all 40 million pages of it.

The CAP case browser is one way to browse and cite cases made available via the Caselaw Access Project API. The Caselaw Access Project shares cases digitized from the collections of the Harvard Law School Library.

Let’s take a quick tour, starting with the CAP case browser.

Teaching Data Science for Lawyers with Caselaw Access Project Data

In the Spring of 2019, at the University of Iowa, I taught an experimental course called Introduction to Quantitative & Computational Legal Reasoning. The idea of the class was beginning "data science" in the legal context. The course is taught in Python, and focuses on introductory coding and statistics, with focused applications in the law (such as statistical evidence of discrimination).

Of course, for students with no prior technical background, it's unrealistic to expect a law school course to produce "data scientists" in the sense used in industry. But my observations of the growth in student skills by the end of the course suggest that it is realistic to produce young lawyers with the skills to solve simple problems with coding, understand data, avoid getting led astray by dubious scientific claims (especially with probability and statistics in litigation), and learn about potential pathways for further learning and career development in legal technology and analytics.

The Library Innovation Lab's Caselaw Access Project (CAP) is particularly well-suited for assignments and projects in such a course. I believe that much of the low-hanging fruit in legal technology is in wrangling the vast amounts of unstructured text that lawyers and courts produce—as is evidenced by the numerous commercial efforts focused on document production in discovery, contract assembly and interpretation, and similar textual problems faced by millions of lawyers daily. CAP offers a sizable trove of legal text accessible through a relatively simple and well-documented API (unlike other legal data APIs currently available). Moreover, the texts available through CAP are obviously familiar to every law student after their first semester, and their comfort with the format and style of such texts enables students to handle assignments that require them to combine their understanding of how law works with their developing technology skills.

To leverage these advantages, I included a CAP-based assignment in the first problem set for the course, due at the end of the programming intensive that occupies the initial few weeks of the semester. The problem, which is reproduced at the end of this post along with a simple example of code to successfully complete it, requires students to write a function that can call into the CAP API, retrieve an Illinois Supreme Court case (selected due to the lack of access restrictions) by citation, and return a sorted list of each unique case in the U.S. Reporter cited in the case they have retrieved.

While the task is superficially simple, students found it fairly complex, for it requires the use of a number of programming concepts, such as functions and control flow, that they had only recently learned. It also exposes students to common beginner’s mistakes in Python programming, such as missing the difference between sorting a list in place with list.sort() and returning a new list with sorted(list). In my observation, the results of the problem set accurately distinguished the students who were taking to programming quickly and easily from those who required more focused assistance.
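That in-place versus copy distinction, in miniature:

```python
cites = ["419 U.S. 102", "387 U.S. 136"]

# sorted() returns a new, sorted list and leaves the original untouched
assert sorted(cites) == ["387 U.S. 136", "419 U.S. 102"]
assert cites == ["419 U.S. 102", "387 U.S. 136"]

# list.sort() sorts in place and returns None -- the classic beginner trap
assert cites.sort() is None
assert cites == ["387 U.S. 136", "419 U.S. 102"]
```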

In addition to such standard programming skills, this assignment requires students to practice slightly more advanced skills such as:

  • Reading and understanding API documentation;
  • Making network requests;
  • Processing text with regular expressions;
  • Using third-party libraries;
  • Parsing JSON data; and
  • Handling empty responses from external data sources.

With luck, this problem can encourage broader thinking about legal text as something that can be treated as data, and the structure inherent in legal forms. With even more luck, some students may begin to think about more intellectual questions prompted by the exercise, such as: can we learn anything about the different citation practices in majority versus dissent opinions, or across different justices?

I plan to teach the class again in Spring 2020; one recurrent theme in student feedback for the first iteration was the need for more practice in basic programming. As such, I expect that the next version of the course will include more assignments using CAP data. Projects that I'm considering include:

  • Write wrapper functions in Python for the CAP API (which the class as a whole could work on releasing as a library as an advanced project);
  • Come to some conclusions about the workload of courts over time or of judges within a court by applying data analysis skills to metadata produced by the API; or
  • Discover citation networks and identify influential cases and/or judges.

Appendix: A CAP-Based Law Student Programming Assignment

Write a function, named cite_finder, that takes one parameter, case, a string with a citation to an Illinois Supreme Court case, and returns the following:

A. None, if the citation does not correspond to an actual case.

B. An empty list, if the citation corresponds to an actual case, but the text of that case does not include any citations to the U.S. Supreme Court.

C. A Python list of unique U.S. Supreme Court citations that appear in the text of the case, if the citation corresponds to an actual case and the case contains any U.S. Supreme Court citation.

Rules and definitions for this problem:

  • "Unique" means a citation to a specific case from a specific reporter.

  • "Citation to an Illinois Supreme Court case" means a string reflecting a citation to the official reporter of the Illinois Supreme Court, in the form 12 Ill. 345 or 12 Ill.2d 345.

  • "U.S. Supreme Court citation" means any full citation (not supra, id, etc.) from the official U.S. Supreme Court reporter as abbreviated U.S.. Party names, years, and page numbers need not be included. Archaic citations (like to Cranch), S.Ct. citations, and L.Ed. citations should not be included. Subsequent cites/pin cites to a case of the form 123 U.S. at 456 should not be included.

  • "Text" of a case includes all opinions (majority, concurrence, dissent, etc.) but does not include syllabus or any other content.

  • Your function must use the Caselaw Access Project API.

  • The list must be sorted using Python’s built-in list sorting functionality with default options.

  • Each citation must appear only once.

Example correct input and output:

  • cite_finder("231 Ill.2d 474") should return ['387 U.S. 136', '419 U.S. 102', '424 U.S. 1', '429 U.S. 252', '508 U.S. 520', '509 U.S. 43']

  • cite_finder("231 Ill.2d 475") should return None

  • cite_finder("215 Ill.2d 219") should return ['339 U.S. 594', '387 U.S. 136', '467 U.S. 837', '538 U.S. 803']

Sample Code to Complete Assignment

import requests, re

# The endpoint URL was elided in the original post; this is the documented
# CAP cases endpoint.
endpoint = "https://api.case.law/v1/cases/"
pattern = r"\d+ U\.S\. \d+"
# no warranties are made as to the correctness of this somewhat lazy regex

def get_opinion_texts(api_response):
    try:
        ops = api_response["results"][0]["casebody"]["data"]["opinions"]
    except (KeyError, IndexError):
        return None
    return [x["text"] for x in ops]

def cite_finder(cite):
    resp = requests.get(endpoint, params={"cite": cite, "full_case": "true"}).json()
    opinions = get_opinion_texts(resp)
    if opinions is None:
        return None
    allcites = []
    for opinion in opinions:
        allcites.extend(re.findall(pattern, opinion))
    # deduplicate, then sort with Python's default ordering per the rules above
    return sorted(set(allcites))