In the Spring of 2019, at the University of Iowa, I taught an experimental course called Introduction to Quantitative & Computational Legal Reasoning. The idea of the class was to teach beginning "data science" in the legal context. The course was taught in Python and focused on introductory coding and statistics, with applications in the law (such as statistical evidence of discrimination).
Of course, for students with no prior technical background, it's unrealistic to expect a law school course to produce "data scientists" in the sense used in industry. But my observations of the growth in student skills by the end of the course suggest that it is realistic to produce young lawyers with the skills to solve simple problems with coding, understand data, avoid getting led astray by dubious scientific claims (especially with probability and statistics in litigation), and learn about potential pathways for further learning and career development in legal technology and analytics.
The Library Innovation Lab's Caselaw Access Project (CAP) is particularly well-suited for assignments and projects in such a course. I believe that much of the low-hanging fruit in legal technology is in wrangling the vast amounts of unstructured text that lawyers and courts produce—as is evidenced by the numerous commercial efforts focused on document production in discovery, contract assembly and interpretation, and similar textual problems faced by millions of lawyers daily. CAP offers a sizable trove of legal text accessible through a relatively simple and well-documented API (unlike other legal data APIs currently available). Moreover, the texts available through CAP are familiar to every law student after their first semester, and their comfort with the format and style of such texts enables students to handle assignments that require them to combine their understanding of how law works with their developing technology skills.
To leverage these advantages, I included a CAP-based assignment in the first problem set for the course, due at the end of the programming intensive that occupies the initial few weeks of the semester. The problem, which is reproduced at the end of this post along with a simple example of code to successfully complete it, requires students to write a function that can call into the CAP API, retrieve an Illinois Supreme Court case (selected due to the lack of access restrictions) by citation, and return a sorted list of each unique case in the U.S. Reporter cited in the case they have retrieved.
While the task is superficially simple, students found it fairly complex, for it requires the use of a number of programming concepts, such as functions and control flow, that they had only recently learned. It also exposes students to common beginner's mistakes in Python programming, such as confusing sorting a list in place with list.sort() and returning a new sorted list with sorted(list). In my observation, the results of the problem set accurately distinguished the students who were taking to programming quickly and easily from those who required more focused assistance.
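That in-place versus new-list distinction trips up many first-time programmers, so it is worth spelling out (the citation strings here are made up for illustration):

```python
cites = ["30 Ill. 9", "12 Ill. 345", "101 Ill. 77"]

result = cites.sort()   # sorts the list in place...
print(result)           # ...and returns None, not the sorted list
print(cites)            # ['101 Ill. 77', '12 Ill. 345', '30 Ill. 9']

cites = ["30 Ill. 9", "12 Ill. 345"]
print(sorted(cites))    # returns a new sorted list...
print(cites)            # ...and leaves the original unchanged
```

A function that ends with `return my_list.sort()` therefore returns None, which is exactly the bug many students hit.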
In addition to such standard programming skills, this assignment requires students to practice slightly more advanced skills such as:
Reading and understanding API documentation;
Making network requests;
Processing text with regular expressions;
Using third-party libraries;
Parsing JSON data; and
Handling empty responses from external data sources.
With luck, this problem can encourage broader thinking about legal text as something that can be treated as data, and the structure inherent in legal forms. With even more luck, some students may begin to think about more intellectual questions prompted by the exercise, such as: can we learn anything about the different citation practices in majority versus dissent opinions, or across different justices?
I plan to teach the class again in Spring 2020; one recurrent theme in student feedback for the first iteration was the need for more practice in basic programming. As such, I expect that the next version of the course will include more assignments using CAP data. Projects that I'm considering include:
Write wrapper functions in Python for the CAP API (which the class as a whole could work on releasing as a library as an advanced project);
Come to some conclusions about the workload of courts over time or of judges within a court by applying data analysis skills to metadata produced by the API; or
Discover citation networks and identify influential cases and/or judges.
Appendix: A CAP-Based Law Student Programming Assignment
Write a function, named cite_finder, that takes one parameter, case, a string with a citation to an Illinois Supreme Court case, and returns the following:
A. None, if the citation does not correspond to an actual case.
B. An empty list, if the citation corresponds to an actual case, but the text of that case does not include any citations to the U.S. Supreme Court.
C. A Python list of unique U.S. Supreme Court citations that appear in the text of the case, if the citation corresponds to an actual case and the case contains any U.S. Supreme Court citation.
Rules and definitions for this problem:
"Unique" means a citation to a specific case from a specific reporter.
"Citation to an Illinois Supreme Court case" means a string reflecting a citation to the official reporter of the Illinois Supreme Court, in the form 12 Ill. 345 or 12 Ill.2d 345.
"U.S. Supreme Court citation" means any full citation (not supra, id, etc.) from the official U.S. Supreme Court reporter as abbreviated U.S.. Party names, years, and page numbers need not be included. Archaic citations (like to Cranch), S.Ct., and L.Ed. Citations should not be included. Subsequent cites/pin cites to a case of the form 123 U.S. at 456 should not be included.
"Text" of a case includes all opinions (majority, concurrence, dissent, etc.) but does not include syllabus or any other content.
Last week we hosted the first Caselaw Access Project Research Summit to bring together researchers from this community who had already made progress exploring data made available by the Caselaw Access Project.
Presenters shared research that highlighted a broad range of disciplines and perspectives. They explored the contents of court opinions and the evolution of language over time. They examined things like text comprehension and language patterns and explored themes like link rot and connecting legal data with other digital collections. They asked what words appear in this text corpus, how we can identify changes in the meaning of those words, and how changes in this legal corpus connect to the larger landscape. All of their work was interesting and important, and we’re excited to see what insights they continue to develop.
The Caselaw Access Project Research Summit was our first attempt to bring researchers together in person to meet, share and learn, and to help us better understand how we can support their work. We’re immensely grateful for their participation in the event, and we look forward to doing it again.
Are you using Caselaw Access Project data in your work? Share it with us at firstname.lastname@example.org.
Historical Trends is a way to visualize word usage in court opinions over time. We want Historical Trends to help you ask new questions and understand the law in new ways. Let’s see how this works with some examples:
Want to build your own visualization? Here’s how to get started:
Let’s say you have a question about produce. You want to know if apples, bananas, or oranges are more commonly shown in the legal record.
Try it: go to https://case.law/trends/ and enter one or more keywords separated by a comma. For now, let’s try “apple”, “banana”, and “orange”.
Refine your query or learn more by selecting “Advanced” or the gear icon shown above the visualization.
The data underlying Historical Trends is drawn from the Harvard Law Library’s collection of roughly 6.7 million official, published opinions issued by state and federal courts throughout U.S. history and made available as part of the Caselaw Access Project.
pip-tools is a "set of command line tools to help you keep your pip-based [Python] packages fresh, even when you've pinned them." My changes help the pip-compile --generate-hashes command work for more people.
This isn't a lot of code in the grand scheme of things, but it's the largest set of contributions I've made to a mainstream open source project, so this blog post is a celebration of me! 🎁💥🎉 yay. But it's also a chance to talk about package manager security and open source contributions and stuff like that.
I'll start high-level with "what are package managers" and work my way into the weeds, so feel free to jump in wherever you want.
What are package managers?
Package managers help us install software libraries and keep them up to date. If I want to load a URL and print the contents, I can add a dependency on a package like requests …
… and let requests do the heavy lifting:
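The idea, sketched as a Python session (the URL is just an example):

```python
>>> import requests  # after: pip install requests
>>> response = requests.get("https://example.com/")
>>> print(response.text)
```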
But there's a problem – if I install exactly the same package later, I might get a different result:
I got a different version of requests than last time, and I got some bonus dependencies (certifi, urllib3, idna, and chardet). Now my code might not do the same thing even though I did the same thing, which is not how anyone wants computers to work. (I've cheated a little bit here by showing the first example as though pip install had been run back in 2013.)
So the next step is to pin the versions of my dependencies and their dependencies, using a package like pip-tools:
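The workflow looks roughly like this; the package versions shown are illustrative, not current:

```shell
$ echo "requests" > requirements.in
$ pip-compile requirements.in
$ cat requirements.txt
certifi==2021.5.30    # via requests
chardet==4.0.0        # via requests
idna==2.10            # via requests
requests==2.25.1
urllib3==1.26.5       # via requests
```

pip-compile resolves the full dependency tree from the loose requirements in requirements.in and writes exact pins to requirements.txt.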
Now when I run pip install -r requirements.txt I will always get the same version of requests, and the same versions of its dependencies, and my program will always do the same thing.
… just kidding.
The problem with pinning Python packages
Unfortunately pip-compile doesn't quite lock down our dependencies the way we would hope! In Python land you don't necessarily get the same version of a package by asking for the same version number. That's because of binary wheels.
Up until 2015, it was possible to change a package's contents on PyPI without changing the version number, simply by deleting the package and reuploading it. That no longer works, but there is still a loophole: you can delete and reupload binary wheels.
Wheels are a new-ish binary format for distributing Python packages, including any precompiled programs written in C (or other languages) used by the package. They speed up installs and avoid the need for users to have the right compiler environment set up for each package. C-based packages typically offer a bunch of wheel files for different target environments – here's bcrypt's wheel files for example.
So what happens if a package was originally released as source, and then the maintainer wants to add binary wheels for the same release years later? PyPI will allow it, and pip will happily install the new binary files. This is a deliberate design decision: PyPI has "made the deliberate choice to allow wheel files to be added to old releases, though, and advise folks to use --no-binary and build their own wheel files from source if that is a concern."
That creates room for weird situations, like this case where wheel files were uploaded for the hiredis 0.2.0 package on August 16, 2018, three years after the source release on April 3, 2015. The package had been handed over without announcement from Jan-Erik Rediger to a new volunteer maintainer, ifduyue, who uploaded the binary wheels. ifduyue's personal information on Github consists of: a new moon emoji; an upside down face emoji; the location "China"; and an image of Lanny from the show Lizzie McGuire with spirals for eyes. In a bug thread opened after ifduyue uploaded the new version of hiredis 0.2.0, Jan-Erik commented that users should "please double-check that the content is valid and matches the repository."
The problem is that I can't do that, and most programmers can't do that. We can't just rebuild the wheel ourselves and expect it to match, because builds are not reproducible unless one goes to great lengths like Debian does. So verifying the integrity of an unknown binary wheel requires rebuilding the wheel, comparing a diff, and checking that all discrepancies are benign – a time-consuming and error-prone process even for those with the skills to do it.
So the story of hiredis looks a lot like a new open source developer volunteering to help out on a project and picking off some low-hanging fruit in the bug tracker, but it also looks a lot like an attacker using the perfect technique to distribute malware widely in the Python ecosystem without detection. I don't know which one it is! As a situation it's bad for us as users, and it's not fair to ifduyue if in fact they're a friendly newbie contributing to a project.
(Is the hacking paranoia warranted? I think so! As Dominic Tarr wrote after inadvertently handing over control of an npm package to a bitcoin-stealing operation, "I've shared publish rights with other people before. … open source is driven by sharing! It's great! it worked really well before bitcoin got popular.")
This is a big problem with a lot of dimensions. It would be great if PyPI packages were all fully reproducible and checked to verify correctness. It would be great if PyPI didn't let you change package contents after the fact. It would be great if everyone ran their own private package index and only added packages to it that they had personally built from source that they personally checked, the way big companies do it. But in the meantime, we can bite off a little piece of the problem by adding hashes to our requirements file. Let's see how that works.
Adding hashes to our requirements file
Instead of just pinning packages like we did before, let's try adding hashes to them:
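The command is the same compile step with one extra flag; the resulting file gains a hash (or several, one per uploaded wheel/sdist) for each pin. The hash values are elided here:

```shell
$ pip-compile --generate-hashes requirements.in
$ cat requirements.txt
requests==2.25.1 \
    --hash=sha256:... \
    --hash=sha256:...
```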
Now when pip-compile pins our package versions, it also fetches the currently-known hashes for each requirement and adds them to requirements.txt (an example of the crypto technique of "TOFU" or "Trust On First Use"). If someone later comes along and adds new packages, or if the https connection to PyPI is later insecure for whatever reason, pip will refuse to install and will warn us about the problem:
But there are problems lurking here! If we have packages that are installed from Github, then pip-compile can't hash them and pip won't install them:
That's a serious limitation, because -e requirements are the only way pip-tools knows to specify installations from version control, which are useful while you wait for new fixes in dependencies to be released. (We mostly use them at LIL for dependencies that we've patched ourselves, after we send fixes upstream but before they are released.)
And if we have packages that rely on dependencies pip-tools considers unsafe to pin, like setuptools, pip will refuse to install those too:
This can be worked around by adding --allow-unsafe, but (a) that sounds unsafe (though it isn't), and (b) it won't pop up until you try to set up a new environment with a low version of setuptools, potentially days later on someone else's machine.
Those two problems meant that, when I set out to convert our Caselaw Access Project code to use --generate-hashes, I did it wrong a few times in a row, leading to multiple hours spent debugging problems I had created for myself and other team members (sorry, Anastasia!). I ended up needing a fancy wrapper script around pip-compile to rewrite our requirements in a form it could understand. I wanted it to be a smoother experience for the next people who try to secure their Python projects.
This was a long process, and began with resurrecting a pull request from 2017 that had first been worked on by nim65s. I started by just rebasing the existing work, fixing some tests, and submitting it in the hopes the problem had already been solved. Thanks to great feedback from auvipy, atugushev, and blueyed, I ended up making 14 more commits (and eventually a follow-up pull request) to clean up edge cases and get everything working.
Landing this resulted in closing two other pip-tools pull requests from 2016 and 2017, and feature requests from 2014 and 2018.
Warn when --generate-hashes output is uninstallable
Hopefully, between these two efforts, the next project to try using --generate-hashes will find it a shorter and more straightforward process than I did!
Things left undone
Along the way I discovered a few issues that could be fixed in various projects to help the situation. Here are some pointers:
First, the warning to use --allow-unsafe seems unnecessary – I believe that --allow-unsafe should be the default behavior for pip-compile. I spent some time digging into the reasons that pip-tools considers some packages "unsafe," and as best I can tell it is because it was thought that pinning those packages could potentially break pip itself, and thus break the user's ability to recover from a mistake. This seems to no longer be true, if it ever was. Instead, failing to use --allow-unsafe is unsafe, as it means different environments will end up with different versions of key packages despite installing from identical requirements.txt files. I started some discussion about that on the pip-tools repo and the pip repo.
Second, the warning not to use version control links with --generate-hashes is necessary only because of pip's decision to refuse to install those links alongside hashed requirements. That seems like a bad security tradeoff for several reasons. I filed a bug with pip to open up discussion on the topic.
Third, PyPI and binary wheels. I'm not sure if there's been further discussion on the decision to allow retrospective binary uploads since 2017, but the example of hiredis makes it seem like that has some major downsides and might be worth reconsidering. I haven't yet filed anything for this.
Personal reflections (and, thanks Jazzband!)
I didn't write a ton of code for this in the end, but it was a big step for me personally in working with a mainstream open source project, and I had a lot of fun – learning tools like black and git multi-author commits that we don't use on our own projects at LIL, collaborating with highly responsive and helpful reviewers (thanks, all!), learning the internals of pip-tools, and hopefully putting something out there that will make people more secure.
pip-tools is part of the Jazzband project, which is an interesting attempt to make the Python package ecosystem a little more sustainable by lowering the bar to maintaining popular packages. I had a great experience with the maintainers working on pip-tools in particular, and I'm grateful for the work that's gone into making Jazzband happen in general.
Next stop on the road will be the UNT Open Access Symposium, May 17-18 at the University of North Texas College of Law. See you there!
On the road we were able to connect the Caselaw Access Project with new people. We were able to share where data comes from, what kinds of questions we can ask when we have the machine readable data to do it, and all the new ways that you’re all building and learning with Caselaw Access Project data to see the landscape of U.S. legal history in new ways.
The CAP Roadshow doesn’t stop here! Share Caselaw Access Project data with a colleague to keep the party going.
The prospect of having the Caselaw Access Project dataset become public for the first time brings with it the obvious (and wholly necessary) ideas for data parsing: our dataset is vast and the metadata structured (read about the process to get to this), but the work of parsing the dataset is far from over. For instance, there's a lot of work to be done in parsing individual parties in CAP (like names of judges), we don't yet have a citator, and we still don't know who wins a case and who loses. And for that matter, we don't really know what "winning" and "losing" even means (if you are interested in working on any of these problems and more, start here: https://case.law/tools/).
At LIL we've also undertaken lighter explorations that highlight opportunities made possible by the data and help teach ways to get started parsing caselaw. To that end, we've written caselaw poetry with a limerick generator, discovered the most popular words in California caselaw with wordclouds, and found all instances of the word "witchcraft" for Halloween. We have created an examples repository, for anyone just starting out, too.
This particular project began as a quick look at a very silly question:
What, exactly, is the color of the law?
It turned, surprisingly, into a fairly deep introduction to NLP.
In this blog post, I'm putting down some thoughts about my decisions, process, and things I learned along the way. Hopefully it will inspire someone looking into the CAP data to ask their own silly (or very serious) questions. This example might also be useful as a small tutorial for getting started on neural-based NLP projects.
How does one go about deciding on the color of the law?
One way to do it is to find all the mentions of colors in each case.
Since there is a finite number of labelled colors, we could look at each color and simply run a search through the dataset on each word.
So let's say we start by looking at the color "green". But wait! We've immediately run into trouble. It turns out that "Green" is quite a popular last name. Excluding anywhere the "G" is capitalized, we might miss important data, like sentences that start with the color green. Adding to the trouble, the lower cased "orange" is both a color and a fruit. Maybe we could start by looking at the instances of the color words as adjectives?
Enter Natural Language Processing
Natural Language Processing (NLP) is a field of computer science aimed at the understanding and parsing of texts.
While I'll be introducing NLP concepts here, if you want a more in-depth write-up on NLP as a field, I would recommend Adam Geitgey's series, Natural Language Processing is Fun!
A brief overview of some NLP concepts used
Tokenization: Tokenizing is the process of divvying up a wall of text into smaller components — typically, those are words (sometimes they are characters). Having word chunks allows us to do all kinds of parsing. This can be as simple as "break on space" but usually also treats punctuation as a token.
Parts of speech tagging: tagging words with their respective parts of speech (noun, adjective, verb, etc). This is usually a built-in method in a lot of NLP tools (like nltk and spacy). The tools use a pretrained model, often one built on top of large datasets that had been tediously, and manually tagged (thanks to all ye hard workers of yesteryear that have made our glossing over this difficult work possible).
Root parsing: grouping syntactically related terms: the token chosen (in this case, we're only looking at adjectives) together with the "parent", or head, of that token (read the documentation to learn more).
Unfortunately, we don't have magical reference to every use of a color in the law, so we'll need to come up with some heuristics which will get us most of the way there. There are a couple ways we could go about finding the colors:
The easiest route we can take is to match any adjective that appears in our list of colors when we come across it and call it a day. The other way, more interesting to me, is to use root parsing to capture the context pertinent to the color, to make sure that we get the right shade. "Baby pink" is very different from "hot pink", after all.
To get here, we can use the NLP library spacy. The result is a giant list of word pairings like "red pepper" and "blue leather". These may read as a food and a type of cloth rather than colors. As far as this project is concerned, however, we're treating these word pairings as specific shades. "Blue sky" might be a different shade than "blue leather". "Red pepper" might be a different shade than "red rose".
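A minimal sketch of that extraction step. The color list here is truncated, and `color_pairs` works on anything that looks like a spacy `Doc` (tokens exposing `.text`, `.pos_`, and `.head`), so the spacy model itself is only needed at the end:

```python
COLOR_WORDS = {"red", "blue", "green", "orange", "pink", "black", "brown", "white"}

def color_pairs(doc):
    """Yield 'color adjective + head noun' shades, e.g. 'red pepper'.

    `doc` is any iterable of tokens exposing .text, .pos_, and .head
    attributes, such as a spacy Doc.
    """
    for token in doc:
        if token.pos_ == "ADJ" and token.text.lower() in COLOR_WORDS:
            yield f"{token.text.lower()} {token.head.text.lower()}"

# With spacy itself (assumes: pip install spacy, then
# python -m spacy download en_core_web_sm):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   list(color_pairs(nlp("The witness wore a red jacket.")))
```

Pin-the-adjective-to-its-head is exactly the root-parsing idea described above: the adjective's `.head` supplies the context that turns "red" into "red pepper".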
But exactly what shade is "red pepper" and how would a machine interpret it?
To find out the answer, we turn to recent advances in NLP techniques using Neural Networks.
Recurrent Neural Networks, a too-brief overview
Neural Networks (NNs) are functions that are able to "learn" (more on that in a bit) from a large trove of data. NNs are used for lots of things: from simple classifiers (is it a picture of a dog? Or a cat?) to language translation, and so forth. Recurrent Neural Networks (RNNs) are a specific kind of NN: they learn from past iterations by passing each preceding output down the chain, so running more training cycles should produce increasingly accurate results. One caveat: if we run too many epochs (full training cycles, each being a forward and backward pass through all of the data), there's a danger of "overfitting", where the RNN essentially memorizes the correct answers!
A contrived example of running an already fully-trained RNN over 2-length sequences of words might look something like this:
Input: "box of rocks", Output: prediction of word "rocks"
Step1: RNN("", "box") -> 0% "rocks"
Step2: RNN("box", "of") -> 0% "rocks"
Step3: RNN("of", "rocks") -> 50% "rocks"
Notice that an RNN works over a fixed sequence length, and would only be able to understand word relationships bounded by this length. An LSTM (Long Short-Term Memory) is a special type of RNN that overcomes this by adding a type of "memory" which we won't get into here.
Crucially, the NN has two major components: forward and backward propagation.
Forward propagation is responsible for getting the output of the model (as in, stepping forward in your network by running your model). An additional step is model evaluation, finding out how far from our expectations (our labelled "truth" set) our output is — in other terms, getting the error/loss. This also plays a role in backward propagation.
Backward propagation is responsible for stepping backward through the network, computing the derivative of the computed error with respect to the weights of the model. This derivative is used by the gradient descent function, an optimization that adjusts the weights to decrease the error by a small amount at each step. This is the "learning" part of a NN: by running it over and over, stepping forward, stepping backward, figuring out the derivative, running it through gradient descent, adjusting the weights to minimize the error, and repeating the cycle, the NN is able to learn from past mistakes and successes, and move towards a more correct output.
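That loop can be seen in miniature on a one-weight toy "model" (a bare quadratic loss, not an actual network):

```python
# Toy gradient descent: minimize the loss (w - 3)**2, whose minimum is w = 3.
w = 0.0              # the model's single weight
learning_rate = 0.1

for step in range(100):
    loss = (w - 3) ** 2        # "forward": compute the error
    grad = 2 * (w - 3)         # "backward": derivative of loss w.r.t. w
    w -= learning_rate * grad  # gradient descent: nudge w to shrink the loss

print(round(w, 4))  # very close to 3.0
```

Each pass shrinks the remaining error by a constant factor, which is exactly the "forward, backward, adjust, repeat" cycle described above.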
As luck would have it, I happened upon a white paper that solved the exact problem of figuring out the "correct" shade for an entered phrase, and a fantastic implementation of it (albeit one that needed a bit of tuning).
The basic steps to reproduce are these:
We take a large set of color data. https://www.colourlovers.com/api gives us access to about a million labeled, open source, community-submitted colors — everything from "dutch teal" (#1693A5) to a very popular color named "certain frogs" (#C3FF68).
We create a truth set. This is important because we need to train the model against something that it treats as correct. For our purposes, we do have a sort of "truth" of colors, a largely agreed-upon set in the form of HTML color codes with their corresponding hex values. There are 148 of those that I've found.
We convert all hex values to CIE LAB values (these are more conducive to an RNN's gradient learning as they are easily mappable in 3d space).
We tokenize each value on character ("blue" becomes "b", "l", "u", "e").
We call in PyTorch to rescue us from the rest of the hard stuff, like creating character embeddings; and
We run our BiLSTM model (a bi-directional Long Short-Term Memory model, a type of RNN that can remember inputs from current and previous iterations).
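Two of those preprocessing steps are simple enough to sketch here; the RGB-to-CIE-LAB conversion and the BiLSTM itself are left to libraries (colormath or similar, and PyTorch):

```python
def char_tokens(name):
    """Tokenize a color name into characters: 'blue' -> ['b','l','u','e']."""
    return list(name.lower())

def hex_to_rgb(hex_code):
    """Parse a hex color like '#1693A5' into an (R, G, B) tuple.

    LAB conversion would follow this step, since LAB values map more
    smoothly into 3D space for the model to learn against.
    """
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

print(char_tokens("blue"))    # ['b', 'l', 'u', 'e']
print(hex_to_rgb("#1693A5"))  # (22, 147, 165) -- "dutch teal"
```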
Whereas the colors in the late 1800s are muted, and mostly grays, browns, and tans, the colors in the 21st century are bright blues, reds, oranges, greens.
We seem to be getting a small window into the U.S.'s industrialization and the fashion of the times ("industrialization" being the latent factor, or hidden neuron, here :-).
Who would have thought we could do that by looking at caselaw?
When I first started working on this project, I had no expectations of what I would find. Looking at the data now, it is clear that some of the most commonly present colors are black, brown, and white, and from what I can tell, the majority of the mentions of those are race related. A deeper dive would require a different person to look at this subject, and there are many other more direct ways of approaching such a serious matter than looking at the colors of caselaw.
If you have any questions, any kooky ideas about caselaw, or any interest in exploring together, please let me know!
Today we're launching CAP search, a new interface to search data made available as part of the Caselaw Access Project API. Since releasing the CAP API in Fall 2018, this is our first try at creating a more human-friendly way to start working with this data.
CAP search supports access to 6.7 million cases from 1658 through June 2018, digitized from the collections at the Harvard Law School Library. Learn more about CAP search and limitations.
We're also excited to share a new way to view cases, formatted in HTML. Here's a sample!
We invite you to experiment by building new interfaces to search CAP data. See our code as an example.
One of the things people often ask about Perma.cc
is how we ensure the preservation of Perma links. There are some
answers in Perma's documentation, for example:
Perma.cc was built by Harvard’s Library Innovation Lab and is backed
by the power of libraries. We’re both in the forever business:
libraries already look after physical and digital materials — now we
can do the same for links.
Links will be preserved as a part of the permanent collection of
participating libraries. While we can't guarantee that these records
will be preserved forever, we are hosted by university libraries that
have endured for centuries, and we are planning to be around for the
long term. If we ever do need to shut down, we have laid out a
detailed contingency plan for
preserving existing data.
The contingency plan is worth
reading; I won't quote it here. (Here's a
Perma link to it, in case we've updated
it by the time you read this.) In any case, all three of these
statements might be accused of a certain nonspecificity - not as who
should say vagueness.
I think what people sometimes want to hear when they ask about
preservation of Perma links is a very specific arrangement of
technology. A technologically specific answer, however, can only be
provisional at best. That said, here's what we do at present: Perma
saves captures in the form of
WARC files to an
S3 bucket and serves them
from there; within seconds of each capture, a server in Germany
downloads a copy of the WARC; twenty-four hours after each capture, a
copy of the WARC is uploaded to the
Internet Archive (unless the link has been
marked as private); also at the twenty-four hour mark, a copy is
distributed to a private
LOCKSS network. The database
of links, users, registrars, and so on, is snapshotted daily, and
another snapshot of the database is dumped and
saved by the server in Germany.
Here's why that answer can only be provisional: there is no digital
storage technology whose lifespan approaches the centuries of
acid-free paper or microfilm. Worse, the systems housing the
technology will tend to become insecure on a timescale measured in
days, weeks, or months, and, unattended, impossible to upgrade in
perhaps a few years. Every part of the software stack, from the
operating system to the programming language to its packages to your
code, is obsolescing, or worse, as soon as it's deployed. The
companies that build and host the hardware will decline and fall; the
hardware itself will become unperformant, then unusable.
Mitigating these problems is a near-constant process of monitoring,
planning, and upgrading, at all levels of the stack. Even if we were
never to write another line of Perma code, we'd need to update
Django and all the other
Python packages it depends on (and a Perma
with no new code would become less and less able to capture pages on
the modern web); in exactly the same way, the preservation layers of
Perma will never be static, and we wouldn't want them to be. In fact,
their heterogeneity across time, as well as at a given moment, is a
strength.
The core of digital preservation is institutional commitment, and the
means are people. They require dedication, expertise, and flexibility;
the institution's commitment and its staff's dedication are constants,
but their methods can't be. The resilience of a digital preservation
program lies in their careful and constant attention, as in the
commonplace, "The best fertilizer is the farmer's footprint."
Although I am not an expert in digital preservation, nor well-read in
its literature, I'm a practitioner; I'm a librarian, a software
developer, and a DevOps
engineer. Whether or not you thought this was fertilizer, I'd love to
hear from you. I'm