Data-driven

On this page

  1. Natural language processing
  2. Machine learning
  3. Supervised machine learning
  4. Unsupervised machine learning
  5. Features
  6. N-grams
  7. Tf-idf
  8. Support Vector Machines
  9. Naive Bayes
  10. Neural Networks
  11. Language model
  12. Attention
  13. Transformers
  14. F1-score
  15. Elasticsearch
  16. Decision tree
  17. Regular expressions
  18. Named Entity Recognition
  19. Network Analysis

Some of the terms below were taken or adapted from M. Medvedeva, ‘Identification, categorisation and forecasting of court decisions’ (2022).

Natural language processing

Natural language processing is a field at the intersection of linguistics and computer science that focuses on computational techniques for processing (i.e. analysing and generating) natural language in the form of text or speech.


Related terms: regular expressions, machine learning

Typology examples: Jus Mundi, Automatic Catchphrase Identification from Legal Court Case Documents (Mandal et al. 2017), Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS

Further reading

Top

Machine learning

Machine learning is an omnibus term for a computer program or model that uses historical data to make predictions for new (i.e. unseen) data. In more technical terms, machine learning is the process of approximating a function that maps the model’s input (e.g., the text of legal documents, converted into a computer-suitable numerical representation) to the labels (e.g., ‘defendant won’ or ‘prosecution won’, or the amount of a fine). There are different types of machine learning, such as supervised, unsupervised, and reinforcement learning. The most common approach in the legal domain is supervised machine learning.


Related terms: supervised machine learning, unsupervised machine learning

Typology examples: Moonlit, Lex Machina, Chinese AI and Law dataset

Further reading

Top

Supervised machine learning

Supervised machine learning is a subcategory of machine learning that requires labelled data. Within supervised machine learning there are two types of predictive models: classification models and regression models. Classification is the task of predicting a discrete class label (e.g., whether or not a case will result in an eviction) for a specific input, whereas regression is the task of predicting a continuous quantity (e.g., the length of a prison sentence, in months).


Related terms: machine learning, Support Vector Machines, Naive Bayes, unsupervised machine learning

Typology examples: Predicting Brazilian court decisions (Lage-Freitas et al. 2019), Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS, JusticeBRD

Further reading

Top

Unsupervised machine learning

Unsupervised machine learning is a subcategory of machine learning for analysing and clustering unlabelled data. Such algorithms are designed to find patterns and group data together without labelled data to rely on.


Related terms: supervised machine learning

Typology examples: Statutory Article Retrieval Dataset (BSARD), Automatic Catchphrase Identification from Legal Court Case Documents (Mandal et al. 2017), Squirro

Further reading

Top

Features

Features are the input that a machine learning model uses to make its prediction. In the case of legal judgement classification, features are extracted from the text and may be (for example) individual words or sequences of words. To make these interpretable for a machine learning algorithm, each feature is converted into a series of numbers (i.e. a vector). Features are most commonly extracted automatically (using a large piece of text), but can also be chosen manually. For instance, if one wants to predict court decisions, these could be specific variables, such as the judges involved, precedents, articles of law, etc.


Related terms: machine learning, tf-idf

Typology examples: Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS

Further reading

Top

N-grams

An n-gram is a sequence of n consecutive words (or characters). Single words are called unigrams, sequences of two words are called bigrams, and sequences of three consecutive words are called trigrams. If we split the sentence ‘By a decision of 4 March 2003 the Chamber declared the application admissible.’ into bigrams (i.e. two consecutive words), treating the final full stop as a separate token, the extracted features consist of:

By a, a decision, decision of, of 4, 4 March, March 2003, 2003 the, the Chamber, Chamber declared, declared the, the application, application admissible, admissible .

For trigrams, the features consist of:

By a decision, a decision of, decision of 4, of 4 March, 4 March 2003, March 2003 the, 2003 the Chamber, the Chamber declared, Chamber declared the, declared the application, the application admissible, application admissible .
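
For illustration, the same extraction can be done in a few lines of Python (a minimal sketch: the sentence is split naively on whitespace, with the full stop treated as its own token):

```python
# Minimal n-gram extraction over a whitespace-tokenised sentence.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "By a decision of 4 March 2003 the Chamber declared the application admissible ."
tokens = sentence.split()

print(ngrams(tokens, 2))  # bigrams, as listed above
print(ngrams(tokens, 3))  # trigrams, as listed above
```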


Related terms: features, tf-idf

Typology examples: Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS

Further reading

Top

Tf-idf

Tf-idf (term frequency–inverse document frequency) is a weighting scheme that scores each n-gram by how often it occurs in a document (term frequency), discounted by how many documents in the collection it occurs in (inverse document frequency). The underlying idea is that the n-grams characteristic of a certain case will only occur in a few cases, whereas common, uncharacteristic n-grams will occur in many cases.
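
As an illustration (a sketch only, not the exact set-up of any system discussed here), scikit-learn’s TfidfVectorizer computes these weights over a small toy corpus:

```python
# Tf-idf weighting over a toy corpus; the texts are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the Chamber declared the application admissible",
    "the Chamber declared the application inadmissible",
    "the applicant complained about an eviction",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Words occurring in many documents (e.g. "the") get low weights;
# words occurring in few documents (e.g. "eviction") get high weights.
for word, weight in zip(vectorizer.get_feature_names_out(), X.toarray()[2]):
    if weight > 0:
        print(word, round(weight, 2))
```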


Related terms: features, n-grams

Typology examples: Jus Mundi, Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS

Further reading

Top

Support Vector Machines

A Support Vector Machine (SVM) is a supervised machine learning classifier commonly used in natural language processing (i.e. when working with text). To classify the data, an SVM tries to split the data points based on their labels in the dataset (i.e. the training data). Specifically, it determines the simplest (linear) equation that separates differently-labelled data points from each other with the least amount of error. To make the prediction, the algorithm decides on the best hyperplane (i.e. a line in multiple dimensions) to separate the data. The support vectors are the data points nearest to this hyperplane. The goal of the SVM algorithm is to position the hyperplane in such a way that the largest possible margin with respect to the support vectors is achieved. This gives a greater chance of classifying new (i.e. unseen) data correctly. While non-linear SVMs are possible, linear SVMs are most commonly used in text classification.
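
A minimal sketch of a linear SVM trained on tf-idf features with scikit-learn (the texts and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: document texts with their outcome labels.
texts = [
    "the court finds a violation of article 6",
    "the court finds no violation of article 6",
    "there has been a violation of the convention",
    "there has been no violation of the convention",
]
labels = ["violation", "no violation", "violation", "no violation"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["the chamber finds a violation"]))
```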


Related terms: supervised machine learning

Typology examples: Exploring the Use of Text Classification in the Legal Domain (Sulea et al. 2017), JURI SAYS

Further reading

Top

Naive Bayes

Naive Bayes (NB) is a frequently used classification technique in NLP. The NB classifier is based on Bayes’ rule. This rule states that the posterior probability of an outcome O given a word W is equal to the likelihood of the word being present in texts with that outcome, multiplied by the (prior) probability of that outcome, and divided by the probability of the word. This latter probability is usually held constant, as the goal is to determine the optimal outcome given a predetermined set of words. The NB classifier calculates the probability of each feature using Bayes’ rule, but under the (naive) assumption that all features are independent. This means that the NB algorithm does not take word order into account.
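
In symbols, Bayes’ rule for an outcome O given a word W, and the naive independence assumption over the words w_1, …, w_n of a document, read:

```latex
P(O \mid W) = \frac{P(W \mid O)\,P(O)}{P(W)},
\qquad
P(O \mid w_1, \dots, w_n) \propto P(O)\prod_{i=1}^{n} P(w_i \mid O)
```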


Related terms: supervised machine learning

Typology examples: JusticeBRD

Further reading

Top

Neural Networks

A neural network consists of nodes, which together form layers, and weights (connecting the nodes), which are learned during training. A neural network contains an input layer, an output layer, and at least one hidden layer. Neural networks can have many more hidden layers; the more layers, the deeper the network. The feature vectors from the training dataset are fed into the model. They are then adjusted in each layer of the model, using weights (i.e. values by which the input from the previous layer is multiplied) and activation functions, which are mathematical functions that convert a node’s input into a new value. The activation function may differ, depending on the specific architecture of the model. Moreover, the size of the hidden layer does not have to be the same as that of the input layer, and the number of nodes in the hidden layer is chosen before training. Each value from the input layer is adjusted according to the weights, and contributes to each node in the hidden layer.

Figure: an example of a multilayer perceptron neural network.
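
A minimal sketch of a single forward pass through such a network with NumPy (sizes, weights, and the activation function are arbitrary; biases and training are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)        # input layer: a feature vector with 4 values
W1 = rng.random((4, 3))  # weights connecting the input layer to a hidden layer of 3 nodes
W2 = rng.random((3, 2))  # weights connecting the hidden layer to an output layer of 2 nodes

def relu(z):
    return np.maximum(z, 0)  # a common activation function

hidden = relu(x @ W1)   # each hidden node combines all weighted inputs
output = hidden @ W2    # output layer: one score per class
print(output)
```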


Related terms: machine learning, supervised machine learning

Typology examples: Mapping BITs RNN framework and dedicated tool, Polisis, Casetext

Further reading

Top

Language model

A language model is a probability distribution over words in a text. Models such as BERT learn these probabilities by using two training strategies at the same time. The first strategy is masking words (i.e. replacing random words with a placeholder, such as [MASK]), trying to predict the missing word based on the other words in the sentence, and then calculating its probability. The second strategy takes two sentences and tries to predict whether the second one follows the first in the text. The advantage of such a model is that it can be fine-tuned (i.e. adapted) for a specific task, such as classifying court decisions, by adding an additional layer on top.
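
The masking strategy can be tried directly with the Hugging Face transformers library (assuming it and the bert-base-uncased model are available):

```python
from transformers import pipeline

# Ask a pretrained language model to fill in the masked word.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The Chamber declared the application [MASK]."):
    print(prediction["token_str"], prediction["score"])
```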


Related terms: transformers

Typology examples: Della, Casetext, Contract Understanding Atticus Dataset

Further reading

Top

Attention

Attention allows a neural network to focus on, or pay attention to, the more important parts of the input sequence by assigning them higher weights, thereby suppressing other (less important) parts.
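
A toy NumPy sketch of one common form, scaled dot-product attention (dimensions and numbers are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # relevance of each position to each other
    weights = softmax(scores)                # higher weight = more attention
    return weights @ V                       # weighted sum of the values

# A sequence of 3 positions, each represented by a vector of length 4.
rng = np.random.default_rng(0)
Q = K = V = rng.random((3, 4))
print(attention(Q, K, V))
```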


Related terms: transformers

Typology examples: Della, Casetext, Contract Understanding Atticus Dataset

Further reading

Top

Transformers

Transformers consist of a sequence-to-sequence architecture combined with attention (see above). This sequence-to-sequence approach is fundamental for tasks such as machine translation, since both the input and output must be a sequence (of words). An example of a popular legal domain task that often uses a sequence-to-sequence approach is legal text summarisation. One of the most well-known transformer models is BERT.
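
As an illustration of a sequence-to-sequence task, a pretrained transformer can be applied to summarisation via the Hugging Face pipeline (the model name is one common public choice, not one used by the systems discussed here):

```python
from transformers import pipeline

summariser = pipeline("summarization", model="facebook/bart-large-cnn")
text = (
    "By a decision of 4 March 2003 the Chamber declared the application admissible. "
    "The applicant complained about the length of the eviction proceedings."
)
print(summariser(text, max_length=30, min_length=5))
```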


Related terms: language model

Typology examples: Westlaw Edge

Further reading

Top

F1-score

F1-score is the harmonic mean (a type of average) of the two measures precision and recall. For a given class, precision is the percentage of documents the system assigned to that class for which the assignment is correct, while recall is the percentage of documents that truly belong to that class which the system identifies correctly.
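
With TP, FP, and FN denoting true positives, false positives, and false negatives for a class:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```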


Related terms: machine learning

Typology examples: Predicting Brazilian court decisions (Lage-Freitas et al. 2019), Della, Automatic Catchphrase Identification from Legal Court Case Documents (Mandal et al. 2017)

Further reading

Top

Elasticsearch

Elasticsearch (https://www.elastic.co/) is an open-source framework for building fast and highly scalable search engines, and it is commonly used as the basis for legal search engines. A document in an Elasticsearch index is represented as a combination of tf-idf word representations (see above) and meta-information created by different machine learning systems (e.g. citations, thematic tags, names/dates, etc.). Each type of information is stored in a different ‘field’. When a query is compared to the Elasticsearch index, it is matched against the relevant fields. The sum of all the matching scores constitutes the document score and thus its relevance rank compared to the other matched documents.
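
A hedged sketch of querying such an index with the official Python client (8.x-style API; the index name "cases" and the field names are invented for illustration):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Match the query against one field of the index; Elasticsearch
# returns documents ranked by their matching score.
response = es.search(index="cases", query={"match": {"text": "eviction proceedings"}})
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```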


Related terms: tf-idf

Typology examples: Casetext, Squirro

Further reading

Top

Decision tree

A decision tree is a tree in which each internal node represents a feature, each branch represents a decision rule, and each leaf represents the predicted label. A decision tree can be created manually or learned from data using machine learning.
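
A minimal sketch of learning a decision tree with scikit-learn (the two features and the labels are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Each case is described by two binary features.
X = [[1, 0], [0, 0], [1, 1], [0, 1]]
y = ["eviction", "no eviction", "eviction", "no eviction"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["rent_arrears", "legal_representation"]))
```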


Related terms: supervised machine learning

Typology examples: TreeAge, Moonlit

Further reading

Top

Regular expressions

A regular expression (often abbreviated as regex or regexp) is a series of text and special characters that specifies a pattern to be searched for in a text. Regular expressions can be used in most text editors and in tools such as grep.
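
A short Python example: a pattern matching dates such as ‘4 March 2003’ (the month list is abbreviated here for brevity):

```python
import re

pattern = r"\d{1,2} (?:January|February|March|April|May|June) \d{4}"
text = "By a decision of 4 March 2003 the Chamber declared the application admissible."
print(re.findall(pattern, text))  # ['4 March 2003']
```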


Related terms: natural language processing

Typology examples: Lex Machina, Chinese AI and Law dataset, Statutory Article Retrieval Dataset (BSARD)

Further reading

Top

Named Entity Recognition

Named entity recognition is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organisations, locations, medical codes, time expressions, quantities, monetary values, and percentages. Named entity recognition aims to find highly variable information (such as names) that tends to occur in similar contexts. For example, the word ‘Mr.’ is often followed by a name.
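
For illustration, named entities can be extracted with spaCy (assuming its small English model has been installed with `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Mr. Smith lodged an application against the United Kingdom on 4 March 2003.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. person names, locations, dates
```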


Related terms: machine learning, supervised machine learning

Typology examples: Lex Machina

Further reading

Top

Network Analysis

Network Analysis is a technique used to understand large-scale, complex connected systems. A network is represented by nodes and edges, where the nodes represent objects (e.g. court decisions) and the edges represent relationships between them (e.g. one court decision referring to another). Network Analysis is then used to analyse the structure of networks and find patterns. For instance, one may be able to find the most influential (i.e. most cited) cases in a specific legal domain. A network also allows one to explore the path from one node to another, rather than just a single node and its edges. This can, for instance, be used to explore how the law has evolved over time.

Network analysis can, but does not necessarily have to, involve machine learning.
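
A toy sketch of a citation network with the networkx library (the cases and citations are invented; PageRank is one common way to estimate influence):

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Case B", "Case A"),  # Case B cites Case A
    ("Case C", "Case A"),
    ("Case C", "Case B"),
])
print(nx.pagerank(G))  # Case A, the most cited, scores highest
```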


Related terms: machine learning, Named Entity Recognition, natural language processing

Typology examples: Moonlit, Westlaw Edge

Further reading

Top

This page was last updated on 28 February 2023.