

Legal expert system
Main research: January 2022


  1. What does it claim to do?
  2. Substantiation of claims & potential issues
  3. How might the end-user assess effectiveness?
  4. What form does it take?
  5. Is it currently in use?
  6. The creators
  7. Jurisdiction
  8. License

What does it claim to do?

JusticeBRD (or Justice Beyond Reasonable Doubt) is a :machine learning model designed to reduce wrongful convictions. It is intended to predict ‘correct verdicts’ in criminal trials, outputting a verdict, either ‘guilty’ or ‘not guilty’, together with a classification probability score. The model does not work directly on textual evidence but on structured assessment scores of the evidence presented at trial; it uses :machine learning but no :natural language processing. The numerical scores for strands of evidence are calculated by reference to a scoring scheme devised by the authors, which draws on official guidance issued, for example, by the Crown Prosecution Service of England and Wales. The evidence assessment scores are then used as input for the classifier. The motivation behind JusticeBRD is to provide decision support for juries.
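As a toy illustration of this flow, the sketch below hand-rolls a Gaussian Naive Bayes classifier over invented evidence-assessment scores and returns a verdict together with a classification probability score. The feature values, case counts, and scoring scale are all hypothetical; the authors’ actual scoring scheme and case data are not public.

```python
import math

# Hypothetical per-case feature vectors: each number is an aggregate
# evidence-assessment score. These values are invented for illustration.
train = {
    "guilty":     [[8.0, 7.5, 9.0], [7.0, 8.0, 8.5]],
    "not guilty": [[3.0, 4.0, 2.5], [2.0, 3.5, 3.0]],
}

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(data):
    # Estimate a per-class, per-feature mean and variance (Gaussian Naive Bayes).
    params = {}
    for label, rows in data.items():
        per_feature = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((x - mean) ** 2 for x in col) / len(col) + 1e-6
            per_feature.append((mean, var))
        params[label] = per_feature
    return params

def predict(params, case):
    # Class likelihoods under a uniform prior, normalised so the output
    # doubles as the 'degree of certainty' the system is said to report.
    likelihood = {}
    for label, feats in params.items():
        p = 1.0
        for x, (mean, var) in zip(case, feats):
            p *= gaussian_pdf(x, mean, var)
        likelihood[label] = p
    total = sum(likelihood.values())
    return {label: p / total for label, p in likelihood.items()}

params = fit(train)
posterior = predict(params, [7.5, 7.0, 8.8])  # a case with strong evidence scores
print(max(posterior, key=posterior.get), round(max(posterior.values()), 4))
```

With so few training cases, the estimated means and variances are extremely sensitive to each individual case, which foreshadows the sample-size concerns discussed later in this review.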

Claimed essential features

  • Provide a framework for mathematically determining guilt based on evidence.
  • Make an objective assessment of strands of evidence to determine the correct guilty/not guilty verdict.

JusticeBRD is a “computational model designed to reduce wrongful convictions by providing a rigorous framework for mathematically determining guilt based on evidence.” (justiceBRD; archived)

“For each case, justiceBRD identifies the actual strands of evidence presented by the prosecution or defence, and makes an objective assessment of each strand through a series of challenging, standardised questions … The answers to these questions are then entered into a machine learning model. This model has been … shown to produce the correct verdict with 100% accuracy. The output from the model is a verdict together with a degree of certainty that the accused is guilty, based on the actual evidence.” (What is Justice Beyond Reasonable Doubt, or justiceBRD?; archived)

“As a feature intended to assist scrutiny of the identification and evaluation of criminal trial evidence, our model allows for the recording of justifications for the answers given to each of the questions [about the evidence].” (Mitchell et al., 2020)

Claimed rationale and benefits

  • To provide decision support for juries and prevent unsound convictions.
  • Create a robust method to objectively analyse prosecution and defence evidence.

JusticeBRD “is an attempt to bring our justice system back in line with its fundamental principle by forcing a decision to be based solely on the strength of the evidence presented at trial.” (What is Justice Beyond Reasonable Doubt, or justiceBRD?; archived)

“… unsound convictions could be prevented in the future through such an application of machine learning …” (Mitchell et al., 2020)

“The approach to assessing guilt presented here is not intended to mimic the trial process, but instead we have demonstrated that it may have utility in providing decision-support to the jury.” (Mitchell et al., 2020)

“… any long-term intention to use the model in place of, rather than in support of, a jury would need to consider first the wider roles of juries within society.” (Mitchell et al., 2020)

The “methodology was intended to ensure a thorough and robust analysis of the prosecution and defence evidence strands with maximum objectivity in order to avoid the subjective influences that are inherent in jury decision-making.” (Mitchell et al., 2020)

“we aim to motivate additional research into the use of machine learning within the criminal justice system” (Mitchell et al., 2020)

Claimed design choices

  • The system combines an argumentative framework (evidence strands and challenges) and a probabilistic framework (:machine learning).
  • The method for assessing and scoring evidence uses a set of questions based on established guidelines including those produced by the Crown Prosecution Service (England and Wales). 
  • The authors suggest a method to assign numerical scores to strands of evidence in regard to reliability, credibility, relevance and strength.

“Our method makes the link between an argumentative framework (evidence strands and challenges) and a probabilistic framework (Bayes classifier).” (Mitchell et al., 2020)

“… we developed a method for assigning numerical scores to strands of evidence in order to determine whether quantified evidence could successfully train a machine-learning model.” (Mitchell et al., 2020)

“To demonstrate how a narrative description of evidence presented at trial can be transformed into a numerical quantification, we designed a series of questions … based on established evidence assessment guidelines …” (Mitchell et al., 2020)

“… our chosen questions … concern[ed] evidence reliability, credibility, relevance and strength …” (Mitchell et al., 2020)

“We developed our model using the Python programming environment and used the Scikit-Learn library … to provide machine-learning functions.” (Mitchell et al., 2020)
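The quoted setup can be sketched with scikit-learn’s `GaussianNB`. The class sizes below (4 sound and 5 unsound training cases, 5 derived features) follow the paper’s description, but the feature values are synthetic stand-ins, since the actual case scores are not published.

```python
# Sketch of the described setup: a scikit-learn Naive Bayes classifier
# trained on numerically scored evidence. Feature values are invented.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# 9 training cases x 5 derived features, as in the paper's description:
# 4 sound convictions (label 1 = guilty) and 5 unsound (label 0).
X_train = np.vstack([
    rng.normal(7.0, 1.0, size=(4, 5)),   # sound convictions: strong evidence
    rng.normal(3.0, 1.0, size=(5, 5)),   # unsound convictions: weak evidence
])
y_train = np.array([1] * 4 + [0] * 5)

clf = GaussianNB().fit(X_train, y_train)

# A new case, scored on the same five derived features.
new_case = np.array([[6.5, 7.2, 5.9, 6.8, 7.0]])
verdict = clf.predict(new_case)[0]
certainty = clf.predict_proba(new_case)[0][verdict]
print("guilty" if verdict else "not guilty", round(certainty, 3))
```

Note that `predict_proba` returns a calibrated-looking probability even when, as here, it is estimated from a handful of cases.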

“Personal information on the defendant (including age, race, gender, etc.), or details that may be used to infer such information (geographic location, education level, etc.), were deliberately not included in our assessment of cases … by omitting from model training details that could infer demographic information, we avoid the potential for a biased algorithm to emerge through the learning of existing and historical bias.” (Mitchell et al., 2020)

“The trials we chose were for the crime of murder, as the reporting of these trials was such that the evidence had a high level of public scrutiny allowing details of trial evidence to be retrieved from multiple sources with relative ease.” (Mitchell et al., 2020)


Substantiation of claims & potential issues

  • The system assumes that quantitative scoring is an appropriate method of representing the qualitative weight that different strands of evidence ought to have.
  • Differences in the subjective approach taken by different jurors/juries to scoring evidence may undermine the claim to objective classification of guilt/innocence that is valid across different trials.
  • The design of the system means its classifications of guilty/not guilty are very vulnerable to slight variations in evidential scoring.
  • Deliberate or mistaken weighting of evidential strands will likely result in unsound classifications of guilt or innocence.
  • Criminals or prospective criminals with access to the scoring methodology could ‘game’ the system by anticipating how certain types of evidence are likely to be scored.
  • The conduct of jurors is carefully regulated in many jurisdictions. In England and Wales, for example, it is an offence for a juror to engage in research relating to a case, including researching the law of evidence. Use of the system might conflict with such rules and interfere with the role of jurors as independent triers of fact.
  • The presiding judge at a trial will often provide directions and guidance to a jury. The system’s output might conflict with this guidance, with the result that jurors might take their law from the system rather than the judge.

The key source of information about the backend of justiceBRD is Mitchell et al., ‘Machine learning for determining accurate outcomes in criminal trials’ (2020), a paper authored by its creators. The paper sets out a useful overview of the methodology:


  • The authors claim to use only 9 cases to train the system (a :Naive Bayes classifier) and 21 cases to evaluate it.

“For our purposes … we chose historical unsound conviction cases to be those in which the defendant was found guilty by a jury and later the case was found to be a miscarriage of justice … Five training cases were chosen that fulfilled our criteria …” (Mitchell et al., 2020)

“… for training, we chose historical sound conviction cases to be those in which the defendant was found guilty by a jury and later the conviction was supported by, say, reliable confessions of guilt or guilty pleas at a re-trial … we identified four sound conviction training cases” (Mitchell et al., 2020)

  • The system requires significant manual effort to assign scores to the information contained in the evidence (i.e. the strands of evidence).

“We … needed to establish a means of processing and scoring the … evidence that can characterize any criminal trial. Therefore, we developed a method for assigning numerical scores to strands of evidence in order to determine whether quantified evidence could successfully train a machine-learning model. Our approach was inspired by dimensionality reduction … It involved defining which properties, or :features, of evidence to extract (feature extraction), processing or transforming these evidence properties into scores (feature transformation), and reducing these scores into a smaller set of derived :features (feature reduction). We then chose a machine-learning methodology that would be suitable, first to develop (i.e. train) an algorithm to classify criminal cases as guilty or not guilty based on the evidence presented at trial, and then to validate (i.e. test) the algorithm to determine whether it could successfully classify cases that had not been used for its training. The criminal cases we chose to train and test our algorithm were historical cases where the guilty verdict at trial was apparently sound, and historical cases where the guilty verdict at trial was subsequently shown to be insecure (unsound).” (Mitchell et al., 2020)
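Read literally, the extract → transform → reduce sequence might look something like the following sketch. The questions, the 0–10 answer scale, and the use of a simple mean are hypothetical stand-ins; the paper’s actual scoring scheme is more elaborate and not fully disclosed.

```python
# Sketch of the extract -> transform -> reduce pipeline the paper outlines.
# The question groups and answer scale below are hypothetical.

# Feature extraction: answers (0-10) to standardised questions about one
# strand of evidence, grouped by the property each question probes.
strand_answers = {
    "reliability": [8, 7, 9],
    "credibility": [6, 7],
    "relevance":   [9, 8],
    "strength":    [7, 6, 8],
}

# Feature transformation + reduction: collapse each group of raw answer
# scores into a single derived feature (here, a simple mean).
def derive_features(answers):
    return {prop: sum(scores) / len(scores) for prop, scores in answers.items()}

derived = derive_features(strand_answers)
print(derived)
```

Repeating this for every strand in a case, and then aggregating across strands, would yield the small fixed-length feature vector that the classifier consumes.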

Determining the outcomes of criminal trials

  • The system was trained only on cases that resulted in murder convictions (wrongful and sound), not on cases where the defendant was found not guilty. The system is therefore designed to identify wrongful convictions only after the conviction, not before it. From a technical perspective, this means the system as designed cannot assist a jury in reaching its verdict; it could only be used after the defendant has been found guilty.

  • Murder cases are often very complex, and 9 cases, of which only 4 are sound convictions, are very unlikely to be a representative sample of murder trials. :Naive Bayes is a probabilistic method, and 9 cases with very few :features are not enough to support any reliable inferences about the data.

    “We used the Bayes classifier to determine the probability of a case belonging to a class (guilty or not guilty) using the five derived feature case scores. The probability was calculated based on the distributions of derived :features within each class in a training data set.” (Mitchell et al., 2020)

  • The authors describe :Naive Bayes as an :unsupervised machine learning classifier, which it is not. :Naive Bayes is a :supervised machine learning classifier, which requires labels (i.e. a guilty or not guilty verdict) to train the system.
  • The creators of the system do not explain why :machine learning is necessary for such a system in the first place. JusticeBRD uses a rule-based approach to derive the :features, and given how few cases are available for training such a model, a rule-based approach would be more suitable. However, 21 cases would still be far too few to evaluate the model.
  • The creators of justiceBRD performed sensitivity analysis on the test set, which is a serious problem. By checking what adjustments to the training set achieve the highest performance on the test set, an enormous bias is introduced: the system is now fitted to produce these specific results (in this case, 100% accuracy) for this specific test set, with no guarantee that it would do so on new, unseen data. Such experiments are useful, but should be conducted on a validation set, which is set aside specifically for tuning the model’s parameters. The test set should only be used to evaluate the final system.


  • The sensitivity analysis showed that changing some of the scores by a single point could change the verdict and result in a wrongful conviction, which indicates that such a system is very vulnerable to slight variations in the evaluation of evidence.
  • The data does not contain enough cases either to train the system or to evaluate it. Traditionally, :machine learning systems, including those based on :Naive Bayes, require at least hundreds of cases for training and evaluation in order to produce reliable results.
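The methodological point about validation versus test sets can be illustrated on synthetic data (the real dataset of 9 training and 21 test cases is far too small for such a protocol to be meaningful): hold out the test set first, tune only against a validation split, and touch the test set exactly once at the end.

```python
# Sketch of the split the review recommends, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(7, 2, (100, 5)), rng.normal(3, 2, (100, 5))])
y = np.array([1] * 100 + [0] * 100)

# Hold out a test set FIRST, then carve a validation set from the rest.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
# Any tuning (score adjustments, sensitivity analysis) uses X_val only...
val_acc = clf.score(X_val, y_val)
# ...and the untouched test set is used exactly once, at the end.
test_acc = clf.score(X_test, y_test)
print(round(val_acc, 2), round(test_acc, 2))
```

Tuning against `X_val` and reporting on the untouched `X_test` is what prevents the kind of leakage that makes a “100% accuracy” claim on 21 tuned-against cases uninformative.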

Rationale and benefits

  • The creators of JusticeBRD attempt to develop a system that could assist juries in the courts of England and Wales by using statistical methods and :machine learning to process the scores manually assigned to prosecution and defence evidence.
  • Given all the issues with the data and the use of the algorithm, the system, in the state presented in the paper, is not suitable for any practical use.

  • With a larger amount of data, this approach to quantifying information about evidence strands could potentially be of benefit for analysing past murder trial data, though it would require a large amount of manual labour.
  • It is not clear how the system would benefit from using :machine learning.
  • There is no explanation of how such a system could be integrated into a jury decision-making process.


  • Mitchell et al., ‘Machine learning for determining accurate outcomes in criminal trials’ (2020) 19(1) Law, Probability and Risk 43

How might the end-user assess effectiveness?

The creators of justiceBRD have “open[ed] up the model so that those with a fuller knowledge of the cases we used, or with knowledge of new cases, can test the model themselves.” (justiceBRD; archived) However, they have not provided the code, so the model itself is not available to users or subscribers.

Subscribers to JusticeBRD can access the cases used to train and test the model, and see the results of the classification of the cases by the model. Subscribers can also add a new case to the justiceBRD list of User Cases by manually inputting evidence strands and scores for the new case. The model will generate a prediction of guilt (whether guilty or not guilty) for the new case. However, there is, of course, no easy way of checking whether the justiceBRD model can accurately predict guilt or reduce wrongful convictions.


What form does it take?




The justiceBRD website offers a means for others to test the model. (justiceBRD; archived) However, in (Mitchell et al., 2020) the creators state that “future research would need to include proposals for implementing and operating the model” and that their “research … has not considered implementation …” They add that “the aim of our study was to demonstrate the concept that, by quantifying evidence from existing cases, a machine-learning algorithm (or model) could be developed to determine accurately criminal trial outcomes.” Their research, they say, shows that “use of a machine-learning model to assist decision-making based on the analysis of evidence in criminal trials is a valid concept.” It appears, therefore, that the creators do not consider the model ready for its intended users.

justiceBRD does not offer to tailor the model for users.


Is it currently in use?

Although the website is ‘live’, there is no indication that the model is in current use.


The creators

Created by



JusticeBRD was created by Jane Mitchell, Simon Mitchell and Cliff Mitchell. Simon Mitchell is an academic working in the systems biology of cancer. Jane Mitchell and Cliff Mitchell are independent researchers.



Background of developers

England and Wales

Jurisdiction

Target jurisdiction

The authors’ aspiration is that the model might be used in jurisdictions other than England and Wales.

Target legal domains

The model was developed for use in murder trials. The authors’ aspiration is that it might be developed for use in other trials.



License

The code is not available. No licence is provided for the code.

