Probability seems a simple concept. We are used to thinking about it when gambling, or playing games, or rolling dice. It’s easy to calculate, up to a point. You just divide the number of times something happens by the number of times it could happen, and multiply by 100 to arrive at a percentage. That’s easy for a coin toss, where the probability of heads or tails is about 1/2, or 50%. It is simple because a coin toss is a self-defined event with only two possible outcomes. The same goes for a dice throw, although there are six possible outcomes rather than two. In fact, any game or sport with rules is similar. Rules are designed to limit contextual interference, so that the outcome of the game is self-contained and the possible outcomes are limited. Real life, though, is not like this. It is much messier, and possible outcomes are far more numerous. This doesn’t stop us calculating probabilities. In fact, we constantly attempt to assess risk in thousands of activities, from driving to climbing, to just crossing the road or making tea.
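The counting rule for coins and dice can be tried out directly. What follows is only a rough sketch in Python: the function name, the number of trials and the fixed seed are my own choices for illustration, and the computer's random number generator is assumed to stand in for a fair coin and a fair dice.

```python
import random


def estimate_probability(trial, successes, n_trials=100_000, seed=0):
    """Divide the number of times something happens by the number of tries."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if trial(rng) in successes)
    return hits / n_trials


def toss_coin(rng):
    return rng.choice(["heads", "tails"])


def throw_dice(rng):
    return rng.randint(1, 6)  # randint includes both end points


p_heads = estimate_probability(toss_coin, {"heads"})
p_six = estimate_probability(throw_dice, {6})

# Multiply by 100 to arrive at a percentage.
print(f"heads: {p_heads * 100:.1f}%")
print(f"six:   {p_six * 100:.1f}%")
```

The estimates settle close to 50% and to one in six only because the rules of the game fix the possible outcomes in advance.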
But it is clear that, for many of these events, which have no rules as such, we can’t calculate probability the same way as we do with a coin toss. How to work out more complex probabilities is something that has preoccupied mathematicians for a long time. The first really successful formula for calculating probability in more complex events was developed about 250 years ago, by Thomas Bayes. It was published in 1763, two years after his death, and offered a way of calculating the probability of an event which has remained essential to the development of statistics and computing.
Bayesian probability attempts to calculate probabilities where events are influenced by contextual factors. Consider the risk of catching flu. It’s quite easy to calculate the probability of catching it or not using the ‘coin toss’ method. You can count up the people who do get it, in a given time span or age group, divide by the total number in the sample and multiply by 100 to get a percentage. But let’s say you actually have a sore throat. What is the probability you have flu now?
The first step in a Bayesian calculation would be to understand the flu infection rate in a defined group of people to which you belong. Then we need a calculation that takes into account the different kinds of information we might have which relate a sore throat to flu. This is what Bayes’ theorem does. According to the Bayesian formula, the probability that an event has happened given the information you have (say, that you have flu given that you have a symptom of flu) is equal to the probability of the event occurring at all (what proportion of people currently have flu?) times the probability of the information given the event (what proportion of people who have flu also have your symptom? Note that this is the inverse of the initial proposition), divided by the probability of the information occurring at all (how often does your symptom occur generally, regardless of flu?).
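For readers who prefer symbols, the same statement can be written compactly. The letters are my own labels, introduced only for illustration: F stands for ‘has flu’ and S for ‘has a sore throat’, and the vertical bar is read as ‘given’.

```latex
P(F \mid S) = \frac{P(S \mid F)\, P(F)}{P(S)}
```

Here P(F) is the flu rate in your group, P(S | F) is the inverse proposition, and P(S) is how common the symptom is regardless of flu.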
So the probability I have flu, given I have a sore throat =
The probability I would have a sore throat if I had flu (i.e. the inverse proposition) × the probability of having flu at all
DIVIDED BY the general probability of having a sore throat
For the sake of discussion we might say that’s about (0.5×0.4)/0.5 = 40%
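The arithmetic can be checked with a few lines of Python. This is a minimal sketch rather than a definitive implementation: the function and argument names are hypothetical, and the three numbers are the for-the-sake-of-discussion figures used above, not real flu data.

```python
def bayes(p_info_given_event, p_event, p_info):
    """Probability of the event given the information (Bayes' theorem)."""
    if not 0 < p_info <= 1:
        raise ValueError("p_info must be a probability greater than zero")
    return p_info_given_event * p_event / p_info


# Illustrative figures from the text (assumptions, not measurements):
p_sore_throat_given_flu = 0.5  # the inverse proposition
p_flu = 0.4                    # probability of having flu at all
p_sore_throat = 0.5            # general probability of a sore throat

p_flu_given_sore_throat = bayes(p_sore_throat_given_flu, p_flu, p_sore_throat)
print(f"{p_flu_given_sore_throat:.0%}")
```

With these particular figures the answer is the same as the flu rate we started with, 0.4. That is because a sore throat has been assumed to be exactly as common among people with flu as among people in general, so knowing about it tells us nothing new.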
This shows us that a sore throat is probably a poor test for flu. A more accurate test would give a higher probability. Let’s look at a COVID test. If you get a positive test, what is the probability you have COVID, if you are part of population group X?
The probability I have COVID, given I have a positive test =
The probability I would get a positive result if I did have COVID (the inverse) × the rate of COVID infection in population group X
DIVIDED BY the total rate of positive tests in population group X
Let’s say that’s (0.8×0.1)/0.09 ≈ 89%
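It can help to see these rates as head counts. The sketch below assumes a hypothetical group X of 10,000 people (the group size is my own choice, and the variable names are only illustrative) and applies the three rates used above.

```python
population = 10_000  # hypothetical size of population group X

infected = round(population * 0.1)        # infection rate of 0.1
true_positives = round(infected * 0.8)    # positive result if infected: 0.8
all_positives = round(population * 0.09)  # total rate of positive tests: 0.09

# Positive tests that do not belong to an infected person.
false_positives = all_positives - true_positives

# Of everyone who tested positive, what share is actually infected?
p_covid_given_positive = true_positives / all_positives

print(infected, true_positives, all_positives, false_positives)
print(f"{p_covid_given_positive:.0%}")
```

Counted this way, the probability is simply the infected people who tested positive divided by everyone who tested positive.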
We might conclude from this that the COVID test is more accurate for COVID than the sore throat test is for flu. You can see how Bayesian probability is able to give a probability value adjusted by information which can be updated. In the latter case, test accuracy may increase, infection rates may decrease, and risk rates can be assessed accordingly.
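To see the updating at work, we can hold the test fixed and let the infection rate fall. The sketch below rests on assumptions of my own. The figures of 0.8, 0.1 and 0.09 used earlier imply that about 1.1% of uninfected people test positive (0.01 divided by 0.9), and I assume that this rate and the 0.8 stay the same while the infection rate changes. The lower infection rates are invented for illustration.

```python
def p_infected_given_positive(sensitivity, false_positive_rate, prevalence):
    """Bayes' theorem with the overall positive rate built from its two parts."""
    true_pos = sensitivity * prevalence                  # infected and positive
    false_pos = false_positive_rate * (1 - prevalence)   # uninfected and positive
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.8                 # positive result if infected (from the text)
FALSE_POSITIVE_RATE = 0.01 / 0.9  # implied by the text's figures (assumption)

# Infection rates to compare: 0.1 is from the text, the others are invented.
results = {
    prevalence: p_infected_given_positive(
        SENSITIVITY, FALSE_POSITIVE_RATE, prevalence
    )
    for prevalence in (0.1, 0.05, 0.01)
}

for prevalence, probability in results.items():
    print(f"infection rate {prevalence:.0%}: {probability:.0%}")
```

Under these assumptions the same positive result becomes less convincing as the infection rate falls, even though the test itself has not changed.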
The certainty of the test is mediated by two variables: the probability of the event being tested for, and the probability of the occurrence of the information. Let’s apply this to something apparently simpler, like getting run over and dying when crossing a road. Here, we can use a value we already know, as the probability of this by the age of 100 is reckoned to be approximately 0.005 in the UK. According to the Bayesian formula, working back from the answer, you might arrive at this probability as follows:
The probability of dying if you have been run over =
The probability of you having been run over, if you have died × the probability of dying before age 100
DIVIDED BY the probability of being run over before age 100
Which we could guess might look something like:
You can see how this formula could be used to give different values if you input different age criteria or used a different cause of death. However, we can also easily see that even the Bayesian calculation wouldn’t really be sophisticated enough to give you a true estimate of your own risk of dying when crossing the road at any one time. It is not necessarily the case that you are likely to have been run over and killed by the time you have crossed the road 20,000 times. Clearly the calculation of this probability for any one person will vary enormously, because a number of factors relate directly to you and your surroundings at the time of the accident. You would need to take into account your age and mobility, how often you actually cross the road, the nature of the road, the average speed of traffic (which will vary over different lifetimes), your vision, your power of attention and mental state, as well as the position of the crossing, the speed of the car and the attention of the driver. There may then be further ancillary conditions that vary even these variables further.
This is a difficult, if not impossible, sum to do, because there are so many potential causes and conditions to take account of, many of which are closely associated, or entangled, with each other. In fact, we may well consider, on reflection, that the average figure first given is meaningless. The technique of creating an inverse proposition only works if you can formulate a problem in the right way in the first place. The question of whether you will be killed crossing the road might be ‘given’ the rate of deterioration of your eyesight, your arthritis, your hearing, your age and mobility, the placing of the crossing point, the average speed of passing vehicles and so on. Having just two pieces of variable information is not enough. In fact, the more ‘everyday’ a danger is, the more elaborately entangled it might appear to be in specific circumstances.
All these variables are critical because the probabilities attached to them must all combine into the probability of a single event. Every new factor you encounter requires a recalculation of all the others. And while technology might make this possible in theory, the availability, variability and reliability of the input information make many such calculations impossible in practice. In reality, life is so complex that we mostly don’t bother to do actual calculations at all. You can see what I mean if we imagine a specific circumstance in the case of crossing the road. If you were 98 and living in an old people’s home, could move at a quarter of normal walking speed and were killed on a crossing on a blind corner on an A road outside the gate of the home, it would (hopefully) be determined immediately that the probability of a serious accident had been dangerously high. In this unfortunate hypothetical situation nobody would think of needing a Bayesian formula, or indeed any kind of formula.
And this, in fact, is the point of this paper. Calculation is very useful when things are simple. It enables big business to make money out of gamblers, and clever gamblers to make money out of casinos. It helps health authorities to calculate population risks from diseases, and insurance companies to provide financial compensation for obvious risks. But everyday life is too complex to allow you to calculate every factor in every risk, or even in most risks. As we can begin to see here, it is the unregulated everyday that poses the most complex risks and probabilities. When it comes to the crunch and you want to do something really complex, like crossing a road safely, you have to rely on the most sophisticated tools available to human beings. You have to use your brain, and engage your imagination.