Anyone who regularly reads about Psychology in the news (and I’m guessing you all do, as you are reading this) will be used to announcements that psychologists have discovered why we behave a particular way in a particular situation, or that a particular method is effective in treating a particular condition. These news items are often less accurate than they might first appear.

There are two reasons for this. The first is that when psychologists want to check that their results are accurate, they use statistics, and statistics never prove anything beyond doubt; rather, they merely tell us the probability that the results are accurate. This is best explained by an example. Let’s suppose that for some reason you want to know whether, on average, men are taller than women. You take a sample of men and women, measure their heights, and then compare them. The simplest way of doing this is to find the average male height and the average female height. If the male average is greater than the female average, then you have shown that, for your sample, the men are taller than the women. That is fine and dandy provided all you want to know about is the people in your sample. But how do we know that what you found in your sample applies to all other men and women? This is where statistics come in. Using a statistical test, we can work out the probability that what you found in your sample would be found again if the study were repeated with a fresh group of people. In other words, we can work out the probability that your study represents something that applies to the whole of the population.
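The logic of such a test can be sketched in a few lines of Python. This is a toy permutation test on made-up heights (the numbers are invented purely for illustration; a real analysis would more likely use a t-test): if being a man or a woman made no difference to height, then shuffling the labels at random should quite often produce a gap between the group averages as big as the one actually observed. The fraction of shuffles that manage this is an estimate of the probability the test reports.

```python
import random

# Hypothetical height samples in cm -- invented numbers, not real data.
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 170, 162, 168, 172, 166, 169, 164]

observed_diff = sum(men) / len(men) - sum(women) / len(women)

# Permutation test: repeatedly shuffle all the heights together and
# re-split them into two arbitrary groups. Count how often the shuffled
# split produces a gap at least as large as the observed one.
random.seed(0)
pooled = men + women
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    group_a, group_b = pooled[:len(men)], pooled[len(men):]
    diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    if diff >= observed_diff:
        count += 1

p_value = count / trials
print(f"observed difference: {observed_diff:.1f} cm, p ≈ {p_value:.4f}")
```

With samples this cleanly separated, almost no shuffle reproduces the observed gap, so the estimated probability of a chance result comes out very low.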

Now it would be wonderful if the statistical test produced an announcement that said, in effect, ‘yep, what you found here will apply to all future occasions, and this can be 100% guaranteed’. But that is not how statistics work. Instead, the test gives a probability value that what you have found could be due to chance; in other words, it tells you the probability that what you found would not be found again if you repeated the study. Or, put another way, it tells you the likelihood that your study is garbage. But this probability value is only a number, and it has to be interpreted. This presents a problem for researchers: how low do the odds of a chance result have to be before you decide that your study is showing something meaningful rather than garbage, something likely to be repeatable and thus representative of the population as a whole? Put in more technical terms: what do the odds have to be for researchers to decide the results are significant?

The probability level that represents significance is essentially an arbitrary choice: the point beyond which researchers agree to assume that the results are correct. But what should that value be? Clearly, if the statistics show a 99% chance that the results are due to chance, we will reject the findings as garbage. Likewise, we would intuitively reject values of 50% or 25%. Beyond this, though, people will start to differ over where to draw the line. It is possible to have an extended debate about this, but to cut to the chase, most researchers in Psychology feel that a probability of 5% is the correct cut-off point. In other words, if the statistical test shows that there is under a 5% chance of the results being non-replicable, then the findings are classified as significant. Nearly all Psychology journals will not publish the results of a study unless there is under a 5% chance that the findings are non-replicable.
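As a decision rule, the convention is nothing more than a comparison against a threshold. A minimal sketch (the names here are mine, not any standard library's):

```python
# The conventional 5% cut-off. This is the arbitrary choice discussed
# above, not a mathematically derived constant.
ALPHA = 0.05

def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True if the result clears the conventional threshold."""
    return p_value < alpha

print(is_significant(0.03))  # under 5%: counts as significant
print(is_significant(0.20))  # over 5%: not significant
```

Nothing magical happens at 0.05; a result with p of 0.049 and one with p of 0.051 are treated very differently by journals despite being almost identical as evidence.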

This leads to the occasional problem, as might be imagined. Studies have been published that seemed to be significant, but nobody afterwards has been able to replicate them. These are rare, but they do happen. Quite simply, the results of the original study were a freak; they did not represent what is going on in the rest of the population. We cannot blame the statistical test for getting it wrong: all the test did was tell us that the chances of the results being garbage were low; it did not say that chance was zero.

This raises the question of whether we can trust the findings of psychological experiments at all. If they always carry a risk of being wrong, can they be trusted? The answer is that, in the main, they can. First, the probability levels found by tests are often far lower than 5%; quite often the odds are 1 in 100 or 1 in 1,000. Second, if a second study repeats the findings of the first, then the odds on the result being a freak lengthen enormously. For example, if two independent studies each have a 1 in 1,000 chance of a freak result, then the odds of both studies being wrong are literally one in a million. With every further study that replicates the finding, the odds against it being a fluke lengthen again. Thus, the odds of the key findings in Psychology applying to the general population are astronomically high.
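The one-in-a-million figure is simply the two probabilities multiplied together, which is valid provided the studies really are independent of one another:

```python
# Chance that a single study's result is a fluke (the example above).
p_fluke = 1 / 1000

# For independent studies, the fluke probabilities multiply.
p_both_flukes = p_fluke * p_fluke  # roughly one in a million

def all_flukes(p_each: float, n_studies: int) -> float:
    """Probability that every one of n independent studies is a fluke."""
    return p_each ** n_studies

print(p_both_flukes)
print(all_flukes(1 / 1000, 3))  # roughly one in a billion
```

The independence assumption matters: if every replication shares the same flawed method or the same biased sample, the probabilities do not simply multiply, and the combined evidence is weaker than this arithmetic suggests.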

BUT: that does not mean that psychological studies automatically apply to everybody equally, and we will cover this topic next time.

Season’s greetings to you all.