Feeling confident about confidence intervals

Like science, confidence intervals move towards truth from limited information.

Posted Dec 27, 2010

Third in the series: Nobody becomes a Psych major to study statistics

The science of uncertainty.

Statistics – much to the regret of many potential psych majors – is the core methodology that links together the diverse field that is psychological science. 

Why?  Because science is – ironically enough – based on uncertainty.

Think about your typical psychology study.  Fifty Intro Psych students come into the lab.  Half are assigned to the experimental group and are – just for the sake of argument – given a hug and then play a video game.  The other half are assigned to the control group and just play a video game.  The researcher is interested in whether the hugged group has higher scores. 

Now, if all the researchers were interested in was whether, IN THESE 50 STUDENTS, the group who had been hugged did better than the group who hadn't, all they would have to do is take the average scores of the two groups and compare them.  It is highly unlikely that the two groups have exactly the same average score.  Whichever group has the higher average did better.  Even if they are just a tiny bit better, they really are better.  End of story.

If you measure the ENTIRE POPULATION OF INTEREST, you don’t need statistics.  You have the exact numbers in front of you.  There's no question of what those numbers are.

And that’s exactly the problem.  We almost NEVER have the exact numbers in front of us describing exactly the people we're interested in knowing about.  Instead, we have to deal with SAMPLES that represent the population.  The researchers aren’t interested in the effect of hugs on the video game scores of these 50 students.  They’re interested in the effects of hugs on the video game scores of ALL students.  The whole POPULATION of students.

That’s one thing that statistics are for: helping us understand what the sample we know about tells us about the population we want to know about.  The key to understanding the relationship between the sample we observe and the population we’re really interested in is one of the hardest things to grasp in statistics: the confidence interval.

Although there are confidence intervals around all statistics, I am going to talk here about the most common confidence interval: the confidence interval around the mean.

Samples and populations. 

So here’s the basic problem.  What scientists observe is one small group (a sample).  What they are interested in is a population.  The problem that psychologists have to deal with is that the people in the populations we’re interested in are variable – we’re all different.  So every time we draw a new sample, its mean (for example) will reflect the population from which it is drawn, but will not be identical to the population mean.

Look at the tiny population of 24 students below.  In this tiny population, the mean (average) score for all students is 2.21.  This is what is called the population parameter – the number that reflects the truth about the population.  It is the TRUTH we are trying to reproduce.  But say I can’t measure all 24 students.  Instead, I take a sample of three students with scores 3, 3, and 2.  That sample reflects the population it comes from.  The sample mean – 2.67 – is close to the true population mean.  But it isn’t the same.  Not only that, but I could have taken a different sample with scores of 3, 3, and 3 and gotten a different mean (3).  Or a third sample of 1, 2, and 3 and gotten a mean of 2.  The means of the samples I draw are all reasonable estimates of the population mean, but I can’t count on any one of them to be exactly right.
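To make the sample-to-sample variation concrete, here is a tiny Python sketch that computes the means of the three samples described above and how far each one lands from the population mean of 2.21.  The full set of 24 scores isn’t listed in the text, so the sketch uses only the numbers quoted there.

```python
from statistics import mean

# The three possible samples of three students described in the text.
samples = {
    "first sample": [3, 3, 2],
    "second sample": [3, 3, 3],
    "third sample": [1, 2, 3],
}

population_mean = 2.21  # the reported mean of the 24-student population

for name, scores in samples.items():
    m = mean(scores)
    # Each sample mean is a reasonable estimate, but none is exactly 2.21.
    print(f"{name}: mean = {m:.2f}, off by {m - population_mean:+.2f}")
```

Each sample gives a different answer, and none of them hits the population value exactly.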

[Figure: A population of students]

How much do the sample means I draw vary?  The central limit theorem, together with some basic properties of sample means, tells me several things about the relationship of sample means to the mean of the population from which they were drawn:

  • If I drew an infinite number of samples and calculated the mean of each one, the mean of all the sample means would equal the population mean.  In other words, the differences between the observed sample means and the true population mean average out.
  • The variability I see in the sample means depends on two things: how variable the population is and how big my sample is. 
    • If all the students in my sample were clones with identical scores of 2, it wouldn’t matter which sample I looked at – all my samples would have identical means.  Each of these means would be exactly the same as the population mean.  Unfortunately for psychology (but luckily for the rest of us), human beings don’t work like that.  We are all different – and so are our samples.
    • The bigger my sample, the less sample-to-sample variability I will see in my means.  For example, if my samples contain only 3 students, it’s plausible to draw a sample made up entirely of 1’s, or entirely of 3’s.  But as my samples grow to 6 or 8 or 10 students, that becomes very unlikely.  Again, with larger samples, extreme values ‘average out’.  The bigger my samples, the more similar their means are to one another, and the better my estimate of the population mean tends to be.  All other things being equal, bigger samples are always better.
  • Assuming my data are normally distributed – i.e. they look like a bell curve, with most observations clustered around the mean and approximately 68% of observations within one standard deviation¹ of the mean – I can calculate the expected sample-to-sample variability in the means.  It is called the standard error.  The standard error of the mean is the observed standard deviation divided by the square root of the sample size.  Bottom line: the bigger the sample and the more homogeneous the population, the less sample-to-sample variability I’m going to see in the means.  And thus – and importantly – the more stable my estimate of what the true population mean – THE TRUTH – really is.
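The three points above can be checked with a quick simulation.  This Python sketch is illustrative only: the population of 24 scores is HYPOTHETICAL (eight students each scoring 1, 2, and 3, not the population pictured above), and samples are drawn with replacement, a simplifying assumption under which the standard error is exactly sigma divided by the square root of n.  It compares the mean and spread of many sample means against the population mean and the standard error formula, and repeats the clones example.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(1)  # fixed seed so the run is reproducible

# HYPOTHETICAL population of 24 students: eight each scoring 1, 2, and 3.
# Made up for illustration; it is not the population pictured in this post.
population = [1, 2, 3] * 8
clones = [2] * 24  # a population of identical "clones" who all score 2


def sample_means(pop, n, reps=20_000):
    """Draw `reps` random samples of size n (with replacement); return their means."""
    return [sum(random.choices(pop, k=n)) / n for _ in range(reps)]


mu, sigma = mean(population), pstdev(population)
for n in (3, 10):
    means = sample_means(population, n)
    print(f"n = {n:2d}: mean of sample means = {mean(means):.3f} "
          f"(population mean {mu:.3f})")
    print(f"        SD of sample means = {stdev(means):.3f} "
          f"(standard error sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")

# With identical clones, every sample mean equals the population mean exactly.
print("clone sample means:", set(sample_means(clones, 3, reps=100)))
```

The mean of the sample means sits very close to the population mean for both sample sizes, while the spread of the sample means shrinks as n grows from 3 to 10, in line with the standard error formula.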

Confidence Intervals – what they are and what they’re not

So the root of the problem is that in real life, all scientists ever observe are samples.  And, in real life, all they want to know about is populations. 

What a confidence interval does is allow us to estimate a range within which the true population mean is likely to fall, given what the sample we’ve observed tells us about the population.  A confidence interval is always calculated for a stated level of confidence: 95% is fairly common.

If I observe a sample mean, then given what I know about the relationship between sample size, standard deviations, and sample-to-sample variability in means, I can estimate a RANGE within which my population mean is likely to fall.  In other words, I can't look at one sample and know what my population mean is.  But I can make a well-informed guess.

For example, I might observe a sample mean of 100 with a standard deviation of 25.  If my sample contained 100 people, I could calculate that the 95% confidence interval for my population mean runs from approximately 95 to 105.  The details of how these calculations work I’ll leave to another post on the magic that is the normal distribution.  But the important point here has to do with what the confidence interval means and what it does not.
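For readers who want to see where 95 and 105 come from, here is a minimal Python sketch of the arithmetic.  It assumes the usual normal-theory interval: sample mean plus or minus about 1.96 standard errors.  With a sample of 100, using a t multiplier instead (about 1.98) gives almost exactly the same interval.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 100
sd = 25
n = 100

standard_error = sd / sqrt(n)           # 25 / 10 = 2.5
z = NormalDist().inv_cdf(0.975)         # about 1.96 for a 95% interval
margin = z * standard_error             # about 4.9
low, high = sample_mean - margin, sample_mean + margin

print(f"SE = {standard_error}, 95% CI = ({low:.1f}, {high:.1f})")
# prints: SE = 2.5, 95% CI = (95.1, 104.9)
```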

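A 95% confidence interval (i.e. my conclusion that the population mean is somewhere between 95 and 105) is a statement about the method, not about the population mean moving around; the population mean is fixed.  If I drew an infinite number of samples like the one I observed and calculated a 95% confidence interval from each one, about 95% of those intervals would contain the true population mean.  Any single interval, like my 95 to 105, either captures it or doesn’t.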
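One way to see what the “95%” refers to is to simulate it.  This Python sketch assumes a HYPOTHETICAL normally distributed population whose true mean (100) and standard deviation (25) we know, draws many samples of 100, builds a 95% interval from each, and counts how often the interval captures the true mean.  It uses the sample standard deviation with the 1.96 normal multiplier, a simplification that makes the coverage very slightly below 95%.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(2010)

# HYPOTHETICAL population: normal scores with a known mean and SD.
TRUE_MEAN, TRUE_SD = 100, 25
N, REPS = 100, 2_000
z = NormalDist().inv_cdf(0.975)


def confidence_interval(sample):
    """95% interval for the mean, using the sample SD and the normal multiplier."""
    m = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return m - margin, m + margin


hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = confidence_interval(sample)
    if low <= TRUE_MEAN <= high:  # did this interval capture the truth?
        hits += 1

coverage = hits / REPS
print(f"{coverage:.1%} of the {REPS} intervals contained the true mean")
```

The intervals move from sample to sample while the true mean stays put; roughly 95% of them end up covering it.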

What’s important about that?  If you’re comparing two estimates of a population mean – i.e. two samples or two studies – the one with the narrower confidence interval (at the same confidence level) gives the more precise estimate of what the population – the truth – really is.
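Because the standard error shrinks with the square root of the sample size, interval width tracks sample size in a predictable way.  This Python sketch assumes the same standard deviation (25) and the same 95% normal-theory interval for every study; under those assumptions, quadrupling the sample size halves the width of the interval.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96


def ci_width(sd, n):
    """Full width of a 95% normal-theory confidence interval for a mean."""
    return 2 * z * sd / sqrt(n)


for n in (25, 100, 400):
    print(f"n = {n:3d}: width = {ci_width(25, n):.1f}")
# widths: 19.6, 9.8, 4.9
```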

What a confidence interval is not. 

From grading statistics exams lately, I can tell you a couple of things that a confidence interval around a mean is NOT:

  • It is not the estimate of where 95% of sample means will fall around a population mean
  • It is not where 95% of the observations in a sample will fall
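The second mistake is easy to see with numbers.  This Python sketch draws a HYPOTHETICAL sample of 100 scores from a normal population (mean 100, SD 25, chosen to match the earlier example) and compares the 95% confidence interval for the mean with the range covering the middle 95% of the individual scores.  The first is narrow because it describes uncertainty about the mean; the second is wide because it describes how much individuals differ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev

random.seed(27)

# HYPOTHETICAL sample of 100 scores from a normal population (mean 100, SD 25).
sample = [random.gauss(100, 25) for _ in range(100)]
m, s = mean(sample), stdev(sample)
z = NormalDist().inv_cdf(0.975)
se = s / sqrt(len(sample))

# 95% confidence interval for the MEAN: shrinks as the sample grows.
ci = (m - z * se, m + z * se)

# Middle 95% of the OBSERVATIONS: reflects individual spread, does not shrink with n.
cuts = quantiles(sample, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
middle_95 = (cuts[0], cuts[-1])

print(f"95% CI for the mean:      ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"middle 95% of the scores: ({middle_95[0]:.1f}, {middle_95[1]:.1f})")
```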

Like many things in science, confidence intervals are working towards the truth from limited information.

  • We OBSERVE a sample that tells us something about the world – the population – from which it was drawn.
  • From that observation, we can estimate a range of possible values of what the truth – the parameter we are trying to estimate – really is.

What we think we know about the population parameter – the truth – depends on how good our representation of that reality – the sample – really is.  And that is, of course, something we can never really know.  Until we try to replicate.

© 2010 Nancy Darling. All Rights Reserved


1. I slipped in the standard deviation there without explanation.  The standard deviation measures how variable a sample is.  Technically, the standard deviation is the square root of the variance.  Variance is the average squared distance of each data point from the mean.  All other things being equal, the bigger the standard deviation, the more variable the individuals in the sample are.  Traditional statistics are built around this measure of variability.
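As a small worked example of this note, the Python sketch below computes the variance and standard deviation of the sample 1, 2, 3 from earlier in the post, first by hand and then with the standard library.  One detail the note glosses over: “average squared distance” divides by n, which is what `pvariance` and `pstdev` do; the sample versions most software reports (`variance`, `stdev`) divide by n − 1 instead.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

scores = [1, 2, 3]  # the third sample from the example in the post
m = sum(scores) / len(scores)

# Variance as described in the note: the average squared distance from the mean.
var_by_hand = sum((x - m) ** 2 for x in scores) / len(scores)
sd_by_hand = sqrt(var_by_hand)  # standard deviation = square root of variance

print(var_by_hand, pvariance(scores))  # both 2/3 (about 0.667)
print(sd_by_hand, pstdev(scores))      # both about 0.816
# Sample versions divide by n - 1 instead of n.
print(variance(scores), stdev(scores))  # both equal 1
```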
