Naked Emperor: What Standardized Tests Don’t Tell Us
Recently the Times carried a story saying that parents in New York City, fed up with the blanket of tests with which schools are smothering their children, are keeping their kids home on test days. This raises, yet again, the question of whether the value of standardized tests outweighs the burden they place on children, teachers, and, to some extent, families.
Most of the public debate so far has focused on whether the tests are being used in unfair ways. Relatively few have publicly questioned an underlying and crucial assumption: that the tests measure something meaningful, or predict something significant, beyond themselves. I have just finished reviewing over 200 studies of K-12 standardized tests. What I have discovered is startling: most tests used to evaluate students, teachers, and school districts predict almost nothing except similar scores on subsequent tests. I have found virtually no research demonstrating a relationship between those tests and measures of thinking on the one hand, or life outcomes on the other. To grasp what we do and do not (yet) know about standardized tests, it’s worth considering a few essential puzzles: why we find individual differences in test scores (why one child does better or worse than others), what makes a child’s test scores go up, and what such improvement could possibly indicate.
Most researchers agree that several non-school factors have a big impact on children’s performance on academic tests. These non-school factors help explain why, overall, most children’s test scores are fairly stable. Children in poverty do less well, all other things being equal, than children from families with adequate incomes. Children who don’t hear much language at home are at an academic disadvantage, which is manifested, among other places, in their test performance. Children whose parents read a lot do better than children whose parents don’t read. And these factors all tend to be bundled together: middle class children are more likely to have educated parents, and to hear more language at home, than children who grow up in poverty. In other words, some children have a lot of educational advantage compared to others, and this is reflected in their test scores. If you hold all of these non-school features of the environment steady, for instance by comparing only children who come from the same economic background, some children will still do better than others. The remaining difference between children is, to some extent, a function of underlying intelligence. Both of these influences, home environment and intelligence, are quite stable, which helps explain why children who get a higher than average test score in third grade are likely to get a higher than average test score in ninth grade.
But most of us believe that intelligence and family background do not seal a child’s fate. We believe that children can learn something in school which gives them knowledge and skills above and beyond what they can get on their own. Furthermore, the current faith in testing suggests that we believe that test scores are a good measure of whether children are learning something valuable at school. As we have seen in the news, some classrooms (or even whole schools) have succeeded in boosting children’s scores beyond what was predicted by their earlier scores. When scores go up, assuming no cheating is involved, people tend to think it means that a specific teacher or educational practice has helped children to know more and think better than they otherwise would have.
But do we have evidence of this? I haven’t seen any. To show that improved test scores actually indicate a more knowledgeable and skilled child, we need at least three kinds of evidence.
First, we need evidence that when a child scores better than she has in the past, her knowledge or skills extend beyond the specific items on the test. So far, the evidence has not shown this. In most states where scores have gone up, the same children don’t show similar improvement when given a different type of test. As one school principal I know said to me, “One of my teachers reported that her students had particular trouble with questions that involved reading a menu. Her solution was to include menu items in the weeks of school work leading up to the test.” Needless to say, familiarity with menus was not the real problem. What those children needed was not more time practicing menu questions, but more skill in reading unfamiliar material, in understanding a new domain by reading about it, and in navigating new literary formats. Her students may well have improved on the next round of tests, but that wouldn’t necessarily mean they had actually become better readers. Here’s another way to think of this issue. A child’s temperature is a pretty good measure of the absence or presence of the flu. It tells you that a child is sick, and/or it predicts that the kid with the fever will feel bad within hours. If you give aspirin to someone with a fever, their temperature will go down, but you won’t actually do anything to change the virus within them. There are ways to raise a child’s test score that do little more than aspirin does for a person with a fever.
Second, it would be good to know that when children’s test scores improve, their academic performance in non-test settings also improves. In other words, we’d need evidence that the teacher whose students regularly get better scores than predicted by their earlier scores is also helping them become better thinkers and learners more generally. For instance, we’d need to see that children who test better than we expected them to on reading comprehension items also choose more complex books, use books in a more sophisticated way to form opinions, and speak in more literate and authoritative ways. There are virtually no data to show this.
Third, even in the absence of these two kinds of research, it would be good to know that improving a child’s test score actually improves their life outcomes. Research has established that good test scores can cause good things to happen: a good score might qualify a student for an enriched academic opportunity, or a scholarship at the state university (as it does in Massachusetts). For instance, if the children in, say, Ms. Good’s fourth grade class showed more improvement from third grade than the students in Ms. Bad’s fourth grade class, it would be useful to know if, 15 years later, Ms. Good’s students had better jobs, did better at their jobs, found more life satisfaction, and were more conscientious voters than children in Ms. Bad’s class. It would be equally important to show that the children in Ms. Good’s class had better life outcomes than the students in Mr. Alsogood’s class, where children’s scores didn’t go up, but other good things were happening (for instance, children were engaged, working hard, and reading a lot). That is, whatever a teacher does to improve students’ scores should also predict (not just cause) a better chance at a good life.
Until we have more data showing that improving test scores actually teaches students to think well, or that an improved test score predicts better life outcomes, we’re all willfully looking away from the Emperor’s nakedness.
While we try to come up with measures that tell us something about individual children, their teachers, or our schools, we’re better off using no tests than using ones that have unintended bad effects and haven’t yet been shown to measure anything meaningful.