War Stories from the Accountability Battlefield: Part 2 of 2

After working closely with 240 clients, Steve Rees weighs in on CA school data.

Posted Jan 19, 2019

Source: Steve Rees, used with permission

In Part 1 of this interview, Steve Rees shared faulty conclusions some educators have drawn from data in the California School Dashboard. As president and founder of School Wise Press, Rees is an education data expert with extensive experience helping educators in all roles make better use of data. After working closely with 240 school district clients over the course of 14 years to make sense of schools’ vital signs, Rees has developed workshops and courses, as well as measurement tools, to help school and district leaders draw valuable and reliable insight from data. In fact, his company has a new partner, Teachers College at Columbia University, to help advance the cause.

School accountability components vary from state to state (student assessments differ even among states in the same consortium, results are reported in different ways, etc.). Here Rees continues giving us a glimpse of some California accountability war stories that drive him to improve data use in the field.

Jenny Rankin (JR): Are some of these problems due to the data measurement tools themselves?

Steve Rees (SR): These tests themselves have limits; in California, students take a test very similar to the tests taken in a dozen other states that are all in the Smarter Balanced Consortium. I share Jim Popham’s assertion that it is not the tests themselves that cause problems. It is what Popham calls the “abysmal assessment literacy” of educators themselves that leads to misuses, misunderstandings, and mistakes in human judgment. Misunderstanding what the test results mean is the core problem.

Take one dimension of assessment results: error margin. In California, there is no mention of the error margin in the reports that are mailed home to parents or delivered to classroom teachers. Only the districts lucky enough to have assessment directors on staff (about 1 out of every 8 districts) even see the size of the error margin for students, classrooms, schools, and the district. This is a big deal. Here’s why. Each of these tests has between 35 and 45 questions. A student’s answers provide a reasonable estimate of mastery of the standards that appear on the test. But the estimate lets you say only this much about a student’s score: “Gabriela’s score on her 6th grade math test was 2490. We’re pretty sure that if she took the test on a Tuesday instead of a Monday, she’d score within 27 points of that 2490 score.” Hiding that uncertainty and imprecision leads people to invest too much meaning in these scores.
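To make that concrete, here is a minimal sketch in Python of how an error margin limits what a single scale score can support. The 2490 score and 27-point margin are simply the figures from the example above, used for illustration; the `score_band` and `clearly_different` helpers are hypothetical, not part of any Smarter Balanced reporting tool.

```python
# A minimal sketch of how an error margin changes what a scale score can tell you.
# The numbers (2490, +/- 27) mirror the example above; they are illustrative only.

def score_band(score: float, margin: float) -> tuple[float, float]:
    """Return the plausible range around a reported scale score."""
    return (score - margin, score + margin)

def clearly_different(score_a: float, score_b: float, margin: float) -> bool:
    """Two scores are clearly different only if their plausible ranges do not overlap."""
    low_a, high_a = score_band(score_a, margin)
    low_b, high_b = score_band(score_b, margin)
    return high_a < low_b or high_b < low_a

gabriela = 2490
margin = 27

print(score_band(gabriela, margin))               # (2463, 2517)
# A classmate scoring 2510 looks "higher", but the bands overlap:
print(clearly_different(gabriela, 2510, margin))  # False: a 20-point gap is inside the noise
```

The point of the sketch is not the particular numbers but the habit: any comparison of two scores that differ by less than the margin is a comparison of noise.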

But it gets worse. The results of these tests are used to gauge the size of achievement gaps between boys and girls, between students getting and not getting lunch subsidies, and between students of various ethnic groups. Three problems cloud a clearer understanding of the size and meaning of these gaps. First, the masking of imprecision and uncertainty makes the estimated size of the gap impossible for anyone but a statistician to understand. Second, the test itself was not designed to capture all the information about the highest- and lowest-scoring students. What psychometricians call the “spread” is necessarily limited. This means people are underestimating the size of gaps. Third, the California Department of Education has baked a logic error into its accountability system. Its official dashboard compares each ethnic group to the status of all students in each school or district, in effect including an entity in the benchmark it’s being measured against. They should have paired entities for comparison purposes, of course: boys and girls, free-lunch kids and kids not getting free lunch.
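A rough numeric sketch shows why the part-whole comparison understates gaps. Every figure below is invented for illustration; the group sizes and mean scores are assumptions, not California data.

```python
# A rough numeric sketch of the part-whole comparison problem described above.
# All numbers are invented for illustration; they are not California data.

# A hypothetical school with two groups and their mean scale scores.
group_a_n, group_a_mean = 60, 2520   # e.g., students not receiving lunch subsidies
group_b_n, group_b_mean = 40, 2460   # e.g., students receiving lunch subsidies

# The "all students" mean includes group B in the benchmark it is measured against.
all_students_mean = (group_a_n * group_a_mean + group_b_n * group_b_mean) / (group_a_n + group_b_n)

gap_vs_all = all_students_mean - group_b_mean      # the dashboard-style comparison
gap_vs_counterpart = group_a_mean - group_b_mean   # the paired comparison Rees recommends

print(f"All-students mean: {all_students_mean:.0f}")      # 2496
print(f"Gap vs. all students: {gap_vs_all:.0f}")          # 36 points
print(f"Gap vs. paired group: {gap_vs_counterpart:.0f}")  # 60 points
```

Because the all-students benchmark already contains the lower-scoring group, a sizable share of the 60-point gap disappears in this example; pairing complementary groups avoids that distortion.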

JR: How does flawed evidence do harm when district leaders make big decisions?

SR: When district leaders sit down to build their plans for the next school year, they take the evidence handed to them by the California Department of Education and decide where they see problems of practice, areas where their vital signs appear to be lagging. Then they decide to spend money and allocate teachers’ precious time to improving things. If they are aiming those resources at “problems” that aren’t really problems at all, and missing more compelling challenges due to flawed evidence, then money and time are wasted, and the students who most need opportunities to learn miss out.

I would like to see districts build their own bodies of evidence from which they gauge where they stand, relative to others whose kids and communities are most like their own. If they viewed the evidence they are handed by the state with a dose of skepticism, it would be a good start. In fact, state law and policy in California have explicitly encouraged districts to embrace local control. If they exercised this local control when measuring their schools’ and their district’s vital signs, they’d be much more likely to make wiser decisions.

JR: Thank you for your time, and for all you do for educators and students.
