The Inside Story
A study yields strange results, but why — can you solve the mystery?
Posted December 24, 2014
“Oh no! That’s not what I wanted!”
Most researchers have felt this frustration when, after carefully planning and executing a study, the results come out wrong. It’s an ugly feeling. No matter how much we believe we are supposed to learn from failure, it still hurts. We may be depressed for a few days, or angry that the study didn’t give us the findings we wanted.
That failure recently happened to me, and I went through the classic stages of denial (perhaps there was a mechanical problem with the statistical package), anger, and depression, or at least discouragement. But, unhappily, the findings were accurate. The results disappointed us.
It was a project that was part of the “Good Strangers” program, a government effort to help military personnel and police officers work more effectively with civilians. Instead of demanding obedience, which is likely to create resentment and hostility, we wanted to teach ways to gain voluntary compliance and cooperation.
The study my team ran was aimed at changing the mindset of Marines and soldiers so that they tried to get civilians to trust them more at the end of encounters than at the beginning. We thought that if we exposed the warfighters to the thinking of experts (military personnel we had identified who had Good Stranger skills), we could have an effect. We ran the Marines and soldiers through a number of scenarios and asked them to make decisions or set priorities within each scenario, working within a small set of options. We asked them to predict how our experts had ranked the options, and we also asked them how they themselves would rank the options.
We got a small effect, so the study wasn't a total failure. But the effect wasn't as large as we had hoped. The groups that got feedback about how the experts had ranked the options did somewhat better than the controls, but the difference was only borderline significant.
What went wrong? My colleagues pored over the statistics and couldn’t find an answer.
When I joined their investigation, I told them, "I want to see the data." They showed me all of their analyses. "No, I want to see the data," I repeated. They trotted out the data sheets with all of the coded responses. "No, I want to see the data." I was sounding like a broken record.
I explained that I wanted to see the actual booklets that the participants, the soldiers and Marines, had used to write down their rankings and the reasons for those rankings. My colleagues were surprised by this request. Looking through the booklets themselves, with their handwritten comments, seemed so outdated, so primitive, but they produced several stacks of booklets for me to pore over.
I started with the control group for the first group of Marines that we ran. I flipped through the pages of the first Marine in that group and — all of a sudden — there it was. The answer. At least I think it was the answer.
I went over to one of my colleagues with my discovery. "You figured it out already?" he exclaimed. He thought I'd be at it for hours, not minutes. When I showed him the evidence, though, he agreed with my diagnosis.
Here is what I spotted. It was in the second of the four scenarios the control participants had worked on. In this scenario, the Marines are moving north during the 2003 invasion of Iraq. The protagonist is a company commander given the task of occupying a small Shiite village, presumed friendly, though in those days you could never be sure. The company was also traveling with a TV news crew that it had to protect.
One of the decisions was to rank the priorities before going into the Shiite village. We listed four options, shown below.
The “Good Stranger” choice was [B], “Figure out how to accomplish the mission without a fight.” Before you read any further, do you notice anything odd? See if you can discover what went wrong. This is your chance to solve the mystery.
Look at the left-hand column. This Marine, in the control condition, without getting any feedback, nailed it. He was a natural Good Stranger. He picked option B as his top priority. And look at his rationale: “If possible this is the best option. You want the local populace on your side.” Exactly.
Now look at the right-hand column, where we asked the participants to predict the way the experts would rank the options. (The right-hand column numbers in this example actually combine the responses from this first Marine plus a second Marine in that control group.) Their prediction is that the experts would prefer [C]: “Show the locals who is in charge right from the beginning and there will not be any trouble.” This goal is the opposite of being a Good Stranger. It is going to provoke anger, not trust. One of these two participants predicted that the experts would give the lowest rank to option [B], the Good Stranger response.
So we have a mismatch. The responses from these participants were more in keeping with being a Good Stranger than the responses they predicted for the experts. Why would that be?
Ah, of course. Who are the experts? In all our previous studies using this method, the experts were simply those with more experience and ability. But in this project we were tapping into professional identity, not skills. The military culture emphasizes safety, not being a Good Stranger. The military culture teaches warfighters to treat each civilian as a potential enemy, not as someone to work with cooperatively. So for many of our participants, the “experts” were not the Good Strangers. Just the reverse. And that’s why giving our participants feedback on what the “experts” chose wasn’t helping them. Instead it was confusing them. No wonder we weren’t getting the results we wanted.
And now several bits of information fell into place. When we had interviewed police officers who were Good Strangers, they could identify mentors who had shown them that they didn’t always need to rely on intimidation and coercion. But none of the military personnel we interviewed who were Good Strangers could think of any such mentors.
Another piece of evidence came from a pilot study in which the participants wanted to know, "Who are these 'experts'?" Now we could see that our "experts" were not acting the way the warfighters expected. And there was another incident, in which we used a scenario where a Marine risked his life to gain trust. One participant in our study blurted out that this was garbage. No real Marine would ever act like that. We pointed out that the scenario came from an actual incident and that the Marine, who was then a Captain, was now a Lieutenant Colonel.
The control group, which didn’t get any feedback, should have given identical ratings for what they predicted the experts would do and what they would do. However, most of the controls gave somewhat different ratings, and now we could see why. A number of them didn’t fully buy into the intimidation culture of their leaders.
Our insight, that the warfighters had a different idea of who the experts were, explained all of these anomalies. And the solution was clear to us. For future groups we carefully explained that our experts were skilled in gaining cooperation without antagonizing civilians.
Now we ran additional groups and the responses fell into place. The groups that got expert feedback (along with the new explanation of who was an expert) improved by 28% compared to the controls, significant at the .05 level. The decisions they made were more in keeping with being a Good Stranger. A morning’s worth of training was enough to produce a large shift in professional identity.
Here is what worries me. I would not have spotted the problem, or had the insight, if I had looked only at the coded data sheets or the summary statistics. I needed to plunge into the booklets, scrutinizing the rankings and also the written comments. I had to try to imagine what the individual Marines were thinking.
But this type of investigation doesn't happen very often. The coding sheets required for SPSS and other statistical programs hide the details that this kind of exploration depends on. In many studies, the ideal is to have automatic data collection, so the investigator never sees any of the individual data. The lead investigator just sets up the conditions and the statistics are automatically generated. It sounds wonderful, except: where is the chance to examine the thought processes of individuals, to try to get inside their heads, to search for the inside story? Perhaps our eagerness for efficiency is costing us insights.