“As they turned 23, the study found that when compared to their socially slower-moving middle-school peers, they had a 45 percent greater rate of problems resulting from alcohol and marijuana use and a 40 percent higher level of actual use of those substances. They also had a 22 percent greater rate of adult criminal behavior, from theft to assaults.”
This 40% increase seems pretty impressive, but it sounds like a relative increase. Is it adjusted for known risk factors like gender and family income, and how big is the effect in absolute terms? The authors use "step-wise model selection": they start from the assumption that adverse outcomes, such as substance use and criminality, are already predicted by known factors such as gender and family income, and then add further predictors to the model. Does adding “coolness” improve the model? Table 1 shows their initial assumption isn’t necessarily right, but let's ignore that. Table 4 shows that, after adjustment, the effect size of “coolness” on later drug use is much larger (roughly 2x) than that of gender and income. It is not clear whether the correlation coefficients are scaled to be comparable, but we’ll let that go for now as well. Focusing on the R-squared statistic, I see that gender and family income account for 12% of the variance, and “coolness” accounts for another 10%.
Phew. Thankfully, subsequent tables are similar enough that even without reading the legends I can glance at them and guess what they were trying to say. This is an example of why statistics departments have a marketing problem. But that's a discussion for another time.
Let’s translate that into a language I can understand.
How well can we predict on an individual level with the three predictors? Studies of this kind do not usually report predictive performance. Nevertheless, we can allow ourselves the liberty to think out loud in a back-of-the-envelope way for just one second. Very roughly, with the worst prognostic factors (poor, male, “cool”), we can improve our odds of detecting whether a particular individual will have a drug problem by about 30% at best (i.e. accounting for a “change of variance” of 20-30%). That is, if I see an adolescent at the age of 13 exhibiting these characteristics, all I can say, at best, is that there is a 30% higher likelihood of him having a drug problem at the age of 20. Given that perhaps 10% of the population has some kind of drug problem, we have a 13% “posterior” probability (the probability once we know he has these high-risk traits) of this particular adolescent having a drug problem in his 20s. Not too impressive. A key conceptual difference is obscured: testing a hypothesis (on average, “cool” kids are more likely, though not by much in absolute terms, to abuse drugs in their 20s: true) is not the same thing as evaluating a predictive model (can’t really say much knowing this particular kid is “cool”).
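For anyone who wants to check the envelope, here is the arithmetic as a few lines of Python. It is only a sketch of my own rough reasoning, not anything computed in the study: the 10% base rate and the 30% relative increase are the guesses from the paragraph above, and the function name `posterior_risk` is made up for illustration.

```python
def posterior_risk(base_rate, relative_increase):
    """Absolute risk after applying a relative increase to a base rate."""
    return base_rate * (1 + relative_increase)


# Assumed inputs, taken from the rough guesses in the text (not from the study).
base_rate = 0.10          # share of the population with some kind of drug problem
relative_increase = 0.30  # best-case relative increase for poor, male, "cool"

risk = posterior_risk(base_rate, relative_increase)

print(f"risk with all three traits: {risk:.0%}")
print(f"absolute increase over base rate: {risk - base_rate:.0%}")
print(f"flagged kids with no drug problem: {1 - risk:.0%}")
```

Put that way, the relative increase shrinks to a few percentage points in absolute terms, and the large majority of adolescents flagged by these three traits would not go on to have a drug problem.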
These results suggest that trying to profile problem youths like this in daily life probably has a very high false positive rate. The incredible variance (the 70% of variance NOT explained by the 3 predictors) and the relative rarity of a psychiatric disorder (vs. the much more common phenomenon of non-conforming behavior) make it difficult to predict behavior 10 years down the line on an individual basis. A crude re-reading of this interesting but highly nuanced study reveals that most “cool” kids do not become drug addicts, and that we cannot reliably predict which ones will. Popular media articles like this one may very well contribute to negative stereotyping.