More Evidence for a Pain-Related Description of dACC
Selective voxels in the dACC tend to be selective for pain
Posted Jan 25, 2016
[Note: please be sure to read the third paragraph which clears up what appears to be a major source of confusion about our paper]
Since our last blog response to Tal Yarkoni (TY), there have been three new responses from TY, Tor Wager (TW), and Alex Shackman (AS). These responses have given us a lot to think about and have led us to run additional analyses that we (Lieberman & Eisenberger, hereafter L&E) believe clarify, extend, and ultimately strengthen our original claims. While we stand by all of the analyses in the PNAS paper, we wish we had thought to do these analyses previously and included them in the paper.
There is a lot we disagree with in some of the latest blogs, but as we are interested in moving forward, we want to start, again, with areas where there seems to be some agreement, then move on to our new analyses, and then on to a discussion of four issues: (a) the relation of pain and fear in the Neurosynth database, (b) whether z-scores in Neurosynth inform us about reverse inference, (c) empirical priors, and (d) whether we can ever say a region of the brain has a function. We want to be clear that this blog will be our last comment on all of this. Between the previous post and this one, we feel that we have clarified all that we need to in order to demonstrate that our conclusions are sound. We anticipate that those who have written posts already will continue to disagree with us, but we hope others will find this useful.
But before jumping into the main text of this blog, we wanted to clarify an important point that will be elaborated upon later. From the analyses in our PNAS paper, we do not think that when we see dACC activity, it necessarily implies the person is experiencing pain. To make that claim, one would need to generate posterior probabilities based on real-world empirical priors that don’t exist (neither the Neurosynth prior of .50 nor the 3.5% prevalence of pain in Neurosynth abstracts provides this information, as the latter reflects what is studied frequently, not what occurs frequently in general). But this was not the kind of claim we were making. Our claim was much simpler: there is reliable evidence, based on z-scores from reverse inference maps, that pain is associated with much of the dACC. In contrast, across most of the dACC there is much less evidence, based on z-scores from reverse inference maps, that executive, conflict, and salience processes are reliably associated with the dACC. These results suggest that an account of dACC function should focus more on pain processes than on the cognitive processes generally emphasized. Our claim is about building the best account of dACC function, not predicting the process present in a particular study or assuming that each dACC neuron does the same thing. Below we consider several additional accounts of dACC function to make this claim more comprehensive.
Areas of agreement
Although TY points out that he disagrees with almost everything we said in our first blog, he also peppers his latest blog post with quotes from us or restatements of claims from us with which he explicitly agrees (or has no problem with). We think these are worth highlighting because we think these are some of the most important claims of our paper.
We wrote: “The conclusion from the Neurosynth reverse inference maps is unequivocal: The dACC is involved in pain processing. When only forward inference data were available, it was reasonable to make the claim that perhaps dACC was not involved in pain per se, but that pain processing could be reduced to the dACC’s “real” function, such as executive processes, conflict detection, or salience responses to painful stimuli. The reverse inference maps do not support any of these accounts that attempt to reduce pain to more generic cognitive processes.”
TY wrote in response: “This claim does indeed seem to me largely unobjectionable.”
We wrote: “For the terms executive and conflict, our Figure 3 in the PNAS paper shows a tiny bit of dACC. We think the more comprehensive figures we’ve included here continue to tell the same story. If someone wants to tell the conflict story of why pain activates the dACC, we think there should be evidence of widespread robust reverse inference mappings from the dACC to conflict. But the evidence for such a claim just isn’t there. Whatever else you think about the rest of our statistics and claims, this should give a lot of folks pause, because this is not what almost any of us would have expected to see in these reverse inference maps (including us).”
TY wrote in response: “No objections here”
There were some other paraphrased instances of agreement as well. For instance, TY wrote:
“If L&E had asked me, “hey, do you think Neurosynth supports saying that dACC activation is a good marker of ‘salience’?”, I would have said “no, of course not.””
And in a separate section he wrote:
“If what they mean is something like “on average, taking the average of all voxels in dACC, there’s more evidence of a statistical association between pain and dACC than pain and conflict monitoring”, then I’m fine with that.” [Note: we assume that the last phrase is miswritten and that TY meant “conflict monitoring and dACC”]
All of these claims where there are areas of agreement depend on interpreting the z-score maps provided by Neurosynth as evidence (or absence of evidence) that a term is a reasonable reverse inference target. We therefore take the following as important areas of agreement:
- z-scores from Neurosynth do provide evidence about whether particular voxels can be plausibly attributed, via reverse inference, to a particular function. There can be multiple terms that show significant z-scores for a voxel and all such terms are plausible functions to attribute to that voxel.
- There is very little evidence from reverse inference z-scores that executive, conflict, and salience processes are good reverse inference targets for dACC activation. Note that we say ‘little’, not ‘no’ evidence, just as we did in our paper, because there is some reverse inference evidence for conflict in the dACC, but it is modest.
- There is evidence from z-scores that pain processes are good reverse inference targets for a large portion of dACC voxels.
If we can agree on these points, we think we agree on most of what we cared about in our paper.
New Neurosynth analyses
The last statement from TY above (“If what they mean is something like…”) made us realize there was a different way to approach the conclusions we had reached in the PNAS paper. As we have said, part of what startled us when we looked at the reverse inference maps for pain, executive, conflict, and salience a few years ago was how widespread the dACC coverage was for pain relative to the other terms. We tried to capture this by looking at 8 voxels distributed across the dACC. Perhaps this was not the best way to quantify what we were seeing, and it did not address two reasonable issues raised by TY and AS. First, we only looked at the midline, an issue mentioned by TY. Second, we defined our dACC boundaries using a non-probabilistic atlas and thus we could not indicate the confidence that the voxels under consideration were really dACC voxels, an issue raised by AS.
What we have done in our new analyses is defined a dACC mask using the Harvard-Oxford probabilistic atlas (hereafter H-O) and then examined the percentage of voxels in the dACC for which different terms are a reasonable reverse inference target based on the Neurosynth reverse inference maps. To create the H-O atlas, they took T1-weighted images from dozens of brains (bit.ly/1RMTAzp). Various regions of interest (e.g. ACC) were identified on individual brains prior to any transformations. Then each scan was registered into MNI space. At this point, they could determine for each voxel in MNI space, how many individual brains had been tagged with a particular label in the first step. So if 75% of the brains had a particular coordinate labeled as ACC, then that voxel would be rated as having a 75% likelihood of being ACC in any new scan registered into MNI space.
Using this atlas we were able to define dACC masks (-8 ≤ x ≤ 8; 0 ≤ y ≤ 30) for which the voxels were 25%, 35%, 50%, or 75% likely to be in the dACC (see figure above). One could argue that only voxels that are at least 50% or 75% likely to actually be dACC voxels should go into our ROIs but visual inspection suggested these ROI masks looked very similar to what we had already used in the PNAS paper that led to the reply from AS, so we went with a more liberal 35% mask for our analyses presented here.
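For readers who want to see the mechanics, the atlas logic can be sketched in a few lines of code. The arrays below are small random stand-ins for illustration, not the actual Harvard-Oxford data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the atlas construction: labels[s] is 1 where subject s's brain
# was tagged "ACC" after registration to MNI space (10 hypothetical subjects).
labels = (rng.random((10, 4, 4, 4)) < 0.5).astype(float)

# The per-voxel atlas probability is the across-subject fraction of brains
# carrying the label at that voxel (e.g. 0.75 means 75% of brains were tagged).
atlas_prob = labels.mean(axis=0)

# Binary dACC masks at two of the confidence thresholds discussed in the text.
mask_35 = atlas_prob >= 0.35
mask_75 = atlas_prob >= 0.75

# Stricter masks are nested inside looser ones by construction.
assert np.all(mask_35[mask_75])
print(int(mask_35.sum()), "voxels at 35% confidence;",
      int(mask_75.sum()), "at 75%")
```

The nesting property is why moving from the 25% to the 75% mask can only shrink the ROI; what changes across thresholds is which voxels survive, not the logic of the count.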
Note that the results are qualitatively the same across all the different masks. The only major difference we saw is that as we moved from lower confidence dACC masks (25%) to higher confidence dACC masks (75%), the percentage of dACC voxels associated, via reverse inference, with affective terms increased (e.g. pain +9%; fear +12%; negative affect +6%) and the percentage of dACC voxels associated with cognitive terms decreased (e.g. conflict -12%; error -6%). Thus, as we increase our confidence that a particular voxel is indeed in the dACC, it is more likely to be associated with an affective process and less likely to be associated with a cognitive process. To put this another way, those voxels that are associated with cognitive processes in the dACC tend to be the voxels we should have the least confidence are actually in the dACC.
The analyses presented below all use the 35% mask. We initially thought to look at our four main categories of interest in the PNAS paper (pain, executive, conflict, salience) along with those raised by TY as alternatives we should have considered (fear, autonomic, reward). Ultimately we decided to include more terms in our analyses so that we might meet the standard for selectivity given by TY in his latest blog:
"A brain region can be said to be ‘selective’ for a particular function if it (i) shows a robust association with that function, (ii) shows a negligible association with all other readily available alternatives, and (iii) the authors have done due diligence in ensuring that the major candidate functions proposed in the literature are well represented in their analysis."
We think this definition goes beyond the way in which many researchers have used this term in the past (i.e. in MVPA papers), but we thought it would be worthwhile to see what happens when we apply this definition to our analyses. We should note that we interpret the word ‘association’ in this definition to refer only to associations identified in the reverse inference maps, not what is observed in the forward inference maps. Accordingly, we attempted to do our ‘due diligence in ensuring that the major candidate functions proposed in the literature are well represented in their analysis’. Thus, we now have a list of 14 terms that covers every dACC account we are aware of from over the years. Our list of terms includes:
pain, attention, autonomic, avoidance, conflict, emotion, error, executive, fear, negative affect, response inhibition, response selection, reward, and salience.
We believe this to be a pretty comprehensive list of terms and hope that if we’ve missed any, they have a reasonable synonym on the list likely to yield similar effects.
On to the analyses. The first thing we did was count the number of voxels in the 35% mask. There were 1110 voxels that the H-O atlas was at least 35% confident were dACC voxels. Of these, 947 voxels (or 85.3%) appear in the reverse inference map for pain (using the standard Neurosynth significance level of p<.01, FDR corrected). Of the 13 other terms, none covered even 20% of the dACC voxels (see figure below). The chi square comparison of pain vs. any other term was highly significant: all χ2s > 975.278, ps < .00001, ds > 5.38. These results tell us that pain is a far more ubiquitous reverse inference explanation across dACC voxels than any of the 13 other terms.
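The coverage comparison amounts to a two-proportion chi-square on counts of in-map vs. out-of-map voxels. A sketch, using the pain count from the text and a hypothetical runner-up term covering just under 20% of the mask:

```python
from scipy.stats import chi2_contingency

N_DACC = 1110   # voxels at >= 35% dACC confidence (from the text)
pain_in = 947   # dACC voxels in pain's reverse inference map (from the text)
other_in = 210  # hypothetical runner-up term covering ~19% of the mask

# 2x2 table: rows are terms, columns are in-map vs. out-of-map voxel counts.
table = [[pain_in, N_DACC - pain_in],
         [other_in, N_DACC - other_in]]
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(round(chi2, 1), "p =", p)  # enormous even vs. the closest competitor
assert chi2 > 900 and p < 1e-5
```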
The above analysis does not get to the heart of the selectivity issue as characterized in TY’s definition because the same voxel can appear in the maps for multiple terms and thus not indicate selectivity of one term over others. Thus, we next assessed how many voxels in the dACC seemed selective for anything, in that they appeared in exactly one term’s reverse inference map and in none of the other 13 terms’ reverse inference maps. One might imagine that with 14 terms, almost no voxels in the dACC would show selectivity by this definition – any voxel that is significant for just two of the terms is eliminated from this analysis. Despite the high hurdle for selectivity here, 477 voxels out of the 1110 (43%) in the dACC appeared in just one of the 14 reverse inference maps. These 477 voxels would seem, then, to meet the bar for selectivity set by TY’s definition. Of the 477 dACC voxels that are selective for a single term (out of the 14 terms considered), 91.2% were selective for the term pain.
In all, 435 of the 477 selective voxels were present only in the pain reverse inference map and not in any of the reverse inference maps for the other 13 terms. The only other term that had >10 selective voxels associated with it was reward, at 30 voxels (we alluded to this in the original paper). Fear is next at 8 voxels, with error at 3 voxels, and conflict at 1 voxel. Pain-selective dACC voxels are more than an order of magnitude more common than any other selective dACC voxel type. The chi square comparison of pain vs. any other term for number of selective voxels in the dACC was highly significant: all χ2s > 446.203, ps < .00001, ds > 1.64. These results tell us that dACC voxels showing evidence of selectivity are far more likely to be related to pain than to any of the other 13 terms.
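Counting selective voxels reduces to a boolean operation across the 14 thresholded reverse inference maps. A sketch with random stand-in maps (not real Neurosynth data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_terms, n_voxels = 14, 1110

# Boolean matrix: maps[t, v] is True if voxel v survives FDR (p < .01) in
# term t's reverse inference map. Random stand-in for illustration only.
maps = rng.random((n_terms, n_voxels)) < 0.2

hits_per_voxel = maps.sum(axis=0)   # in how many maps each voxel appears
selective = hits_per_voxel == 1     # exactly one term: a "selective" voxel

# For each selective voxel, identify which term claimed it, then tally.
winning_term = maps[:, selective].argmax(axis=0)
counts = np.bincount(winning_term, minlength=n_terms)
print(int(selective.sum()), "selective voxels; per-term counts:", counts)
assert counts.sum() == selective.sum()
```

Note how strict this criterion is: a voxel significant for even two of the 14 terms is discarded, which is why 43% selective coverage is a notable result.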
Two very important caveats here:
1) Since 477 voxels were selective for one term, this means that 633 dACC voxels were not selective for any one term. Although we think we are now using a very high bar for selectivity here, higher than any we’ve seen in the literature, it is clear that with this high bar, less than half of dACC voxels are showing selectivity. From this perspective it is an overreach to say that “the dACC” is uniformly selective for pain relative to all other 13 accounts of dACC function we are currently considering. Two responses to this caveat. First, using just the four terms we initially considered (pain, executive, conflict, and salience), 823 voxels were selective under the current definition (i.e. 74.1% of dACC voxels) and of these, 811 were selective for pain (which would have been 98.5% of the selective voxels). Thus, in the context of categories considered in our PNAS paper, our claim of selectivity for pain relative to executive, conflict, and salience processes was reasonable. Second, given all the responses we’ve seen about the dACC being too generic or multi-faceted we think it is pretty impressive that almost half of dACC voxels are selective and of these, almost all of them are selective for pain.
2) One might look at these analyses and think it’s not fair to compare each term against the other 13 because some of these are in overlapping categories. For instance, we included conflict and error, which are distinct, yet overlapping, accounts of dACC. If they appeared in the same voxels as each other, they would knock those voxels out of the selectivity analysis above. To address this, below we have a figure showing the comparison of the maps for pain and each other term alone – so each term can demonstrate how many voxels show up for its reverse inference map but not pain’s. The orange bars in each two-bar pair below show the percentage of dACC voxels associated with each term when only the pain-associated voxels are removed. Thus, terms like error and conflict are not competing against each other here. (The blue bars show how many voxels show up in pain’s reverse inference map, but not for the other term in the comparison.)
As is evident, this analysis does not show any other term doing particularly well when pitted 1-on-1 against pain. Apart from reward, for which 2.7% of dACC voxels appear in its map but not in pain’s map, no other term gets above 1.1%. In contrast, in these analyses pain consistently gets above 65% of all dACC voxels after removing those for any one other term. While we indicated in the PNAS paper that reward does indeed show stronger effects than pain in the anteroventral portion of dACC, we thought it would be worth showing this a bit more clearly. If we had used an angled boundary (dashed green line), as some others have, to distinguish dACC from rACC, reward’s reverse inference map might be largely absent from dACC. It’s pretty clear in this figure that the reverse inference effect for reward is largely part of a more rostral ACC cluster.
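Each 1-on-1 comparison in the figure is a pair of set differences between two thresholded maps. A sketch with stand-in boolean maps (the 85% and 15% coverage rates below are hypothetical, chosen only to resemble the magnitudes in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels = 1110
pain = rng.random(n_voxels) < 0.85    # stand-in for pain's thresholded map
other = rng.random(n_voxels) < 0.15   # stand-in for one competitor term

only_other = other & ~pain  # "orange bar": in the term's map but not pain's
only_pain = pain & ~other   # "blue bar": in pain's map but not the term's
print(f"{100 * only_other.mean():.1f}% vs {100 * only_pain.mean():.1f}%")
```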
The conclusions from our current analyses reaffirm the general point made in the PNAS paper. If we are going to talk about the function of dACC as countless papers have over the past 20 years, pain is the only function that the dACC appears selective for over more than a handful of voxels. We used TY’s definition of selectivity (with the assumption that association refers to reverse inference association). Thus, we tried to create a more exhaustive list of terms that are reasonable accounts of dACC. We determined what percentage of dACC voxels showed a reverse inference association with each of 14 terms. We then determined, of these voxels, how many only showed a reverse inference association with 1 term and none of the other 13 terms. To summarize our findings:
- Of the 1110 dACC voxels, 43% (i.e. 477 voxels) met the above criteria for selectivity (appearing in only 1 of the 14 reverse inference maps).
- Of the 477 voxels that were selective, 91.2% (i.e. 435 voxels) were selective for pain.
- Thus, a sizable portion of the dACC can be described by a single term from among this large list of historically plausible terms.
- Of the sizable portion of dACC voxels that can be described by a single term from this long list of reasonable accounts, almost all of these show up in the reverse inference map for pain and for none of the other 13 terms.
Pain and Fear
We have argued that distress-related affect might be the umbrella account for dACC processes, with the dACC handling the distress-related aspects of pain (Rainville et al., 1997). Decades-old lesion work suggests that the dACC plays a key role in the distress of physical pain as well as in anxiety (Foltz & White, 1962; Tow & Whitty, 1953). As has been pointed out, there are a non-trivial number of dACC voxels that show up in the reverse inference map for fear (12.2% in our dACC mask). We believe that pain and fear are conceptually related because most of what we fear are things that might cause us pain (physically, socially, or emotionally). But in the context of Neurosynth, the relationship is much more direct. Many neuroimaging studies of fear are fear conditioning studies that use pain (e.g. shock) as the unconditioned stimulus. These studies almost never use the word ‘pain’ anywhere and thus are not tagged for pain in Neurosynth, but may be introducing pain-specific effects into the reverse inference maps for fear.
To examine this possibility, we manually inspected the first 50 fMRI studies that show up in Neurosynth for the term fear. We found that 50% of these studies used pain manipulations. In order to see whether these pain manipulations might be driving the dACC signal in the reverse inference map for fear, we counted how many studies with dACC activations had pain manipulations and how many studies without dACC activations had pain manipulations. As can be seen in the figure below, a sizeable majority of fear studies (71%) that produce a dACC response use pain manipulations, whereas a sizeable majority of fear studies (69%) that do not produce a dACC response do not use pain manipulations. The chi square comparison of this 2x2 was highly significant: χ2 = 8.01, p < .006, d = 0.87. This result suggests that fear studies that include a pain manipulation are more likely to produce a dACC response.
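For concreteness, the 2x2 can be reconstructed from the reported figures (50 studies, half using pain manipulations; 71% of dACC-active fear studies using pain; 69% of dACC-inactive ones not). The cell counts below are our reconstruction rather than values taken directly from the post, but they match those percentages and reproduce the reported chi-square:

```python
from scipy.stats import chi2_contingency

#                 pain manip.  no pain manip.
table = [[17, 7],   # fear studies WITH a dACC response    (17/24 ≈ 71%)
         [8, 18]]   # fear studies WITHOUT a dACC response (18/26 ≈ 69% no pain)
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(round(chi2, 3), round(p, 4))  # ≈ 8.013, matching the reported value
assert abs(chi2 - 8.013) < 0.01 and p < 0.006
```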
If one considers the possibility that the dACC response to fear in Neurosynth’s reverse inference map is (a) conceptually linked to dACC responses to pain or (b) literally due to pain manipulations activating the dACC in fear studies, then it is reasonable to combine the dACC responses to pain and fear. Although we don’t want to make too much of these analyses, when pain and fear are combined into a single map (hereafter, pain+fear), we find that 566 of the 1110 dACC voxels show selectivity for one of the 13 terms. Thus, 51% of dACC voxels are selective under these conditions. Furthermore, 532 of the 566 selective voxels are selective for pain+fear. In other words, 94% of selective dACC voxels in this analysis are selective for pain+fear. Moreover, 48% of all dACC voxels are selective for pain+fear.
In summary, if we treat pain and fear as being part of a single construct as far as the dACC is concerned, we see that nearly half of all dACC voxels are selective for this construct and nearly all of the dACC voxels that are selective for anything are selective for this construct. As in our main analyses in the previous section, no other term besides reward (5% of selective voxels here) garners even 1% of the selective voxels in the dACC.
Do z-scores in Neurosynth inform us about reverse inference?
We think it is unambiguously the case that z-scores in Neurosynth tell us something important about reverse inference. Thus, one of the more unexpected aspects of the exchange over our PNAS paper is that TY and TW, creators of Neurosynth, seem to be suggesting that almost nothing can be learned about reverse inference from the z-scores and that we should focus primarily on the posterior probabilities. For instance, TY wrote:
“I explained why one cannot obtain support for a reverse inference using z-scores or p-values. Reverse inference is inherently a Bayesian notion, and makes sense only if you’re willing to talk about prior and posterior probabilities.”
We find this odd because when one uses the Neurosynth web interface and looks at any term, there is a single button on the screen labeled “reverse inference”. When you click on this button it brings up a heat map that, given their labeling scheme, we can only assume is meant to tell us something about reverse inference. This heat map is a heat map of reverse inference z-scores, not posterior probabilities. Similarly, if you download the reverse inference map for any term, it is a map of z-scores, not posterior probabilities. Despite being non-Bayesian, these z-scores are what TY and TW used to populate their “reverse inference” maps. If these don’t tell us about reverse inference, then it is very strange that the only reverse inference button in the interface leads to these z-scores.
TY has also written about the value of the z-scores from Neurosynth in multiple places that seem to contradict the above claim (“one cannot obtain support…”). First we have the text from the Neurosynth FAQ:
“Reverse inference map: z-scores corresponding to the likelihood that a term is used in a study given the presence of reported activation (i.e., P(Term|Activation))”
That sounds to us like the z-score is telling us something about reverse inference. Here are excerpts of what TY wrote on Google+ where he graciously answers lots of user questions about Neurosynth:
“The z-score is a measure of confidence in the statistical association; the posterior probability is a measure of effect size. In general, I recommend paying more attention to the former, because the latter is subject to sample-size related noise. A term with fewer studies included in the meta-analysis will have higher variability, which will translate into more extreme posterior probabilities. However, a term with fewer studies will also produce *less* extreme p/z values, other things being equal. So if you're trying to make a claim of the form "it's likely that function F is associated with activity in region R", you're probably better off basing that on the z-score. [emphasis added]”
This statement is flat-out inconsistent with his claim above that “one cannot obtain support for a reverse inference using z-scores.” In his blog he also wrote this of z-scores:
“All it tells us is that, given all the data we have, it’s very unlikely that there’s exactly zero association between a term and a region.”
Despite the pejorative phrasing, we think “all this tells us” is pretty amazing since we had no way of doing this before databases like Neurosynth. This is a really important thing to know, especially when it is combined with other analyses suggesting that for other terms there isn’t evidence of association between the term and the region. Finally, TY writes:
“If one’s goal is simply to say something like “we think that the temporoparietal junction is associated with biological motion and theory of mind,” or “evidence suggests that the parahippocampal cortex is associated with spatial navigation,” I don’t see anything wrong with basing that claim on Neurosynth z-score maps.”
We think this is exactly the claim we’re making, along with showing that we are more justified in making pain-related claims about dACC function than claims based on other terms. In the PNAS paper we did this by comparing terms that had non-significant z-scores (executive, conflict, salience) to a term that did (pain). While these comparisons don’t show that the effect sizes are larger for pain than for the other terms (which was never our goal), they do show that we can be more confident that there is some real association between pain and dACC than between the other three terms and dACC. We think this is a valuable contribution. In the current analyses, we took a different approach, counting the number of voxels that show some reverse inference association for one and only one of 14 terms. Again, most of the voxels in the dACC that meet these criteria were selective for pain.
TW gives some nice detail in his blog response about how the z-score is computed, actually starting out as a chi-square:
“it compares the frequency of activation for one target term (“pain”) against the base rate of activation for the other studies (“not pain”). Formally, it compares P(A|pain) to P(A|not pain) using a chi-square test. Thus, it tells us about preference, but not specificity relative to other potential states.”
We agree that the z-score for pain does not do the job by itself. But if we also know P(A|motor) and P(A|not motor) for the same coordinates, this allows us to assess whether this activation is more selective for pain than for motor. The larger the z-score, the more confidence we have that P(A|term) is greater than P(A|not term). The comparison of these z-scores across terms (the z for pain vs. the z for motor) tells us something about whether we should have greater confidence that one of these terms, rather than the other, is associated with the activity in the region of interest.
Finally, we have now compared the posterior probabilities for the terms pain, executive, conflict, and salience using the 8 voxels we focused on in our PNAS paper. For instance, we compared the posterior probabilities for pain (using the 8 posterior probabilities for pain that came from 8 different activation points) with the posterior probabilities for executive (using the 8 posterior probabilities for executive) using a repeated measures t-test. For pain vs. each of the other three terms, pain posterior probabilities were significantly higher: ts > 5.92, ps < .0003, ds > 4.47. Moreover, even when comparing pain to fear and autonomic, the posterior probabilities for pain are significantly higher: ts > 2.92, ps ≤ .03, ds > 2.21. We have never thought that comparing posterior probabilities was essential to making our point, but this is at least some evidence that the effect is there.
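The repeated measures comparison is a paired t-test across the 8 locations. The posterior probabilities below are hypothetical placeholders at roughly the magnitudes described in the text (pain near .80, executive near the .50 null), not the actual Neurosynth values:

```python
from scipy.stats import ttest_rel

# Hypothetical posterior probabilities at the 8 dACC points (illustration only).
pain      = [0.82, 0.79, 0.81, 0.84, 0.78, 0.80, 0.83, 0.77]
executive = [0.55, 0.52, 0.58, 0.50, 0.54, 0.51, 0.56, 0.53]

# Paired (repeated measures) t-test: the same 8 locations under both terms.
t, p = ttest_rel(pain, executive)
print(f"t({len(pain) - 1}) = {t:.2f}, p = {p:.2g}")
assert t > 0 and p < 0.001
```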
Perhaps it shouldn’t be surprising that we see the same thing with posterior probabilities that we saw with z-scores given that, at least with the data we were looking at, the two sets of statistics were highly related. Specifically, the correlation of all of the posterior probabilities and z-scores for the terms of interest in the 8 locations examined in our PNAS paper was r=.86. Thus, while there might be some conceptual daylight between these measures, functionally they were providing roughly the same information in our analyses. This similarity can be seen in the figure below that plots z-scores against posterior probabilities for pain, executive, conflict, and salience from the 8 locations in our PNAS paper. One can also see that the 7 highest posterior probabilities and the 7 highest z-scores all come from pain. Note that the curvilinear relationship is likely due to posterior probabilities being constrained to an upper bound of 1.0.
We’ve already said quite a bit about selectivity in our previous blog, so we just want to say a few more things. One is that there is no universally agreed-upon definition of selectivity (TW describes it as “vaguely defined”). People have definitions, but not everyone has the same one. The implication is that we each need to say what we mean by selectivity when we use the term (something that almost no papers using this term do, including our PNAS paper). We will certainly be more careful about this in the future, but we should respect different researchers’ definitions when they give them and not treat them as having a bad or incoherent definition just because it’s different from our own. We have now seen at least three definitions of selectivity in dACC voxels, all of which are reasonable:
Selectivity (L&E): dACC voxels are selective for pain if pain is a more reliable source of dACC activation than the other terms of interest (executive, conflict, salience).
Selectivity (TY): dACC voxels can be said to be ‘selective’ for a particular function if they (i) show a robust association with that function, (ii) show a negligible association with all other readily available alternatives, and (iii) the authors have done due diligence in ensuring that the major candidate functions proposed in the literature are well represented in their analysis.
Selectivity (TW): dACC voxels are selective for a particular function if the voxel is activated by that function and “not activated by other things”
We think TW’s definition is defensible, but it probably rules out calling anything selective from fMRI analyses as there are probably few to no voxels in the brain that show activation to one and only one process (i.e. only appearing in a single forward inference map). We think our definition and TY’s definition are both more practical. We think ours is implicit in most MVPA studies discussing selectivity to date and we think TY’s represents a higher bar, but an interesting bar, and one that really requires tools like Neurosynth, rather than MVPA, to consider.
In TY’s latest blog, he suggests one of the problems with our conclusions is that our use of posterior probabilities is misleading. The posterior probabilities for pain are around .80 whereas the posterior probabilities for other terms examined in the paper tend to be between .50 and .60 (where .50 is essentially a null effect). We think these differences (and especially the associated z-score differences) tell us something about the likely functions of dACC. However, TY implies that we think that based on these effects, if a new Neurosynth study with dACC activity was selected randomly, we could predict it would be a pain study. While we can see why TY might think we’d believe this, we never made this claim and do not in fact believe this.
TY points out that the .80 posterior probability for pain depends on starting out with the .50 prior that Neurosynth assumes for each term. In no way does a .80 posterior probability in Neurosynth imply that 80% of the studies with dACC activations were pain studies. In fact, we already pointed this out in our PNAS paper:
"A posterior probability is akin to an effect size, although not a directly interpretable one, because the Bayesian prior for each term was normed to 0.50. Thus, a posterior probability of 0.82 is likely a significantly larger effect size than another of 0.56; however, due to norming, one cannot say that the 0.82 implies that there is an 82% chance that an activation came from a study with a particular psychological term."
TY goes on then to discuss the empirical priors for pain and other terms. Pain appears in the abstracts of 3.5% of all the studies in the Neurosynth database, while motor appears in 18% of them; if these values (.035 and .18) are used as the empirical priors for each term (instead of .50), motor ends up with higher posterior probabilities than pain.
We completely agree that if you see a study in the Neurosynth database with a dACC activation, it is more likely to come from a motor study than a pain study. Yet, we think this is almost entirely beside the point. We aren’t interested in the distribution of studies in the Neurosynth database per se. We are interested in trying to draw conclusions about the likely function(s) of the dACC in the real world. That there are more motor studies than pain studies in the Neurosynth database speaks only to scientists’ past research priorities, and perhaps to the fact that a motor study is easier to run than a pain study.
To make clear just how irrelevant this difference in the Neurosynth-based prior is, consider the following example. Imagine a database with only pain and motor studies. Suppose there are 100 pain studies and 1,000,000 motor studies in the database. Further imagine that 100% of the pain studies produce dACC activity in a particular voxel and that only 1% of the motor studies produce dACC activity in the same voxel. If we were to randomly draw a study from this database that showed activity in this dACC voxel, it would be 100 times more likely to be a motor study than a pain study. Nevertheless, any reasonable person would look at these results and conclude that this spot in the dACC is probably involved in pain, but not involved with motor processes. The chi-square would support this conclusion.
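The arithmetic behind this example can be made explicit. The sketch below uses only the illustrative numbers from the paragraph above (not real Neurosynth counts) to show how a large base-rate imbalance makes motor the better guess for a randomly drawn study, while a neutral .50 prior makes pain the far stronger account of the voxel:

```python
# Hypothetical database from the example above (all figures are the
# illustrative numbers from the text, not real Neurosynth counts).
n_pain, n_motor = 100, 1_000_000          # studies of each type
p_dacc_given_pain = 1.00                  # 100% of pain studies activate the voxel
p_dacc_given_motor = 0.01                 # 1% of motor studies activate it

active_pain = n_pain * p_dacc_given_pain       # 100 activating pain studies
active_motor = n_motor * p_dacc_given_motor    # 10,000 activating motor studies

# Drawing a random activating study: motor is 100x more likely...
odds_motor_vs_pain = active_motor / active_pain
print(odds_motor_vs_pain)  # 100.0

# ...yet with a neutral 0.50 prior, Bayes' rule overwhelmingly favors pain:
prior = 0.5
posterior_pain = (p_dacc_given_pain * prior) / (
    p_dacc_given_pain * prior + p_dacc_given_motor * (1 - prior)
)
print(round(posterior_pain, 3))  # 0.99
```

The 100-to-1 odds reflect only how many studies of each kind happen to be in the database; the posterior under a neutral prior reflects how diagnostic the voxel is of each process, which is the quantity the argument above turns on.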
We understand that 3.5% and 18% are in some sense empirical priors for pain and motor, respectively, in the context of Neurosynth, but they aren’t real-world empirical priors (and TY points this out later in his blog). We think that TY’s decision to set all priors to .50 when he created Neurosynth was a really good idea because it avoids allowing effects to be driven by the kinds of studies that happen to be better represented in the database.
In TY’s first blog he gave a great explanation of how to actually think about posterior probabilities. He wrote:
"The strict interpretation of a posterior probability of 80% for pain in a dACC voxel is that, if we were to take 11,000 published fMRI studies and pretend that exactly 50% of them included the term ‘pain’ in their abstracts, the presence of activation in the voxel in question should increase our estimate of the likelihood of the term ‘pain’ occurring from 50% to 80%."
So let's play out this example a bit. Suppose we have 2,000 studies in a hypothetical Neurosynth database, instead of 11,000. By setting the prior for pain to .50, we are saying “imagine that 1,000 of the 2,000 studies have the term pain in the abstract and the other 1,000 do not”. Further imagine that across these 2,000 studies, 1,000 of them have dACC activity in a voxel of interest (e.g. coordinates 0, 18, 30). A posterior probability for pain of .81 at this voxel would imply that we should expect about 810 out of the 1000 studies with dACC in this sample (or a new set of studies with the same pain/no-pain distribution) to have pain as a term and about 190 out of the 1000 studies with dACC in this sample to not have pain as a term. In contrast, if motor has a posterior probability of .51 for this voxel, then we should expect about 510 out of the 1000 studies with dACC in this sample to have motor as a term and about 490 out of the 1000 studies with dACC in this sample to not have motor as a term. Although pain and motor were not directly compared in these analyses, we think these two analyses suggest that pain is a better account of activity at this voxel than motor processes. This is also reflected in the z-scores at 0, 18, 30 for pain (Z=9.90) and motor (Z=0.21).
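To make the expected-count arithmetic in this example concrete, here is a small sketch (a hypothetical helper, not part of Neurosynth) that converts a posterior probability into the implied split of activating studies:

```python
# Convert a Neurosynth-style posterior probability into the implied split
# of activating studies, using the hypothetical 2,000-study example above.
def expected_counts(posterior, n_active):
    """Return (studies with the term, studies without it) among n_active."""
    with_term = round(posterior * n_active)
    return with_term, n_active - with_term

n_active = 1000  # studies showing activity at voxel (0, 18, 30)

print(expected_counts(0.81, n_active))  # pain:  (810, 190)
print(expected_counts(0.51, n_active))  # motor: (510, 490)
```

An 810/190 split departs sharply from the 500/500 split the .50 prior assumes, whereas 510/490 barely moves off it, which is the sense in which .81 versus .51 is informative.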
TY also writes the following:
“The interesting thing about all this is that, no matter what prior you choose for any given term, the Neurosynth z-score will never change. That’s because the z-score is a frequentist measure of statistical association between term occurrence and voxel activation. All it tells us is that, given all the data we have, it’s very unlikely that there’s exactly zero association between a term and a region. This may or may not be interesting (I would argue that it’s not, but that’s for a different post), but it certainly doesn’t license a reverse inference like “dACC activation suggests that pain is present”. To draw the latter claim, you have to use a Bayesian framework and pick some sensible priors. No priors, no reverse inference.”
This still makes little sense to us. First, as far as we can tell, we never wrote the words TY appears to attribute to us here ("dACC activation suggests that pain is present"), and that's because we have not and do not endorse this view. Additionally, we understand that without a Bayesian framework you don’t get posterior probabilities, which provide an estimate of the strength of the reverse inference effect. However, the z-score certainly seems to tell us whether there is a non-zero reverse inference effect. Thus, the z-score is indeed telling us something of interest about reverse inference. If there are 14 accounts of the dACC and only 1 of the 14 accounts has a significant z-score in its reverse inference map for a particular voxel, then we have definitely learned something about the function of that voxel without referring to posterior probabilities at all.
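For readers who want to see where such a z-score comes from, here is a simplified stand-in for the frequentist association test: a two-proportion z-test on term-by-activation counts. The counts below are invented for illustration, and the real Neurosynth pipeline differs in its details:

```python
import math

def two_proportion_z(act_with_term, n_with_term, act_without_term, n_without_term):
    """z-statistic for a difference in activation rates between studies that
    do and do not mention a term -- a simplified stand-in for a Neurosynth
    reverse-inference z-score (all counts here are hypothetical)."""
    p1 = act_with_term / n_with_term        # activation rate, term present
    p2 = act_without_term / n_without_term  # activation rate, term absent
    # Pooled proportion and standard error under the null of no association
    p = (act_with_term + act_without_term) / (n_with_term + n_without_term)
    se = math.sqrt(p * (1 - p) * (1 / n_with_term + 1 / n_without_term))
    return (p1 - p2) / se

# Hypothetical counts: the voxel activates in 80% of 100 'pain' studies
# but only 10% of 1,900 other studies -- a large, clearly non-zero z.
z = two_proportion_z(80, 100, 190, 1900)
print(round(z, 2))
```

A z-score of this kind does not quantify the posterior probability, but it does test exactly the thing we are claiming it tests: whether the term-activation association is reliably different from zero.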
Can we ever say a region of the brain has a function?
TW questioned the premise of our paper altogether, suggesting:
“We should not be looking for a unified explanation for dACC activity, unless it is to describe a collection of diverse processes. Trying to find the “best interpretation” for a collection of 550 million neurons is misleading, because it invites us to make psychological inferences based on brain activity that are not warranted. By analogy, it is like trying to guess whether a person is a Republican or a Democrat based on his or her home state. The “best interpretation” of voters who live in Texas is that they are Republican. You would be right to guess Republican, but you would only be right 57 percent of the time.”
This is really a philosophy of science issue regarding units/levels of analysis. The same issue comes up in social psychology when we say “under particular conditions, people will tend to show conformity effects.” This does not imply that every person put in that situation will show those effects, but rather that there is a central tendency that can be statistically distinguished from noise. Just because there are some people who do not conform, it does not mean we cannot talk about what people in general do, in a useful way.
TW’s position is philosophically defensible; however, it largely leads to the conclusion that fMRI can pretty much never identify any psychological function within any brain region, because every single voxel contains about 5.5 million neurons (Logothetis, 2008) and there is probably no region where 100% of these neurons are invoked by a single function/process and by no other function/process. But scientists have clearly found utility in trying to describe, say, the hippocampus in terms of a particular function despite it having millions of neurons that do not all do the same thing. We’re not suggesting any particular function is the final description of hippocampal function, but we are suggesting it isn’t a pointless endeavor to posit a general function for the hippocampus that will be debated, refined, and updated over time.
Let's turn to TW’s example of guessing whether a randomly selected Texan is a Republican based on the fact that 57% of Texans voted for Romney in 2012 (vs 41% for Obama). We think this is a great example, but it doesn’t capture the question we are actually interested in. If we are equating individual Texans with dACC neurons and the state of Texas with the dACC as a whole, then our real question isn’t whether we can guess whether a particular person is Republican (though you’d be crazy not to guess Republican if you were forced to bet). Instead, our question is more akin to “Does Texas function as a Republican state, despite the fact that many individuals in that state are not Republicans?” The answer to this question is an emphatic yes. Those 57% who vote Republican have ensured the dominance of Republicans in every arm of government: in the US Senate (100%); US House of Representatives (69%); Texas State Senators (68%); Texas Representatives (65%); and Texas Supreme Court Justices (100%). These numbers are easily high enough to ensure that Texas functions as a Republican state. With 65% or more in both houses of the State legislature, the Republicans can vote through 100% Republican-friendly legislation, and the State Supreme Court can render Republican-friendly decisions, time after time. Perhaps of greatest significance is that those 57% of Texans who vote Republican have sent 100% of Texas’ 38 electoral votes to the Republican candidate for President the last 9 elections in a row. So for the time being, we think it is only sensible to describe Texas as a Republican state, as this has great practical value. Each citizen need not be a Republican in order for this to be the case, nor does the fact that there are urban enclaves that lean Democratic undermine this description of the state. (For those who are more computationally savvy, simply consider a connectionist network. If a representation has a small advantage in the weights connecting some nodes, iterative constraint satisfaction processes will turn that small advantage into a large functional advantage in outcomes.)
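The connectionist point can be illustrated with a toy winner-take-all network (parameters invented for illustration, not taken from any published model): two units inhibit each other, and a small initial advantage for one settles into complete dominance after iteration:

```python
# Toy winner-take-all network: two mutually inhibiting units, iterated
# with renormalization. A small initial edge becomes total dominance.
def settle(a, b, inhibition=0.5, steps=50):
    for _ in range(steps):
        # Each unit is suppressed in proportion to its competitor's activity
        a, b = max(0.0, a - inhibition * b), max(0.0, b - inhibition * a)
        total = a + b or 1.0
        a, b = a / total, b / total   # renormalize total activation
    return a, b

print(settle(0.52, 0.48))  # -> (1.0, 0.0): a 4-point edge wins outright
```

This is the dynamic alluded to above: the competitive iteration amplifies a modest statistical advantage into a categorical functional outcome, just as a 57% voting majority yields 100% of the electoral votes.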
Do the z-scores from the reverse inference maps tell us the strength of the reverse inference effects? No, but in our analyses they are correlated at .86 with the posterior probabilities that do. Do the z-scores from the reverse inference maps tell us where in the brain there is reliable evidence of a non-zero reverse inference association? Absolutely. Can the z-scores thus be used as a tool for reverse inference if we identify voxels that show significant z-scores for one term but not for others of interest? Absolutely.
Do we think every neuron or voxel in the dACC is selective for or even activated by pain? No. Do we think this means that there can be no discussion of the function of the dACC? No. Are most of the voxels in the dACC selective using TY’s definition? No, but about 43% of dACC voxels show selectivity using the 14 terms we considered (which means those voxels appeared in one and only one of the 14 reverse inference maps under consideration).
Of those dACC voxels that are selective, 91% are selective for pain. Is the dACC selective for pain relative to executive, conflict, and salience processes as we argued in the PNAS paper? Absolutely – only 1 voxel of the 477 voxels that show selectivity is selective for any of these three processes. Based on Neurosynth evidence, is more of the dACC selective for pain than for attention, autonomic, avoidance, conflict, emotion, error, executive, fear, negative affect, response inhibition, response selection, reward, and salience? Absolutely. Given that few, including us, would have guessed so much more of the dACC is selective for pain than all of these other accounts, we think our findings are a significant contribution to affective and cognitive neuroscience.