Replication and Generalizability of Evidence-Based Practices

More credible research is needed, and it needs to be done right.

Posted Jul 23, 2015

By Scott C. Marley

This year’s Division 15 presidential theme is “Impacting Education Pre-K to Gray,” which means our field wishes to positively impact all populations of learners. A clear way to impact education with all populations is to address the issues that have been identified in the literature as the “replication crisis.” The beauty of the replication crisis is that it provides extensive opportunities for meaningful growth in the field of educational psychology. My position is that educators are often reluctant to implement evidence-based practices because of credibility considerations associated with replicability and the related issue of generalizability. There are systematic approaches to addressing these two considerations that are often neglected by producers and consumers of educational research. But first, I provide a concrete example from my schoolteacher days that illustrates the credibility challenges educational psychology faces if, as a field, it is going to be relevant to “Pre-K to Gray” educational leaders. Then, I provide a rationale for why “more research is needed,” but, more importantly, I suggest that what is needed is more credible research across units, treatments, outcomes and settings.

Early in my career, I was a fourth-grade teacher on a rural American Indian reservation. The children from the community were predominantly second-language learners from low socioeconomic households. The district’s schools were identified as low performing or failing and, in response, the schools were required to take action by adopting “evidence-based practices.” Although I am a proponent of educators using scientific evidence to make decisions, I do consider it important for producers and consumers to recognize the limitations of the literature on evidence-based practices (for relevant discussion, see Marley & Levin, 2011). Without a strong understanding of these limitations, one is likely to fall victim to two forms of “wishful thinking” that can result in unwarranted optimism about the effectiveness of educational interventions.

The first form of wishful thinking is the belief that one intervention can solve very complex educational problems with all populations. This belief coincides with the idea that an intervention can be proven effective, rather than merely supported by evidence that suggests its effectiveness. Proof assumes perfect knowledge of intervention effectiveness, whereas a focus on best evidence recognizes that knowledge is always incomplete and subject to change (for relevant discussion, see Guba & Lincoln, 1994). The second form of wishful thinking is the belief that an outcome is the result of a single knowable cause (e.g., “poor reading achievement is due to <insert least favorite flavor> instruction”). According to wishful thinkers, all one has to do is find the magic bullet, attack the single cause with it and the problem is solved. It is that simple!! We should do it now!! Why haven’t we started?! But the solutions to educational problems are almost never obvious or easy.

There are likely many reasons educators are skeptical about the value of educational research, some of which educational psychologists can address. One explanation that can be addressed is that educators may informally recognize that educational research has credibility issues (Hsieh et al., 2005). For example, after being designated “low performing,” my reservation school was remediated by so-called experts who had never taught on a rural reservation. Their primary recommendation was that the school adopt an evidence-based intervention. At the time, I had a sense that the context in which the evidence supporting the proposed intervention had been gathered differed substantially from the context in which I was teaching. In other words, the generalizability of the evidence-based intervention was questionable. Generalizability of research findings, along with the related issue of replicability, may very well be why “more research is needed” in targeted contexts now more than ever. However, let’s add to this common phrase found at the end of empirical papers so that it reads “more credible research is needed.” Replication and generalizability are two credibility indicators (for more on other important credibility indicators, see Levin, 1994, 2004) that need more attention before educational researchers, consultants and others get too heavy-handed with their educational recommendations.

The first credibility indicator to consider before making educational recommendations is replication: findings need to be replicated to determine whether a particular result is robust or an anomaly (for detailed discussion, see Schmidt, 2009). One concern is that the social sciences may have a publication bias that favors positive results, including those that are anomalies. A second concern is that replications are rarely published in education journals (Makel & Plucker, 2014). The lack of replication studies is often attributed to journal editors, reviewers, and granting agencies expecting novelty. If so, focusing on novelty of results could promote reward structures that minimize the value of replication studies (Koole & Lakens, 2012). A lack of replication studies clearly limits how confidently we can draw conclusions about the effectiveness of an instructional approach. This limitation is not often mentioned in the recommendations that accompany various evidence-based practices.

The second credibility indicator is generalizability: findings need to be carefully examined to ensure a high degree of external validity before recommendations are made to educators (Shadish, Cook, & Campbell, 2002). Concerns can be voiced regarding studies that examine the effectiveness of instructional interventions using samples of college students or other “unique” samples. By “unique,” I mean samples of participants who remembered to bring their parental consent form back with a signature, participants who agree to complete surveys, or other comparable ways in which a sample may differ from populations that exist in nature.

How are producers and consumers of educational research to address the challenges associated with applying findings from educational research? As a field, we can look to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) for guidance. The standards direct test producers and users to be well aware of the conditions under which valid inferences are most likely to arise from test scores. Several of the standards emphasize that populations are not interchangeable, and test producers and users are expected to examine how subjects’ scores perform in a new context. In other words, replications are expected in order to ensure that test scores are properly interpreted in contexts that differ substantially from the initial context. A similar approach is warranted with intervention research if the literature base is to be considered credible to the point that practitioners can make informed decisions. Several frameworks have been proposed for enhancing the generalizability of research findings. One such approach would be to incorporate the classic generalizability concerns proposed by Cronbach (Cronbach & Shapiro, 1982) into a replicate-and-extend model of programmatic research. Doing so would lend much-needed credibility to the literature base.

According to Cronbach’s classic UTOS framework, examinations of generalizability can occur across Units, Treatments, Outcomes and Settings. Each of the framework’s components provides ample problem space for researchers to replicate and extend research findings and for consumers to evaluate the degree of confidence they should hold in an intervention. Perhaps journal editors, reviewers and funding agencies would be more likely to consider publishing and funding studies situated explicitly within a replicate-and-extend UTOS framework and place less weight on the novelty of the initial ideas and results? If credible research that impacts all populations from “Pre-K to Gray” is to occur, this or comparable systematic approaches to replicating and generalizing results are likely to bear fruit. In the meantime, if readers are interested in joining replication- and generalization-related discussions at this year’s APA conference in Toronto, please consider attending the collaborative programming on the “replication crisis” that Division 15 has with other divisions, listed below.

Session Title: The Replication Crisis—What Brought Us Here and Where We Need to Go

Session Type: Symposium

Date: Thu 08/06 10:00AM - 11:50AM

Division/Sponsor: CPG-Central Programming Group; Co-List: 30, 3, 5, 6, 10, 15, 24, 26

Building/Room: Convention Centre/Room 716A South Building-Level 700


American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Cronbach, L. J., & Shapiro, K. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.

Guba, E. G., & Lincoln, Y. S. (1994). Competing paradigms in qualitative research. Handbook of Qualitative Research, 2, 163–194.

Hsieh, P., Acee, T., Chung, W.-H., Hsieh, Y.-P., Kim, H., Thomas, G. D., Levin, J. R., & Robinson, D. H. (2005). Is educational intervention research on the decline? Journal of Educational Psychology, 97(4), 523.

Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve psychological science. Perspectives on Psychological Science, 7(6), 608–614.

Levin, J. R. (1994). Crafting educational intervention research that’s both credible and creditable. Educational Psychology Review, 6(3), 231–243.

Levin, J. R. (2004). Random thoughts on the (in)credibility of educational-psychological intervention research. Educational Psychologist, 39(3), 173–184.

Marley, S. C., & Levin, J. R. (2011). When are prescriptive statements in educational research justified? Educational Psychology Review, 23(2), 197–206.

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13(2), 90.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin.
