The Causal Replication Framework

Despite recent interest by national funding agencies in promoting the replication of effects, there is not yet consensus on what systematic replication is, how replication studies should be conducted, or which metrics are appropriate for assessing replication success. Our work seeks to address these challenges by developing methodological foundations for a replication science.

Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The core of the CRF is based on potential outcomes notation, which has the advantage of identifying clear causal estimands of interest and the assumptions (or conditions) required for the direct replication of results. Here, a causal estimand is defined as the causal effect of a well-defined treatment-control contrast for a clearly specified target population.
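For concreteness, here is a minimal sketch in standard potential outcomes notation (the symbols are illustrative and not taken verbatim from the CRF papers): the average treatment effect for a target population P may be written as

```latex
% Average treatment effect (ATE) for a target population P, where Y_i(1) and
% Y_i(0) denote unit i's potential outcomes under treatment and control.
\tau_{P} = \mathbb{E}\left[ Y_i(1) - Y_i(0) \mid i \in P \right]
```

Direct replication then asks whether two or more studies targeting the same estimand recover the same effect, given the assumptions summarized below.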

The five assumptions required for direct replication of results across multiple studies are summarized below. They may be understood broadly as "replication design" requirements (R1-R2) and "individual study design" requirements (S1-S3).

R1: Treatment and outcome stability
R2: Equivalence in causal estimands
S1: Unbiased identification of causal estimands
S2: Unbiased estimation of causal estimands
S3: Correct reporting of estimands, estimators, and estimates

Combined, the two replication design assumptions ensure that the same causal estimand for a well-defined treatment and target population is produced across all studies. The individual study design assumptions ensure that a valid research design is used for identifying effects (S1), that unbiased analysis approaches are used for estimating effects (S2), and that effects are correctly reported (S3); these are standard assumptions in most individual causal studies. Replication failure occurs when one or more of the replication and/or individual study design assumptions are not met.

A key advantage of the CRF is that it is straightforward to derive different types of research designs for replication, as well as the assumptions required for these designs to yield valid results. For example, research designs for direct replication examine whether two or more studies with the same well-defined causal estimand yield the same effect. To implement this type of design, the researcher examines the replicability of effects when one or more individual study design assumptions are tested, such as by using different research design approaches (e.g., matching versus an RCT) or estimation methods, or by asking an independent investigator to reproduce the effect using the same data and code. Well-known examples of direct replication studies include within-study comparisons for evaluating the performance of non-experimental estimates in field settings (Lalonde, 1986; Fraker & Maynard, 1987), reanalysis approaches for examining the sensitivity of results to different estimation approaches (Duncan, Engel, Claessens, & Dowsett, 2014), and reproducibility studies using the same data and syntax files (Chang & Li, 2015). Research designs for conceptual replication examine whether studies with potentially different causal estimands yield the same effect (akin to robustness tests in Clemens, 2017). Here, the researcher may test replication assumptions by using multi-site designs where there are systematic differences in participant and setting characteristics across sites, or multi-arm treatment designs where different dosage levels of an intervention are assigned.
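To make the within-study comparison logic concrete, the following is a minimal simulated sketch (our own illustration rather than code from the cited studies; the variable names and data-generating values are hypothetical). It compares an experimental benchmark against naive and covariate-adjusted non-experimental estimates of the same estimand, which amounts to testing the identification assumption (S1):

```python
# Hedged sketch of a within-study comparison: does a non-experimental
# estimate replicate an experimental benchmark of the same estimand?
# All data-generating values below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
true_effect = 2.0

# Potential outcomes depend on a covariate x; the treatment effect is constant.
x = rng.normal(size=n)
y0 = 1.0 + 1.5 * x + rng.normal(size=n)
y1 = y0 + true_effect

# Study 1: randomized experiment -- treatment is independent of x.
z_rct = rng.binomial(1, 0.5, size=n)
y_rct = np.where(z_rct == 1, y1, y0)
ate_rct = y_rct[z_rct == 1].mean() - y_rct[z_rct == 0].mean()

# Study 2: non-experimental study -- selection into treatment depends on x.
p_treat = 1 / (1 + np.exp(-x))
z_obs = rng.binomial(1, p_treat)
y_obs = np.where(z_obs == 1, y1, y0)
naive = y_obs[z_obs == 1].mean() - y_obs[z_obs == 0].mean()

# Covariate-adjusted estimate: OLS of the outcome on treatment and x.
X = np.column_stack([np.ones(n), z_obs, x])
beta, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
adjusted = beta[1]

print(f"experimental benchmark: {ate_rct:.2f}")
print(f"naive comparison:       {naive:.2f}")    # biased by selection on x
print(f"covariate-adjusted:     {adjusted:.2f}")  # close to the benchmark here
```

In this simulation the covariate driving selection is observed, so the adjusted estimate replicates the benchmark; omitting it would produce a replication failure traceable to a violation of S1.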

Under the CRF, replication failure is not viewed as inherently “bad” for science, as long as the source of the failure can be identified. Prospective research designs can be used to assess the replicability of effects and to evaluate potential sources of effect heterogeneity when study results do not replicate. Here, the researcher introduces planned variation by systematically relaxing one or more design assumptions while trying to meet all others. If replication failure is observed, and all other assumptions are met, then the researcher may infer that the tested assumption was violated and produced treatment effect heterogeneity. In cases where replication assumptions are suspected not to hold, the researcher may incorporate diagnostic measures to assess empirically the extent to which the assumption was violated in field settings.
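One simple (and admittedly partial) way to check correspondence between two effect estimates is a difference test; the sketch below is our illustration rather than a metric prescribed by the CRF papers, and the example estimates are hypothetical.

```python
# Hedged sketch: two-sided z-test for the difference between two independent
# effect estimates, one possible check of whether results replicate.
from math import erfc, sqrt

def difference_test(est1, se1, est2, se2):
    """Return the z statistic and two-sided p-value for est1 - est2."""
    z = (est1 - est2) / sqrt(se1**2 + se2**2)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Example: an original estimate of 0.30 (SE 0.10) and a replication of 0.10 (SE 0.08).
z, p = difference_test(0.30, 0.10, 0.10, 0.08)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A non-significant difference is not by itself evidence of replication success; under the CRF, the interpretation depends on which design assumptions were met or systematically varied across studies.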


Further Reading

Steiner, P. M., Wong, V. C., & Anglin, K. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift für Psychologie, 227(4), 280-292. http://dx.doi.org/10.1027/2151-2604/a000385

Wong, V. C., Steiner, P. M., & Anglin, K. (under revision). Replication designs for causal inference. Working paper.

Wong, V. C., Steiner, P. M., & Anglin, K. (under review). Design-Based Approaches to Causal Replication Studies.


Citations

Chang, A., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not” (Finance and Economics Discussion Series No. 2015-083). Board of Governors of the Federal Reserve System. https://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf

Clemens, M. A. (2017). The Meaning of Failed Replications: A Review and Proposal. Journal of Economic Surveys, 31(1), 326–342. https://doi.org/10.1111/joes.12139

Duncan, G. J., Engel, M., Claessens, A., & Dowsett, C. J. (2014). Replication and robustness in developmental research. Developmental Psychology, 50(11), 2417–2425. https://doi.org/10.1037/a0037996

Fraker, T., & Maynard, R. (1987). The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Programs. The Journal of Human Resources, 22(2). https://doi.org/10.2307/145902

Lalonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. The American Economic Review, 76(4), 604–620.

Wong, V. C., Steiner, P. M., & Anglin, K. L. (2020). Design-Based Approaches to Causal Replication Studies (EdWorkingPaper No. 20-311). Annenberg Institute at Brown University. https://doi.org/10.26300/xsqw-c323
