Assessing the effectiveness of educational interventions relies on quantifying differences between intervention groups over time in a between-within design. Binary outcome variables (e.g., correct versus incorrect responses) are often assessed. Widespread approaches use percent correct on assessments and repeated measures analysis of variance (ANOVA) to detect differences between groups. However, this approach is not ideal: several of its assumptions are often violated, which can produce less informative and at times biased or spurious findings (Dixon, 2008; Embretson, 1994). An alternative approach is to use item response models to detect differences between intervention groups over time. The benefits of item response methodology for intervention research are contrasted with repeated measures ANOVA approaches, using a longitudinal intervention dataset with a between-within design from elementary students learning about mathematical equivalence. Two contrasts are drawn: the dependent measures (percent correct in repeated measures ANOVA versus item responses in item response models) and the methods each approach uses to quantify differences between groups. Second and third grade students who scored below 75% correct on the pretest participated in a 20-minute one-on-one tutoring intervention that focused on mathematical equivalence problems. In conclusion, item response models offer many methodological advantages over repeated measures ANOVA approaches based on percent correct outcomes for quantifying individual learning and group change over time. In particular, the generalized explanatory longitudinal item response model for multidimensional tests (Cho et al., in press) quantifies and tests for differences between intervention conditions while using more informative and less problematic metrics of student performance.
In addition to being methodologically more sound, these analyses can be performed using the free, open-source program R. Details of the model, as well as information on how to run these analyses, can be found in Cho et al. (in press). One drawback of performing item response theory (IRT) analyses is that they require more technical proficiency on the part of the data analyst than ANOVA approaches do. Nevertheless, researchers should strive to adopt this more informative and less biased metric in evaluating intervention effectiveness. Generalized Explanatory IRT Model R Code and Select Results are appended. (Contains 3 figures and 1 footnote.)
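To make the contrast between percent correct and item response metrics concrete, here is a minimal sketch in Python. It is not the R code appended to the report, and it does not implement the generalized explanatory longitudinal model of Cho et al. It assumes a simple one-parameter (Rasch) item response function, and the item difficulties and ability values are hypothetical. It illustrates one mathematical property of the logistic form: equal gains on the latent ability scale map to unequal gains in expected percent correct, with compression near the ceiling.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under a Rasch (1PL) model.

    Both arguments are on the logit scale (an assumption of this sketch).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_percent_correct(ability, difficulties):
    """Expected percent correct on a test for a given latent ability."""
    probs = [rasch_probability(ability, b) for b in difficulties]
    return 100.0 * sum(probs) / len(probs)


# Hypothetical item difficulties (in logits) for a short five-item test.
ITEM_DIFFICULTIES = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Two hypothetical students who each gain exactly one logit of ability,
# one starting in the middle of the scale and one starting near the ceiling.
low_gain = (expected_percent_correct(1.0, ITEM_DIFFICULTIES)
            - expected_percent_correct(0.0, ITEM_DIFFICULTIES))
high_gain = (expected_percent_correct(3.0, ITEM_DIFFICULTIES)
             - expected_percent_correct(2.0, ITEM_DIFFICULTIES))

print(f"Gain from ability 0 to 1: {low_gain:.1f} percentage points")
print(f"Gain from ability 2 to 3: {high_gain:.1f} percentage points")
```

Under these assumptions, the student who starts nearer the ceiling shows a smaller percent-correct gain for the same latent gain, which is one reason an analysis on the percent-correct scale can understate change for higher-scoring students.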
Authors
- Authorizing Institution: Society for Research on Educational Effectiveness (SREE)
- Education Level: Elementary Education, Grade 2, Grade 3
- Peer Reviewed: F
- Publication Type: Reports - Research
- Published in: United States of America
Table of Contents
- Abstract Title Page
- Title; Authors and Affiliations
- Abstract Body
- Background / Context; Purpose / Objective / Research Question / Focus of Study; Intervention / Program / Practice
- Statistical, Measurement, or Econometric Model; Usefulness / Applicability of Method; Problems with ANOVA Approaches and an Item Response Model as its Alternative
- IRT ability estimates are more accurate and informative than total scores or percent correct scores, which are typical of traditional ANOVA approaches
- Item Response Models More Accurately Quantify Group Differences Over Time; Repeated Measures ANOVA vs. Generalized Explanatory Longitudinal Item Response Models
- Conclusions
- Appendices
- Appendix A. References
- Appendix B. Figures
- Appendix C. Generalized Explanatory IRT Model R Code and Select Results