Automated Data Analysis of STEM Education Interventions with DBERlibR

Changsoo Song
University of Nebraska-Lincoln

Need: Discipline-based education research (DBER) aims to improve educational practices in the science, technology, engineering, and mathematics (STEM) disciplines (National Research Council, 2012). DBER scientists pursue evidence-based knowledge and practices to enhance teaching and learning in these disciplines (Henderson et al., 2017) by collecting, integrating, cleaning, and analyzing multi-modal data. This pursuit entails continuous development and testing of different learning approaches, and therefore repeated collection and analysis of examination/test data. However, cleaning test data (e.g., treating missing values, handling incorrect values and outliers) and performing statistical analyses (e.g., item analysis, repeated-measures analysis of variance) are daunting tasks, particularly because many DBER faculty were trained in a STEM discipline rather than as education researchers. These data-heavy processes require multiple steps, some of which are error-prone and time-consuming, which can limit study reproducibility. For example, researchers need to merge and/or bind data sets, test the assumptions required for parametric techniques, and employ non-parametric techniques as necessary (Naskar & Das, 2018). Researchers currently handle each of these steps individually, using multiple packages and functions, because current statistical analysis tools are typically “buffet-style”: users must select and run each analysis separately.
Guiding Question: Can we help advance DBER by developing easy-to-use statistical software tools that streamline data pre-processing, processing, and analysis, thereby reducing the time and errors associated with DBER data analyses and increasing overall study reproducibility?

Outcomes: We have developed DBERlibR, an R package that streamlines and automates DBER data processing and analysis. The package reads user-provided data, cleans them, merges/binds multiple data sets (as necessary), checks the assumption(s) of specific statistical techniques (as necessary), applies a range of statistical tests (e.g., one-way analysis of covariance, one-way repeated-measures analysis of variance), and presents and interprets the results all at once. The package requires minimal input once users provide data in a standard format, but users can adjust various parameters to better fit their studies (e.g., the cut-off criterion for removing cases with too many skipped answers). Short-term outcomes of DBERlibR include time saved on data cleaning and analysis and fewer errors in the results. Potential long-term outcomes include improved research reproducibility.
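To make the skipped-answer cut-off mentioned above concrete, here is a minimal sketch of that kind of cleaning step. It is written in Python purely for illustration and is not DBERlibR's R interface: the function name `drop_high_skip_cases`, the `max_skip_fraction` parameter, its default value, and the choice to score remaining skipped answers as incorrect are all assumptions made for this example.

```python
def drop_high_skip_cases(responses, max_skip_fraction=0.15):
    """Remove cases whose share of skipped answers exceeds a cut-off.

    responses: dict mapping a student ID to a list of item scores,
               where 1 = correct, 0 = incorrect, None = skipped.
    Returns (kept, dropped): kept maps IDs to cleaned score lists,
    dropped lists the IDs that were removed.
    Hypothetical helper for illustration; not part of DBERlibR.
    """
    kept, dropped = {}, []
    for student_id, answers in responses.items():
        if not answers:
            # A case with no answers at all cannot be analyzed.
            dropped.append(student_id)
            continue
        skipped = sum(1 for a in answers if a is None)
        if skipped / len(answers) > max_skip_fraction:
            dropped.append(student_id)
        else:
            # Assumption: remaining skipped answers are scored as incorrect.
            kept[student_id] = [0 if a is None else a for a in answers]
    return kept, dropped


raw = {
    "s1": [1, 0, 1, 1],
    "s2": [1, None, None, 0],   # half of the answers skipped
    "s3": [None, 1, 1, 1],      # one quarter skipped
}
kept, dropped = drop_high_skip_cases(raw, max_skip_fraction=0.25)
```

With a cut-off of 0.25, only the case that skipped half of its answers is removed; the case that skipped exactly one quarter is retained because the comparison is strictly greater than the cut-off.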

Broader impacts: By making test data analysis easier, DBERlibR is expected to encourage more education researchers (especially those who are not familiar with analytic techniques) to use advanced statistical techniques when examining the efficacy of their educational interventions on students’ performance. DBERlibR provides the most frequently used analytic techniques: item analysis, the paired-samples t-test (and Wilcoxon signed-rank test as necessary), the independent-samples t-test (and Mann-Whitney U test as necessary), one-way analysis of covariance, one-way repeated-measures analysis of variance (and Friedman test as necessary), and one-way analysis of variance (and Kruskal-Wallis rank-sum test as necessary). As such, DBERlibR will contribute to the advancement of DBER by facilitating the creation and proliferation of evidence-based knowledge and practices.
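As an illustration of the first technique in that list, the sketch below computes two common item-analysis statistics from a matrix of dichotomously scored responses: item difficulty (proportion correct) and item discrimination (here, the Pearson correlation between an item and the total of the remaining items). This is a generic textbook formulation in Python, not DBERlibR's implementation; the function names and the choice of the rest-score correlation as the discrimination index are assumptions for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined without variance
    return sxy / sqrt(sxx * syy)


def item_analysis(scores):
    """Per-item difficulty and discrimination.

    scores: list of rows (one per student), each a list of 0/1 item scores.
    Hypothetical helper for illustration; not part of DBERlibR.
    """
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        # Rest score: total excluding the item itself, so the item
        # does not inflate its own correlation.
        rest = [t - s for t, s in zip(totals, item)]
        results.append({
            "item": j + 1,
            "difficulty": sum(item) / len(item),
            "discrimination": pearson(item, rest),
        })
    return results


scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
report = item_analysis(scores)
```

In this toy data set the three items are answered correctly by three, two, and one of the four students, so their difficulty indices are 0.75, 0.5, and 0.25, and each item correlates positively with the rest score.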

Note: A list of references cited in the proposal will be provided upon request.


Thomas Helikar, University of Nebraska-Lincoln, Nebraska; Wendy Smith, University of Nebraska-Lincoln, Nebraska; Resa Helikar, University of Nebraska-Lincoln, Nebraska