The National Science Foundation sponsors a ton of STEM programs across the US, but far too rarely does it include a randomized evaluation of whether those programs actually work. This is unfortunate, because when education programs are rigorously evaluated, around 90% turn out not to work.
A recent congressional bill offers a chance to improve matters, if policymakers are willing.
On March 24, 2023, Representatives Chrissy Houlahan (D-PA) and Jim Baird (R-IN) reintroduced the Mathematical and Statistical Modeling Education Act, which “would work to modernize math curriculum and improve K-12 science, technology, engineering, and mathematics (STEM) education in the United States and help schools update their math curriculum to make it more relevant and applicable to real-world scenarios.”
The act itself asks NSF to fund grants “to advance innovative approaches to support and sustain high-quality mathematical modeling education in schools operated by local education agencies, including statistical problem solving, data science, operations research, and computational thinking.”
Examples include:
“engaging prekindergarten through grade 12 educators in professional learning opportunities to enhance mathematical modeling and statistical problem solving knowledge”; and
“conducting research on curricula and teaching practices that empower students to choose the mathematical, statistical, computational, and technological tools that they will apply to a problem.”
Nice ideas! But how do we know any of this will work?
As to individual grants, the Act only says this: “All proposals for grants under this section shall include an evaluation plan that includes the use of outcome oriented measures to assess the impact and efficacy of the grant. Each recipient of a grant under this section shall include results from these evaluative activities in annual and final projects.”
And as to the overall portfolio, the Act only says that the NSF Director shall “use a common set of benchmarks and tools to assess the results of research conducted under such grants and identify best practices.”
This is far from adequate.
Randomized controlled trials (RCTs) are the best way both to:
Validate which math education programs are actually worth supporting in the first place; and
Test which new math education programs might actually work or not.
Yet the Act doesn’t mention RCTs, either in selecting which math education programs to fund or in evaluating how well they work in the future.
The result is that we might waste millions of dollars each year on programs without ever knowing whether they work.
This bill should reaffirm longstanding NSF principles about using RCTs for education research. As NSF stated in 2013:
The research plan should identify and justify (1) the study design used to estimate causal impact of the intervention on the outcomes of interest; (2) the key outcomes of interest for the impact study and the minimum size impact of the intervention that would have policy or practical relevance; (3) the study setting(s) and target population(s); (4) the sample, including the power it provides for detecting an impact; (5) the data collection plan, including information about procedures and measures, including evidence on and strategies for ensuring reliability and validity, and plans for collecting data on program implementation, comparison group practices, and study context; and (6) the analysis and reporting plan.
Efficacy, Effectiveness, and Scale-up research should use study designs that will yield impact estimates with strong causal validity and that, for example, could meet What Works Clearinghouse standards without reservations (http://ies.ed.gov/ncee/wwc/). Generally and when feasible, they should use designs in which the treatment and comparison groups are randomly assigned.
All of those principles should obviously still apply to education research. But they should apply to funding education programs too: if NSF funds a math education program in K-12 schools, the grant should either 1) rest on strong RCT evidence that the program already works, or 2) include one or more rigorous RCTs to evaluate the program going forward.
To be sure, there are many cases where an RCT isn't the ideal method; for example, we can't randomize which state adopts a higher minimum wage.
But a student-level math program is the perfect case for an RCT. There is no excuse not to use RCTs both to pick which math programs to fund and to evaluate them going forward.
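To make that concrete, here is a minimal sketch (in Python, using scipy and statsmodels) of what a student-level RCT could look like: a pre-specified effect size, a power calculation, random assignment, and a simple comparison of outcomes. The 0.20 SD effect size, sample sizes, and test scores are illustrative assumptions, not figures from the bill or from NSF.

```python
# A minimal sketch (illustrative assumptions throughout) of a student-level
# RCT evaluation of a hypothetical math program: power check, random
# assignment, and a simple difference-in-means analysis.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(0)

# 1) How many students per arm to detect a 0.20 SD impact with 80% power?
n_per_arm = TTestIndPower().solve_power(effect_size=0.20, alpha=0.05, power=0.80)
print(f"Students needed per arm: {int(np.ceil(n_per_arm))}")   # roughly 400

# 2) Simulate one such trial: randomize students, then compare outcomes.
n = 400                                   # students per arm (assumed)
students = np.arange(2 * n)
treated = rng.permutation(students)[:n]   # random assignment at the student level
is_treated = np.isin(students, treated)

# Hypothetical standardized test scores; true program effect set to 0.20 SD.
scores = rng.normal(0.0, 1.0, size=2 * n) + 0.20 * is_treated

effect = scores[is_treated].mean() - scores[~is_treated].mean()
t, p = stats.ttest_ind(scores[is_treated], scores[~is_treated])
print(f"Estimated impact: {effect:.2f} SD (p = {p:.3f})")
```

Under these illustrative assumptions, roughly 400 students per arm are enough to detect a 0.20 standard-deviation impact with 80% power.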
I disagree. This debate has been going on ever since Hargreaves' speech some quarter century ago. IMHO, RCTs in education are much less relevant than in, e.g., medicine, because the intervention usually doesn't target the individual but a social group of learners. So a study that runs an intervention in two classes of students, with one class as the control, is best thought of as having N=1 in each arm.
Hence, running a sufficiently powered RCT is incredibly difficult and likely requires drawing a sample from beyond the local context in which the investigated curriculum is even relevant.
To be clear, I'm not saying we should throw out RCTs in education. But I certainly do not see them as the same kind of golden tool that they are in other disciplines.
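For a rough sense of the arithmetic behind that concern, here is a minimal sketch assuming the standard design-effect approximation for cluster-randomized trials, DEFF = 1 + (m - 1) * ICC, where m is the students per class and ICC is the intraclass correlation. The class size and ICC below are illustrative assumptions, not figures from the post or the comment.

```python
# A rough sketch of how class-level (cluster) randomization inflates the
# required sample size, using the design-effect approximation
# DEFF = 1 + (m - 1) * ICC. Class size and ICC are assumed values.
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Sample size per arm if students could be randomized individually.
n_individual = TTestIndPower().solve_power(effect_size=0.20, alpha=0.05, power=0.80)

m = 25       # students per class (assumed)
icc = 0.20   # intraclass correlation (assumed, plausible for achievement data)
deff = 1 + (m - 1) * icc

n_clustered = n_individual * deff          # students per arm when whole classes are assigned
classes_per_arm = np.ceil(n_clustered / m)

print(f"Individual randomization: ~{int(np.ceil(n_individual))} students per arm")
print(f"Class-level randomization: ~{int(np.ceil(n_clustered))} students "
      f"(~{int(classes_per_arm)} classes) per arm")
```

With a class size of 25 and an ICC of 0.20, the design effect is 5.8, so a study that needs about 400 individually randomized students per arm needs well over 2,000 students, spread across roughly 90 classes, per arm when whole classes are assigned. This is the sense in which class-level interventions are far harder to power than student-level ones.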