I think the problem with this line of thinking is that, even after the fact, it's very difficult to accurately capture whether a project succeeded without a large dose of human judgement.

"We will use method X to improve process Y (that currently uses Z)."

Actual outcome: a paper showing that in certain conditions, X produces the same results as Z, but is 7% cheaper.

Is this a success? Who knows? Even absent fraud, you need human judgement to know whether (1) the conditions are relevant in practice and (2) 7% is a big deal or not (and maybe X has other types of cost).

Also, this could push the system towards projects whose outcomes are fuzzier to define, even though projects with a high probability of outright failure are often very clearly worth doing. For example, a clinical trial with a ~5% probability of success is often higher ROI than fuzzy exploratory research (and I'm a "fuzzy exploratory researcher" myself!).

Great piece! I've been thinking about similar approaches to measuring reviewer performance and reputation -- forecasting is one good way! Editors of journals have to do similar soft forecasting when accepting papers (via projected impact factor), so as not to dilute their journal's reputation.

One challenge that makes this even more difficult for proposal reviewers is that there isn't necessarily a clear sense of how to define success (especially for basic exploratory research). For example, if a project doesn't deliver on an initial milestone but does open up a new line of inquiry for the investigator, should that be considered a success?

That would be an improvement; peer review seems to be one of those systems that has followed a winding path to a place no one would choose if starting from scratch, and yet no one can break its hold alone: https://jakeseliger.com/2020/05/24/a-simple-solution-to-peer-review-problems/
