Recent Metascience News/Links
A number of news articles, scholarly articles, policy changes, and even a documentary film caught my eye recently:
In Asterisk Magazine, Karthik Tadepalli makes the case that “Ideas Aren’t Getting Harder to Find.” It’s based on recent economic literature (such as this Census Bureau paper) suggesting that even though we spend far more on science and R&D than we did decades ago, the bottleneck to increased productivity and economic growth is less about the difficulty of discovering new scientific ideas, and more about our downstream capacity to bring them to market. Not sure if this is good news or bad news (given the difficulty in reforming some of those downstream processes).
Science magazine obtained an internal NSF memo announcing that NSF is lowering its internal requirements for peer review (or, as the agency calls it, merit review). In brief: “The changes permit as few as one outside review rather than the current minimum of three, end the routine use of expert panels to discuss those individual reviews, and give program managers greater authority to recommend which proposals should or should not be funded.”
A number of folks seem upset at the prospect, but I’m cautiously optimistic (the devil, as always, is in the details). The thing is, despite its widespread use, there really isn’t any empirical evidence supporting the use of peer review in scientific grantmaking! A RAND report found as much in 2018 (and things haven’t changed since then):
“Judging whether peer review is demonstrably better than any other system is impossible because of the lack of comparators. No funding agencies have made significant use of other allocation systems. Even comparisons between or research on peer review systems is limited, with most studies examining the peer review process of one particular funder in one particular context, and few go beyond process measures to judge improvement.”
DARPA functions at a high level without rigid peer review requirements for every specific proposal. In principle, NSF program officers ought to be able to do likewise, i.e., to make informed decisions based on their own expertise. It would be preferable, of course, if NSF could roll out such a policy on a randomly staggered basis so as to measure the effects . . .
A report came out this week on reproducibility problems both in the crystallography literature and in some major databases, with some calling for a Google paper to be retracted. Not my field, but interesting to see all the drama.
I enjoyed this piece by Adam Marblestone, Anastasia Gamick, and Joseph Fridman on lessons learned from the first 5 years of Focused Research Organizations. They helpfully clarify where FROs are a good fit, and where they are not.
Just came across an NBER working paper from July 2025, by Amitabh Chandra and Connie Xu. It’s called “Where Discovery Happens: Research Institutions and Fundamental Knowledge in the Life Sciences.” Their sample includes 560,000 articles in the life sciences written by 37,809 scientists who switched institutions during the time period at issue (this is important, because it helps to control for the underlying productivity, reputation, connections, etc. that any given researcher might have). I’m not going to get into the weeds of exactly how they define “productivity” (it includes number of papers, authorship position, weighting for the journal’s impact factor and the paper’s own citation count, and more).
Bottom line: “Between 50-60% of a scientist’s research output is attributable to the institution where they work, and two-thirds of this effect is driven by the presence of star researchers.”
A policy implication: “making public or philanthropic funding less generous at institutions that have high per-scientist output will directly reduce the production of knowledge, especially commercially relevant knowledge.”
Going back to at least the late 1990s, the NIH had a policy (here’s the 1998 version) requiring that if you asked for more than $500,000 per year in direct costs, you had to get advance permission from NIH staff before they would even agree to look at the application. The rationale at the time was that such large awards “are difficult to manage” and plan for within the usual budgetary process, so NIH wanted a heads-up, and reserved the right to refuse to accept such proposals at all. As of Dec. 3, 2025, that decades-old policy has been rescinded. No more need for advance permission.
Several years ago, an NIH leader speculated to me that since the $500,000 limit hadn’t been adjusted for inflation, it could have the unintended consequence of incentivizing researchers to propose smaller and smaller studies over time, which would be particularly harmful for clinical trials (which can be expensive to run properly).
It is odd to me that a government agency would keep a policy with a specific dollar threshold and never adjust it for inflation over a 27-year period. $500,000 in 1998 would be worth $999,000 today. The policy was long overdue for a serious amendment, and ditching it altogether seems sensible: it was a lot of red tape that doesn’t really seem necessary.
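For the arithmetic-minded, here’s a quick back-of-the-envelope check of that figure (a minimal sketch; the cumulative inflation factor below is my rough assumption, and the exact number depends on which index and months you use):

```python
# Rough check of the inflation-adjusted threshold. The cumulative CPI factor
# of ~2.0 between 1998 and 2025 is an assumption, not an official figure.
threshold_1998 = 500_000
assumed_cpi_factor = 2.0
print(f"${threshold_1998 * assumed_cpi_factor:,.0f}")  # roughly $1,000,000 in today's dollars
```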
This is going to be a little more geeky than usual (!), but a long-standing pet peeve of mine is seeing journalists and even scientists report the “variance explained” by some variable as if it represents the possible causal impact of that variable. E.g., someone will say, “school spending explains only 5% of the variance in student outcomes,” as if that shows that school spending doesn’t matter (or even as if we could spend zero and get similar outcomes).
That is a completely wrong view of “explained variance.” Think of it this way: imagine that everyone in the world is born with exactly two legs, so that all of the variance in “number of legs” is explained by “accidents and disease.” That does not mean that genes have no causal impact on “number of legs”; it is only because of genes that we all (or mostly all) have two legs in the first place. Causal importance simply isn’t the same thing as “variance explained,” which depends crucially on how much underlying variance there is to explain.
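To make that concrete, here is a minimal simulation of my own (not from the post; the numbers and variable names are made up): the causal effect of x on y is identical in both cases, but x “explains” almost none of the variance in y when x itself barely varies in the population, which is the issue with the school-spending interpretation above.

```python
# Minimal illustration: a variable can have a large causal effect on an outcome
# yet "explain" almost none of its variance, simply because it barely varies
# in the observed population.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

beta = 10.0  # true causal effect of x on y in both cases

# Case 1: x varies a lot in the population.
x_wide = rng.normal(0.0, 5.0, n)
y_wide = beta * x_wide + rng.normal(0.0, 20.0, n)

# Case 2: same causal effect, but x barely varies (think: every school spends
# roughly the same amount), so x can "explain" almost none of the variance.
x_narrow = rng.normal(0.0, 0.2, n)
y_narrow = beta * x_narrow + rng.normal(0.0, 20.0, n)

def r_squared(x, y):
    """Share of Var(y) linearly 'explained' by x (squared correlation)."""
    return np.corrcoef(x, y)[0, 1] ** 2

print(f"R^2 when x varies widely: {r_squared(x_wide, y_wide):.2f}")    # ~0.86
print(f"R^2 when x barely varies: {r_squared(x_narrow, y_narrow):.2f}")  # ~0.01
```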
A couple of economists have a recent working paper called “The Explanatory Power of Causal Effects.” I don’t know these economists myself, but they seem to have gotten comments from a bunch of folks that I do know and respect (Raj Chetty, Peter Hull, Kosuke Imai, Larry Katz, and Sendhil Mullainathan, among others). Their motivation is that while economists know that R-squared isn’t causal, “to our knowledge, there is no measure of the variation in an outcome causally explained by a variable.” They propose a new measure that they call Causal R-squared, or CR2. It requires both 1) an experiment that shows the effect of X on Y when randomly assigned, and 2) an observational dataset showing the relative variances of Y and X in the population. The point is to estimate how much of the variation in Y is causally attributable to variation in X.
They apply this new measure to several specific questions. For example, they “assess the share of variation in blood pressure explained by sodium intake: sodium causally explains 7% of the variation in men’s blood pressure, but less than 1% in women, despite similar causal effects. The gender difference arises because women’s blood pressure varies more for reasons unrelated to sodium.”
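Here is a back-of-the-envelope version of that logic (my own simplification with made-up numbers; the paper’s actual CR2 estimator is more general than this linear, constant-effect approximation): the share of Var(Y) causally explained by X is roughly β²·Var(X)/Var(Y), where β comes from an experiment and the variances come from observational data, which mirrors the two inputs described above.

```python
# Back-of-the-envelope "causal R^2" under a linear, constant-effect assumption
# (my simplification; the paper's CR2 estimator is more general).
# Inputs: beta_hat from an experiment, variances from observational data.
# All numbers below are made up for illustration.

def causal_r2(beta_hat: float, var_x: float, var_y: float) -> float:
    """Approximate share of Var(Y) causally attributable to variation in X."""
    return (beta_hat ** 2) * var_x / var_y

# Hypothetical sodium/blood-pressure pattern: the causal effect (mmHg per gram
# of sodium) is similar for men and women, but women's blood pressure varies
# more for reasons unrelated to sodium, so sodium explains a smaller share.
beta_mmhg_per_gram = 2.0   # same experimental effect for both groups
var_sodium = 1.5           # observational variance of sodium intake

print(causal_r2(beta_mmhg_per_gram, var_sodium, var_y=90.0))   # men:   ~0.067
print(causal_r2(beta_mmhg_per_gram, var_sodium, var_y=700.0))  # women: ~0.009
```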
They also point out who will be interested in this and why: “Normatively, a policy‐maker may care about X, even though it explains little variation in Y. Goldberger (1979), writing in the context of genetic heritability, gives the example of eyesight: population variance in eyesight is largely genetic, but there is still great value in prescribing glasses. For this reason, CR2 is more relevant to a scientist seeking to understand the sources of naturally‐occurring variation in Y, than to a policy‐maker seeking to affect the value of Y.”
For more, see this Twitter thread by one of the authors.
On my watchlist: “The Thinking Game,” a full-length documentary on DeepMind. Looks amazing.



The frontiers of biostatistics are filled with a variety of "variance explained"-style measures based on interventional prediction rather than associational/observational prediction (per Pearl's ladder of different "predictions"). They haven't been written up for a mainstream audience. Former RAND statistics group members have done work on this!
I am also cautiously optimistic about the changes in how peer review will be deployed, both at NSF and NIH. Peer review at the agencies was not conceived or designed to make fine-grained distinctions between a set of meritorious proposals. It was meant as a bulwark against political interference with free inquiry (seems relevant), and to advise (not dictate) the decisions of agency employees who were, to quote Vannevar, "persons of broad interest in and understanding of the peculiarities of scientific research and education". At NIH, there are reasons to believe that moving away from strict adherence to pay-by-score-order, which only developed and calcified in recent decades, will have positive effects on support for innovative proposals, emerging areas of research, and early-stage investigators. It's something I will be watching closely in the coming months, and I'm sure you will be too.