In the wake of the latest fraudulent AI-econ paper, a number of colleagues from MIT, Brown, Princeton, the University of New South Wales, and elsewhere have noted that I called out the paper’s problems several months ago.
All of which is flattering . . . but also slightly awkward. Nothing I did was all that special!
What I did was:
1) Read the entire paper including appendices (I don't always do this, by any means!);
2) Ask some questions, such as
Do these results seem plausible or even possible?
Does a first-year grad student at MIT have time to manually classify 3,000 sets of lab notes in several different highly-technical and unfamiliar fields?
Has even Raj Chetty ever written a paper with this much fieldwork, data from many different sources, and coding, in one year with no co-authors or research assistants?
and perhaps most importantly:
3) Live by the social norm "trust but verify, and speak your mind," rather than the social norm that seems to be more common in academia: "trust everyone blindly, and don't publicly indicate any possible distrust until other people have spoken out first and the case has been definitively proven."
Apparently there *was* a significant whisper network of people who had doubts about the AI paper (just as has been the case with fraud in other fields).
But I heard from other people last year that it was a breach of etiquette for me even to ask questions about the AI paper, because the undertone or implication might be that the paper could conceivably, possibly, maybe be fraudulent, and that was the worst possible thing you could ever even remotely hint at.
I disagree with that social norm. It should be completely normal to ask, "So how did you actually get this amazing data, anyway?" In fact, we should ask that in EVERY study involving private companies (as well as public sources like the IRS that are stingy with data), so that there's no implication that "you, in particular, are suspicious."
By analogy: when you get a mortgage to buy a house, the bank typically wants documentation about where the down payment came from. They're not accusing *you* of malfeasance per se, but the source of the down payment (e.g., your own savings versus a gift from parents) affects what the bank thinks you can actually repay. If $100,000 showed up in your account last month with no plausible explanation, the bank will want to know more.
That said, I don't want any new bureaucratic requirements for research akin to IRBs. Absolutely not. That would only slow everything down, and probably wouldn't deter actual fraudsters very much anyway.
But cultural change would be nice. It should be much more normalized and routine to ask probing questions about the provenance of data, about how scholars had time to do a particularly demanding project, and so forth. Particularly in the age of AI tools, it should not be the norm to show up with unbelievably amazing data from out of the blue, while everyone else is afraid even to gently ask where it came from.
What you did should indeed be basic protocol for anyone who sees results like this -- something needs to change in econ culture. Thank you for the service you've done for our discipline.
Rather than relying on individuals to make accusations of fraud, it would seem better to strengthen the norm that data must be made public. If there are good reasons for confidentiality, at least provide suitably restricted access to trusted third parties.
Worth observing that outright fraud like this remains exceptionally rare, as far as can be determined. The much bigger problems are still with bad statistical practice like p-hacking. Data repositories and pre-registration help here as well.