The actual AI problem in academic economics

There is a steady flow of takes on the impact of AI on academic economics research, ranging from the example of someone writing an ostensibly legitimate, if somewhat trite, research paper with only a few hours of effort, to the implication that there is already no need to continue writing papers because the AIs are already better at it. Oh, what shall all the candlemakers do now that the sun has risen?

I think the idea that AI has already rendered the research paper an obsolete endeavor is very wrong, almost to the point of negligence. It both vastly underestimates the quality of the median contribution in the 80 to 100 or so best journals and vastly overestimates the reliability of current AI attempts at research on the margin. Putting such concerns aside for the moment, it’s still worth pondering how we can extrapolate from current AI as a tool for status quo research to forecast whether it might reshape labor as an input 5 or 10 years from now. That’s far enough away that it borders on futurism and, more importantly, the kind of forecasting that I shy away from. Feel free to tell me in the comments where we are headed.

At this moment, however, we are already in the middle of a far more subtle disruption in academic research that I haven’t seen anyone write about yet: the quiet, but pronounced, uptake of AI tools in the writing of referee reports for academic journals. If you’ve submitted papers for review in the last 18 months, dollars to donuts you’ve received a referee report that is lengthy and well-organized, with an unusual number of bullet points and headers, discussing your paper, summarizing its contributions, and offering suggestions that on their face seem reasonable but that anyone familiar with the structure of the data and the relevant literature will, upon a moment’s reflection, recognize as entirely vapid.

There is something uniquely frustrating about working on a research project for 3 to 5 years only to have judgment passed down on the basis, at least in part, of a review written by ChatGPT that is not just wrong but, well, kind of stupid. I’ve already personally had to deal with having a paper refereed via ChatGPT, rejected, and then, thanks to its being internalized into ChatGPT’s text base, reconfigured into a citation hallucination cited by other papers that, to maximize comedy, replaced 3 (including me) of the 4 authors with other (nicer? better looking?) economists. What’s most frustrating, however, is that this is hitting economics journals that do not seem to have any plan in place to deal with it. Not to suggest this is an easy problem to solve (not remotely), but it certainly should not be coming as a surprise to anyone. Let’s look at the facts:

  1. Academic economists are almost universally overcommitted.
  2. Journal referees are, for the most part, unpaid for their time.
  3. As the number of quality articles produced and submitted to journals has increased, so has the strain on the entire editorial process, including review writing.
  4. The only thing holding it together at all has been reputational incentives (i.e. nobody wants a bad reputation with the editors who are going to consider your future work) and a disciplinary sense of “civic duty”. Reputation is, of course, the load-bearing mechanism here.
  5. A technology was introduced that, at the very least, pantomimes the review process well enough to produce a low-quality facsimile of a review. With a few sentences tossed in at the beginning and a short separate letter written directly to the editor, a task that used to take 0.5 to 1.5 work days can now be crossed off your to-do list in less than an hour.

Is it really that hard to see what’s coming? Of course academic economists are going to be tempted to ask ChatGPT to write a review for them. There are almost no direct rewards for writing good reviews, while the costs are significant. Evaluating a genuinely new and distinct piece of research that has never been done before is hard work and takes significant time.

Now, how this is playing out across the body of journals is an open question. Here’s my best educated guess:

At the top journals, reputational concerns are the strongest, but so are the opportunity cost of everyone’s time and the competition for limited article space. Referees might not have the courage to outsource the actual decision to ChatGPT, but they’ll be awfully tempted to offload as much of the grunt work as they can. If I were an editor at a top 10-15 journal, I would expect a growing number of reports from referees who read the paper quickly (<15 minutes), then made a decision to recommend acceptance or rejection based on 1) whether they knew any of the authors, 2) whether the content is a complement or a substitute for their own research, 3) whether they had seen the paper presented in person and it was well received, 4) the general bundle of status associated with the authors and the subject, and 5) whether they liked the paper (you can, in fact, have a strong opinion on a paper you’ve looked at for 15 minutes. We’re all guilty of it). Having arrived at their positive or negative assessment, they then outsource the actual first draft of the review to ChatGPT, with the instruction to write a positive or negative review. Now, given the strong reputational considerations that any credible reviewer at a top journal should have, I expect there to be significant rewriting of the review, including the addition of the reviewer’s preferred economist gripes about identification, whether the results generalize, etc., giving an otherwise generic report some more bespoke vibes. This isn’t the real recommendation anyway; that’s the letter to the editor that goes unseen by the authors. I don’t think most referees will have the brass to outsource that.

That’s probably not great, especially for young authors trying to break into a field. But honestly, none of those problems are new. If anything it takes a very old problem (i.e. overcommitted economist at top school asks his or her student to write a referee report rejecting an article for them) and just tweaks it slightly (i.e. overcommitted economist at top school asks ChatGPT to write a referee report rejecting an article for them, freeing up a PhD student to get back to work cleaning and analyzing their data for them). Not optimal, but hey, what is?

The real problem, I am sad to say, is the next tier down. The field journals. The second-tier general journals. The oddball and heterodox journals. The journals that used to struggle to get enough good submissions and now struggle to find anyone to referee for them. What used to be a trickle is now a deluge of higher-quality research. That deluge, however, comes from authors who also constitute a referee pool that is far busier than it was before and without the resources that come with appointment at top institutions.

I promise you, from experience, that keeping a significant research agenda going during my salad days, when I was teaching a 3-3 load, was not easy. What happens when the 71st-ranked journal that you might submit an article to one day sends you a seemingly acceptable, if mediocre and slightly banal, article to review? Are you really going to give it a precious work day? Or are you going to give it a once-over, ask ChatGPT to review it, and then give a recommendation based on a 5-minute skim? I want to believe that I would never associate my professional reputation with a half-assed review, but that’s easier to say on this side of the R1 tenure fence.

Now’s the part where I smugly tell you the obvious solution and call it a night. As is often the case, however, I don’t have one. Not one that anyone is going to like, at least. Because the only solution I have is precisely the suggestion that got Jerry Maguire fired. We could simply write and publish fewer papers. If we write fewer papers, we can review fewer papers. If we review fewer papers, we can pay people to review them. If we can pay people to review them, we can hold them to higher quality standards. Editors can review the reviews. Every now and then someone suggests we get rid of anonymous reviewers, but I worry that anonymity is load bearing when it comes to the quality standards that are in many ways the hallmark of modern economics. I don’t think we can give up on quality. Quality is our comparative advantage. So maybe it’s time we let go of quantity. If your dean says you’ve written some good and important articles, but there aren’t enough lines on your vitae, then what they’re really saying is that they don’t want research faculty, they want AI middlemen.

Don’t be an AI middleman.

4 thoughts on “The actual AI problem in academic economics”

  1. James Gibson March 30, 2026 / 9:08 am

    I’ve already personally had to deal with having a paper refereed via ChatGPT, rejected, and then, thanks to its being internalized into ChatGPT’s text base, reconfigured into a citation hallucination cited by other papers that, to maximize comedy, replaced 3 (including me) of the 4 authors with other (nicer? better looking?) economists.

    Was this paper published or pre-published publicly? I’m not sure if you’re suggesting that the act of the referee pasting or uploading it into ChatGPT was itself the reason that it was ‘internalized’, or if it was posted on the web as well?



    • mdmakowsky March 30, 2026 / 10:46 am

      Our paper (the real one) was available as a pre-print online, but my guess is that the version from the journal reviewing it was uploaded to ChatGPT by a referee, and thus its content, as entered into the ChatGPT LLM, was branded, in part, by the journal it was being considered at.


      • James Gibson March 30, 2026 / 10:49 am

        Interesting, very interesting!

