Counting Hallucinations by Web-Enabled LLMs

In 2023, we gathered the data for what became “ChatGPT Hallucinates Non-existent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In spring 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.

Our new report on the state of citations is available at SSRN: “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models.”

What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like, “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence.”

What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. This means it can answer questions about current events, verify facts, and provide live data—something traditional models, which are limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement that with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.

At least one third of the references provided by GPT-4o WS were not real! Performance has not significantly improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower quality sources like Investopedia even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.

In closing, consider this quote from an interview with Jack Clark, co-founder of Anthropic:

The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.

Papers about Economists Using LLMs

1. The most recent (published in 2025) is this piece about doing data analytics that would have been too difficult or costly before: Deep Learning for Economists

Considering how much of frontier economics revolves around getting new data, this could be important. On the other hand, people have been doing computer-aided data mining for a while. So it’s more of a progression than a revolution, in my expectation.

2. Using LLMs to actually generate original data and/or test hypotheses like experimenters: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? and Automated Social Science: Language Models as Scientist and Subjects

3. Generative AI for Economic Research: Use Cases and Implications for Economists

Korinek has a new supplemental update as current as December 2024: LLMs Learn to Collaborate and Reason: December 2024 Update to “Generative AI for Economic Research: Use Cases and Implications for Economists,” Published in the Journal of Economic Literature 61 (4)

4. For being comprehensive and early: How to Learn and Teach Economics with Large Language Models, Including GPT

5. For giving people proof of a phenomenon that many people had noticed and wanted to discuss: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Alert: We will soon have an update for current web-enabled models! It would seem that hallucination rates are going down but the problem is not going away.

6. This was published back in 2023. “ChatGPT ranked in the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course.” (note the “compared to”): ChatGPT has Aced the Test of Understanding in College Economics: Now What?

References

Buchanan, J., Hill, S., & Shapoval, O. (2023). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80-87. https://doi.org/10.1177/05694345231218454 (Original work published 2024)

Cowen, Tyler and Tabarrok, Alexander T., How to Learn and Teach Economics with Large Language Models, Including GPT (March 17, 2023). GMU Working Paper in Economics No. 23-18, Available at SSRN: https://ssrn.com/abstract=4391863 or http://dx.doi.org/10.2139/ssrn.4391863

Dell, M. (2025). Deep Learning for Economists. Journal of Economic Literature, 63(1), 5–58. https://doi.org/10.1257/jel.20241733

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has Aced the Test of Understanding in College Economics: Now What? The American Economist, 68(2), 233-245. https://doi.org/10.1177/05694345231169654 (Original work published 2023)

Horton, J. J. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv Preprint arXiv:2301.07543.

Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for Economists. Journal of Economic Literature, 61(4), 1281–1317. https://doi.org/10.1257/jel.20231736

Manning, B. S., Zhu, K., & Horton, J. J. (2024). Automated Social Science: Language Models as Scientist and Subjects (Working Paper No. 32381). National Bureau of Economic Research. https://doi.org/10.3386/w32381

Queens 2060: Where Upzoning Matters Most

Most US cities make it hard for housing supply to meet demand because of rules that prevent large apartment buildings. Usually cities do this with zoning rules that limit the number of homes per parcel, often to as low as 1. New York City relies more on rules about Floor Area Ratio (the ratio of the floor area to the area of the parcel). But how binding are these rules? If we relaxed or repealed them, how much new construction would we see, and where would we see it?

MIT PhD student Vincent Rollet has calculated this for New York City:

I build a dynamic general equilibrium model of the supply and demand of floorspace in a city, which I estimate using a novel parcel-level panel dataset of land use and zoning in New York City. I validate the model using quasi-experimental variation from recent zoning reforms and use it to simulate the effects of zoning changes on construction and prices.

He finds that eliminating these rules in NYC would lead to a construction boom, with a 79% increase in the amount of floor space available by 2060. This would allow many more people to live in New York, with a 52% increase in population; but many of the benefits would go to existing NYC residents, with more floor space per person and modestly lower rents leading to higher wellbeing:
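As a rough check on the "more floor space per person" claim, the two headline percentages can be combined directly. This is only back-of-the-envelope arithmetic on the numbers quoted above, not a calculation taken from Rollet's paper, and the variable names are mine:

```python
# Headline figures quoted above (changes by 2060 relative to the baseline).
floor_space_growth = 0.79   # +79% floor space
population_growth = 0.52    # +52% population

# Floor space per person scales with the ratio of the two growth factors.
per_person_growth = (1 + floor_space_growth) / (1 + population_growth) - 1

print(f"Implied change in floor space per person: {per_person_growth:.1%}")
```

So the population can grow by half and each resident still ends up with noticeably more space, because floor space grows faster than the number of people.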

Where exactly would we see the building boom? Not Manhattan, but Brooklyn and Queens. The intuition is that zoning is most binding in places where housing prices are currently high but where the buildings are currently small; this is where there is the biggest incentive to tear down existing buildings and build taller if you are allowed to.

The Most Regulated States

The Mercatus Center has put together a page of “Snapshots of State Regulation” using data from their State RegData project. Their latest data suggests that population is still a big predictor of state-level regulation, on top of the red/blue dynamics people expect:

They also made pages with much more detail on each state, like what the most regulated industries in each state are and how each one compares to the national average:

You can find your state here.

“How Can the US Manufacture More” Is a Reasonable Question That Deserves Reasonable Answers

Many regular Americans and policymakers say they want the US to manufacture more things domestically. But when they ask economists how to accomplish this, I find that our most common response is to question their premise- to say the US already manufactures plenty, or that there is nothing special about manufacturing. It’s easy for people to round off this answer to ‘your question is dumb and you are dumb’, then go ask someone else who will give them a real answer, even if that real answer is wrong.

Economists tell our students in intro classes that we focus on positive economics, not normative- that we won’t tell you what your goals should be, just how best to accomplish them. But then we seem to forget all that when it comes to manufacturing. Normally we would take even unreasonable questions seriously; but I think wondering how to increase manufacturing output is reasonable given the national defense externalities.

So if you had to increase the value of total US manufacturing output- if you were going to be paid based on a fraction of real US manufacturing output 10 years from now- how would you do it?

I haven’t made a deep study of this, but here are my thoughts. Better ideas at the top, ‘costly but would increase manufacturing output’ ideas at the bottom:

Research on Big Questions April 2025

I’m working on a new paper with Bart Wilson. We might have a draft to release soon.

  1. https://economistwritingeveryday.com/2023/03/25/discrepancy-in-views-about-music-pirating/  In that post, I pointed out that the estimates reported in journals for the effect of pirating on music revenues range from almost 0% to almost 100%. There is room for new empirical work. Not often is the range of the estimates that big.
  2. My coauthor Bart Wilson did an interesting podcast episode for the Curious Task in 2020.

https://thecurioustask.podbean.com/e/ep-64-bart-wilson-%e2%80%94-is-the-idea-of-property-universal/

Episode: Bart Wilson — Is The Idea of Property Universal? 

I’m providing a rough transcription of the part that stood out to me, because he identified a prime big unanswered question. This is around minute 7 of the episode.

Host: Why is [the Property Species] an interesting topic deserving of a book?

Bart Wilson: “So, I work with primatologists… and I would talk to them about what I’m working on with my laboratory experiments on property. They would say, ‘Oh yeah. Dolphins do that, too, or baboons. … scrub jays re-cache their food if another scrub jay is watching them so they are protecting themselves against theft…’ so property is all over the animal kingdom. And then I’m also working with my colleague in the English department. In the humanities, property is a very narrow thing, something Western European. It’s very modern. And, so, in one part of the academy property is this broadly natural phenomenon and in another part of the academy it’s very local: only some humans have it. And so, as a social scientist…”

Bart identified a gap in understanding. Property cannot be both common to all animals and rare among humans. In his book The Property Species he spans that gap by claiming (spoiler alert) that property is common to all humans and only humans. Human language is an important piece of that story. No other animal can wield complex symbolic language.

In our new paper (manuscript forthcoming) we’ll be investigating how humans use symbolic language to describe nonrivalrous digital resources.

Trump’s National Sales Tax

Tariffs are going up to levels last seen in the 1930 Smoot-Hawley tariffs that helped kick off the Great Depression:

Tariffs are taxes- roughly, a national sales tax with an exemption for domestically-produced goods and services. I think the words make a difference here- “raising tariffs on countries who we run a trade deficit with” just sounds abstruse to most people, while “raising taxes on goods bought from firms in net-seller countries” sounds negative, but they are the same thing.

Of course, in this case the plan is to raise taxes to at least 10% on goods from all other countries even if they aren’t net-sellers, and raise taxes up to 49% on those that are. This is not a negotiating tactic. We know this from the math- the new tax formula uses net imports from a country rather than a country’s tariff rates, so a country could cut their tariffs on US goods to zero today and it wouldn’t necessarily reduce our “reciprocal” tariffs at all; at best it would reduce them to 10%. We also know it isn’t about negotiating because the administration says it isn’t. Their goal, obviously, is to reduce trade, not to free it.
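To see why a country cutting its own tariffs would not move the number, here is a minimal sketch of the formula as it was widely reported: half of the bilateral goods deficit divided by imports, with a 10% floor. The halving and the exact functional form are my assumption based on that reporting, not something stated in this post, and the trade figures are made up for illustration:

```python
def reciprocal_tariff(imports_from, exports_to, foreign_tariff=None, floor=0.10):
    """Sketch of the reported formula: half the deficit-to-imports ratio, with a floor.

    foreign_tariff is accepted only to show that it never enters the calculation.
    """
    deficit = max(imports_from - exports_to, 0.0)
    return max(floor, 0.5 * deficit / imports_from)

# Hypothetical trade flows in billions of dollars (illustrative, not real data).
before = reciprocal_tariff(100.0, 20.0, foreign_tariff=0.15)
after = reciprocal_tariff(100.0, 20.0, foreign_tariff=0.0)   # partner cuts its tariffs to zero
surplus = reciprocal_tariff(50.0, 60.0)                       # US runs a surplus with this partner

print(f"before: {before:.0%}, after: {after:.0%}, surplus partner: {surplus:.0%}")
```

The partner's tariff rate is an argument the function never reads, which is the point: under this form only the trade flows matter, and even a partner we run a surplus with still gets the floor.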

They say they are doing this to bring manufacturing back to America and to promote national defense. But American manufacturers don’t seem happy. Even before the latest huge tax increase, trade war was their biggest concern:

The National Association of Manufacturers Q1 2025 Manufacturers’ Outlook Survey reveals growing concerns over trade uncertainties and increased raw material costs. Trade uncertainties surged to the top of manufacturers’ challenges, cited by 76.2% of respondents, jumping 20 percentage points from Q4 2024 and 40 percentage points from Q3 of last year.

The National Association of Manufacturers responded to the latest tax increase with a negative statement; so even the one major group that might have benefitted from tariffs is unhappy. Foreign producers and US consumers will of course be very unhappy. I think Trump is making a huge political blunder alongside the economic one- he got elected largely because Biden allowed inflation to get noticeably high, but now Trump is about to do the same thing.

I also see this as a huge national security blunder. For tariffs on China, I at least see their argument- we should take an economic hit today in order to become less reliant on our peer-competitor and potential adversary. But the tariffs on allies make no sense- they are hitting the very countries that are most valuable as economic and/or military partners in a conflict with China, like Canada, Mexico, Japan, South Korea, Vietnam, India, and Taiwan (!!!). One of our biggest advantages vs. China has been that we have many allies and they have few, and we appear to be throwing away this advantage for nothing.

What can you or I do about this? Stock up on durable goods before the price increases hit. Picking investment winners is always hard, but things this makes me consider are gold, stocks in foreign countries that trade little with the US, and companies whose stocks took a big hit today despite not actually being importers. Finally, we can try nudging Congress to do something. The Constitution gives the power to levy taxes to the legislative branch, but in the 20th century they voted to delegate some of this power to the executive. Any time they want, Congress could repeal these tariffs and take back the power to set rates. I have some hope they actually will- just yesterday the Senate voted to repeal some tariffs on Canada, and more votes are planned. The alternative is to risk a recession and a wipeout in the midterms:

Hospitals Remain Full Even as Covid Subsides

The average hospital is now 3/4 full- more full than during much of the worst of the Covid pandemic, and well above the 2/3 occupancy rate that prevailed during the 2010s. This is according to a study out yesterday in JAMA Network Open.

This seems to be due to a reduction in bed supply, rather than an increase in demand:

The number of staffed hospital beds declined from a prepandemic steady state of 802,000 (2009-2019 mean) to a post-PHE steady state of 674,000, whereas the mean daily census steady state remained at approximately 510,000.
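The occupancy rates mentioned earlier follow from these three numbers. A quick check, using only the figures in the quote (the variable names are mine):

```python
# Figures from the study quoted above.
beds_pre = 802_000    # staffed beds, 2009-2019 mean
beds_post = 674_000   # staffed beds, post-PHE steady state
census = 510_000      # mean daily census, roughly unchanged

occupancy_pre = census / beds_pre      # about two thirds
occupancy_post = census / beds_post    # about three quarters
bed_change = beds_post / beds_pre - 1  # share of staffed beds lost

print(f"Occupancy before: {occupancy_pre:.1%}")
print(f"Occupancy after:  {occupancy_post:.1%}")
print(f"Change in staffed beds: {bed_change:.1%}")
```

With the patient count held fixed, losing roughly one in six staffed beds is enough on its own to move occupancy from about 2/3 to about 3/4.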

To me this is one more reason to reform Certificate of Need laws that put barriers in the way of hospitals opening or adding beds. Luckily I see a lot of momentum for CON reform this legislative season, including the highest-occupancy state, Rhode Island:

The Big Ideas

Do I really think that the things I write about here and in my papers are the most important things in the world? No. Like most academics, I tend to emphasize the issues where I think I bring a unique perspective, rather than the most important issues. But if you don’t realize this, you might get the impression that I think the things I normally talk about are the most important, rather than simply the most neglected and tractable / publishable. I don’t work on the most important issues because I see no good way for me to attack them- but if you do see a way, that is where you should focus. So what are the big issues of the 2020s?

I see two issues that stand out above the many other important events of the day:

  • Artificial Intelligence: At minimum, the most important new technology in a generation; has the potential to bring about either utopia or dystopia. Do you have ideas for how to nudge it one way or another?
  • Rise of China: From extreme poverty to the world’s manufacturing powerhouse in two generations. What lessons should other countries learn from this for their own economic policy? How can we head off a world war and/or Chinese hegemony?

Focusing a bit more on economics, I see two perennial issues where there could be new opportunities to solve vital old questions:

  • Economic Development: We still don’t have a definitive answer to Adam Smith’s founding question of economics- why are some countries rich while other countries are poor, and how can the poor countries become rich? I think economic freedom is still an underrated answer, but even if you agree, the question remains of how to advance freedom in the face of entrenched interests who benefit from the status quo.
  • Robust Prediction: How can we make economics into something resembling a real science, one where predictions that include decimal places don’t deserve to be laughed at? Can you find a way to determine how much external validity an experiment has? Or how to use machine learning to get at causality? Or at least push existing empirical research to be more replicable?

I’ve added these points to my ideas page, since all this was inspired by me talking through the ideas on the page with my students and realizing how small and narrow they all seemed. Yes, small and narrow ideas are currently easier to publish in economics, but there is more to research and life than easy publications.

A Wartime Natural Experiment About Copyright

One of the hardest questions in copyright policy is: “What would have happened otherwise?” When Disney lobbies for longer copyright terms or academic publishers defend high subscription fees, we struggle to evaluate their claims because we can’t observe the counterfactual. What would happen to creativity and innovation if we shortened copyright terms or lowered prices?

This is what makes Biasi and Moser’s 2021 study in the American Economic Journal: Microeconomics valuable. They examine a rare “natural experiment” from World War II – the Book Republication Program (BRP) – which provides insights into how copyright affects the spread and use of knowledge.

In 1942, the U.S. government allowed American publishers to reprint German scientific books without seeking permission from German copyright holders (though royalties were still paid to the U.S. government). This created a test case: German books suddenly became cheaper, while similar Swiss scientific books (Switzerland being neutral in the war) maintained their original copyright protection and prices.

This setup lets us answer the counterfactual question. What happens when you maintain basic royalty payments but prevent monopoly pricing? The researchers compared the same book before and after the policy change, German books versus Swiss books, areas near libraries with these books versus those without, and usage by English-speaking scientists versus others. Such comprehensive comparison groups are rarely available in copyright research.
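The core of this before/after, German/Swiss comparison is what economists usually call difference-in-differences. The sketch below shows only the simplest two-group version with invented numbers. It is not the authors' actual specification or data, and the function and variable names are hypothetical:

```python
# Hypothetical mean citations per book per year. NOT data from Biasi and Moser.
mean_citations = {
    ("german", "pre"): 2.0,
    ("german", "post"): 3.5,
    ("swiss", "pre"): 2.0,
    ("swiss", "post"): 2.5,
}

def diff_in_diff(means, treated, control):
    """Change in the treated group minus the change in the control group."""
    treated_change = means[(treated, "post")] - means[(treated, "pre")]
    control_change = means[(control, "post")] - means[(control, "pre")]
    return treated_change - control_change

effect = diff_in_diff(mean_citations, treated="german", control="swiss")
print(f"Estimated effect of the program: {effect:+.1f} citations per book-year")
```

The Swiss books absorb whatever would have happened to citations anyway (wartime research trends, for example), so the remaining difference is attributed to the program. That attribution holds only if German and Swiss books would otherwise have followed similar trends.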

The authors report that when book prices fell by 10%, new research citing these books increased by 40%. The benefits spread beyond elite institutions, with new research clusters emerging wherever scientists gained access to these books. This does not appear to just be shifting citations from one source to another – there was genuine new knowledge creation, evidenced by increased patents and PhD production.

Such clean natural experiments in copyright policy are rare (there are a few laboratory experiments). Most changes come from lobbying (like the “Mickey Mouse Protection Act”) or technological disruption (like music streaming), making it hard to isolate the effects of copyright itself. The BRP provides uniquely clear evidence that moderate copyright protection – rather than maximum protection – might better serve innovation.

As we debate copyright terms and academic paywalls today, this historical accident of war gives us something valuable: empirical evidence about what happens when you find a middle ground between total copyright protection and unrestricted access.

Biasi, Barbara and Petra Moser. 2021. “Effects of Copyrights on Science: Evidence from the WWII Book Republication Program.” American Economic Journal: Microeconomics, 13 (4): 218–60.