Citations

Most of us know about FRED, the Federal Reserve Economic Data hosted by the Federal Reserve Bank of St. Louis. It provides data and graphs at your fingertips. You can quickly grab a graph for a report or for an online argument. Of course, you can learn from it too. I’ve talked in the past about the Excel and Stata plugins.

But you may not know about FRASER. From their about page, “FRASER is a digital library of U.S. economic, financial, and banking history—particularly the history of the Federal Reserve System”. It’s a treasure trove of documents. Just as with any library, you’re not meant to read it all. But you can read some of it.

I can’t tell you how many times I’ve read a news story and lamented the lack of citations – linked or unlinked. Some journalists seem to do a Google search or Reddit dive and then summarize their journey. That’s sometimes helpful, but it often provides only surface-level content and includes errors – much like AI. The better journalists at least talk to an expert. That is better, but authorities often repeat false secondhand claims too. Or, because no one has read the source material, they couch their language in unfalsifiable imprecision that merely implies a false claim.

A topical example would be the oft-repeated blanket Trump tariffs. That part is not up for dispute. Trump has been very clear about his desire for more and broader tariffs. Rather, economic news often refers back to the Smoot-Hawley tariffs of 1930 as an example of tariffs running amok. While it is true that the 1930 tariffs applied to many items, they weren’t exactly a historical version of what Trump is currently proposing (though those details tend to change).

How do I know? Well, I looked. If you visit FRASER and search for “Smoot-Hawley”, then the tariff of 1930 is the first search result. It’s a congressional document, so it’s not an exciting read. But, you can see with your own eyes the diversity of duties that were placed on various imported goods. Since we often use the example of imported steel and since the foreign acquisition of US Steel was denied, let’s look at metals on page 20 of the 1930 act. But before we do, notice that we can link to particular pages of legislation and reports – nice! Reading the Smoot-Hawley Tariff Act’s original language, we can see the diverse duties on various metals. Here are a few:


EDIT: See my new published paper on this topic, “ChatGPT Hallucinates Non-existent Citations: Evidence from Economics”

This blog post is co-authored with graduate student Will Hickman.

EDIT: Will and I now have a paper on trusting ChatGPT, “Do People Trust Humans More Than ChatGPT?”

Although many academic researchers don’t enjoy writing literature reviews and would like to have an AI system do the heavy lifting for them, we have found a glaring issue with using ChatGPT in this role. ChatGPT will cite papers that don’t exist. This isn’t an isolated phenomenon – we’ve asked ChatGPT different research questions, and it continually provides false and misleading references. To make matters worse, it will often provide correct references to papers that do exist and mix these in with incorrect references and references to nonexistent papers. In short, beware when using ChatGPT for research.

Below, we’ve shown some examples of the issues we’ve seen with ChatGPT. In the first example, we asked ChatGPT to explain the research in experimental economics on how to elicit attitudes towards risk. While the response itself sounds like a decent answer to our question, the references are nonsense. Kahneman, Knetsch, and Thaler (1990) is not about eliciting risk. “Risk Aversion in the Small and in the Large” was written by John Pratt and was published in 1964. “An Experimental Investigation of Competitive Market Behavior” presumably refers to Vernon Smith’s “An Experimental Study of Competitive Market Behavior”, which had nothing to do with eliciting attitudes towards risk and was not written by Charlie Plott. The reference to Busemeyer and Townsend (1993) appears to be relevant.

Although ChatGPT often cites non-existent and/or irrelevant work, it sometimes gets everything correct. For instance, as shown below, when we asked it to summarize the research in behavioral economics, it gave correct citations for Kahneman and Tversky’s “Prospect Theory” and Thaler and Sunstein’s “Nudge.” ChatGPT doesn’t always just make stuff up. The question is, when does it give good answers and when does it give garbage answers?

Strangely, when confronted, ChatGPT will admit that it cites non-existent papers but will not give a clear answer as to why it cites non-existent papers. Also, as shown below, it will admit that it previously cited non-existent papers, promise to cite real papers, and then cite more non-existent papers.

We show the results from asking ChatGPT to summarize the research in experimental economics on the relationship between asset perishability and the occurrence of price bubbles. Although the answer it gives sounds coherent, a closer inspection reveals that the conclusions ChatGPT reaches do not align with theoretical predictions. More to our point, neither of the “papers” cited actually exists.

Immediately after getting this nonsensical answer, we told ChatGPT that neither of the papers it cited exists and asked why it didn’t limit itself to discussing papers that exist. As shown below, it apologized, promised to provide a new summary of the research on asset perishability and price bubbles that only used existing papers, and then proceeded to cite two more non-existent papers.

Tyler has called these errors “hallucinations” of ChatGPT. Such errors might be whimsical in a more artistic pursuit, but we find them concerning in research. Although there will always be room for improving language models, one thing is very clear: researchers, be careful. This is also something to keep in mind when serving as a referee or grading student work.