This discovery and the examples provided are by graduate student Will Hickman.
Although many academic researchers don’t enjoy writing literature reviews and would like to have an AI system do the heavy lifting for them, we have found a glaring issue with using ChatGPT in this role. ChatGPT will cite papers that don’t exist. This isn’t an isolated phenomenon – we’ve asked ChatGPT different research questions, and it continually provides false and misleading references. To make matters worse, it will often provide correct references to papers that do exist and mix these in with incorrect references and references to nonexistent papers. In short, beware when using ChatGPT for research.
Below, we’ve shown some examples of the issues we’ve seen with ChatGPT. In the first example, we asked ChatGPT to explain the research in experimental economics on how to elicit attitudes towards risk. While the response itself sounds like a decent answer to our question, the references are nonsense. Kahneman, Knetsch, and Thaler (1990) is not about eliciting risk. “Risk Aversion in the Small and in the Large” was written by John Pratt and was published in 1964. “An Experimental Investigation of Competitive Market Behavior” presumably refers to Vernon Smith’s “An Experimental Study of Competitive Market Behavior”, which had nothing to do with eliciting attitudes towards risk and was not written by Charlie Plott. The reference to Busemeyer and Townsend (1993) appears to be relevant.
Although ChatGPT often cites non-existent and/or irrelevant work, it sometimes gets everything correct. For instance, as shown below, when we asked it to summarize the research in behavioral economics, it gave correct citations for Kahneman and Tversky’s “Prospect Theory” and Thaler and Sunstein’s “Nudge.” ChatGPT doesn’t always just make stuff up. The question is, when does it give good answers and when does it give garbage answers?
Strangely, when confronted, ChatGPT will admit that it cites non-existent papers but will not give a clear answer as to why it cites non-existent papers. Also, as shown below, it will admit that it previously cited non-existent papers, promise to cite real papers, and then cite more non-existent papers.
We show the results from asking ChatGPT to summarize the research in experimental economics on the relationship between asset perishability and the occurrence of price bubbles. Although the answer it gives sounds coherent, a closer inspection reveals that the conclusions ChatGPT reaches do not align with theoretical predictions. More to our point, neither of the “papers” cited actually exist.
Immediately after getting this nonsensical answer, we told ChatGPT that neither of the papers it cited exist and asked why it didn’t limit itself to discussing papers that exist. As shown below, it apologized, promised to provide a new summary of the research on asset perishability and price bubbles that only used existing papers, then proceeded to cite two more non-existent papers.
Tyler has called these errors “hallucinations” of ChatGPT. It might be whimsical in a more artistic pursuit, but we find this form of error concerning. Although there will always be room for improving language models, one thing is very clear: researchers be careful. This is something to keep in mind, also, when serving as a referee or grading student work.