I posted a new working paper with systematic evidence for false citations when ChatGPT (GPT-3.5) writes about academic literature.
Buchanan, Joy and Shapoval, Olga, GPT-3.5 Hallucinates Nonexistent Citations: Evidence from Economics (June 3, 2023). Available at SSRN: https://ssrn.com/abstract=4467968 or http://dx.doi.org/10.2139/ssrn.4467968
Abstract: We create a set of prompts from every Journal of Economic Literature (JEL) topic to test the ability of a GPT-3.5 large language model (LLM) to write about economic concepts. For general summaries, ChatGPT can perform well. However, more than 30% of the citations suggested by ChatGPT do not exist. Furthermore, we demonstrate that the ability of the LLM to deliver accurate information declines as the question becomes more specific. This paper provides evidence that, although GPT has become a useful input to research production, fact-checking the output remains important.
Figure 2 in the paper shows that the proportion of real citations falls as the prompt becomes more specific. Other people have noticed this pattern, but I don’t think it has been documented quantitatively before.
We asked ChatGPT to cover a wide range of topics within economics. For every JEL category, we constructed three prompts with increasing specificity.
Level 1: The first prompt, using A here as an example, was “Please provide a summary of work in JEL category A, in less than 10 sentences, and include citations from published papers.”
Level 2: The second prompt was about a topic within the JEL category that was well-known. An example for JEL category Q is, “In less than 10 sentences, summarize the work related to the Technological Change in developing countries in economics, and include citations from published papers.”
Level 3: We used the word “explain” instead of “summarize” in the prompt, asking about a more specific topic related to the JEL category. For L we asked, “In less than 10 sentences, explain the change in the car industry with the rising supply of electric vehicles and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”
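For readers who want to try something similar, here is a minimal Python sketch of the bookkeeping behind this design. It is not the code from the paper. The names `level1_prompt` and `share_real_by_level` are hypothetical, and the sketch assumes you have already checked each citation by hand and recorded it as a (level, is_real) pair. Only the Level 1 prompt is templated, because the Level 2 and Level 3 prompts were written by hand for each category.

```python
from collections import defaultdict

# Level 1 wording as quoted in the post; only the JEL code changes.
LEVEL1_TEMPLATE = (
    "Please provide a summary of work in JEL category {code}, "
    "in less than 10 sentences, and include citations from published papers."
)


def level1_prompt(code: str) -> str:
    """Build the Level 1 prompt for a JEL category code such as "A"."""
    return LEVEL1_TEMPLATE.format(code=code)


def share_real_by_level(checks):
    """Return the proportion of real citations at each prompt level.

    `checks` is an iterable of (level, is_real) pairs, where `is_real`
    is the outcome of a manual check (an assumption of this sketch).
    """
    totals = defaultdict(int)
    real = defaultdict(int)
    for level, is_real in checks:
        totals[level] += 1
        if is_real:
            real[level] += 1
    # Sort so the result reads Level 1, Level 2, Level 3.
    return {level: real[level] / totals[level] for level in sorted(totals)}
```

Calling `level1_prompt("A")` reproduces the Level 1 prompt quoted above. Passing your own hand-checked pairs to `share_real_by_level` gives one proportion per level, which is the kind of quantity plotted in Figure 2.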
The paper is only 5 pages long, but the appendix includes over 30 pages of GPT responses to our prompts. If you are an economist who has not yet played with ChatGPT, then you might find it useful to scan this appendix and get a sense of what GPT “knows” about various fields of economics.
If SSRN isn’t working for you, here is also a Google Drive link to the working paper: https://drive.google.com/file/d/1Ly23RMBlim58a7CbmLwNL_odHSNRjC1L/view?usp=sharing
Previous iterations of this idea on EWED:
https://economistwritingeveryday.com/2023/04/17/chatgpt-as-intern/ Mike’s thoughts on what the critter is good for.
https://economistwritingeveryday.com/2023/01/21/chatgpt-cites-economics-papers-that-do-not-exist/ This is one of our top posts for traffic in 2023, since this is a topic of interest to the public. That was January of 2023 and here we are in June today. It’s very possible that this problem will be fixed soon. We can log this bug now to serve as a benchmark of progress.
A check-in and comparison with Bing: