New Paper with Evidence that ChatGPT Hallucinates Nonexistent Citations

I posted a new working paper with systematic evidence for false citations when ChatGPT (GPT-3.5) writes about academic literature.

Buchanan, Joy and Shapoval, Olga, GPT-3.5 Hallucinates Nonexistent Citations: Evidence from Economics (June 3, 2023). Available at SSRN: https://ssrn.com/abstract=4467968 or http://dx.doi.org/10.2139/ssrn.4467968

Abstract: We create a set of prompts from every Journal of Economic Literature (JEL) topic to test the ability of a GPT-3.5 large language model (LLM) to write about economic concepts. For general summaries, ChatGPT can perform well. However, more than 30% of the citations suggested by ChatGPT do not exist. Furthermore, we demonstrate that the ability of the LLM to deliver accurate information declines as the question becomes more specific. This paper provides evidence that, although GPT has become a useful input to research production, fact-checking the output remains important.

Figure 2 in the paper shows the trend that the proportion of real citations goes down as the prompt becomes more specific. This idea has been noticed by other people, but I don’t think it has been documented quantitatively before.

We asked ChatGPT to cover a wide range of topics within economics. For every JEL category, we constructed three prompts with increasing specificity.

Level 1: The first prompt, using A here as an example, was “Please provide a summary of work in JEL category A, in less than 10 sentences, and include citations from published papers.”

Level 2: The second prompt was about a topic within the JEL category that was well-known. An example for JEL category Q is, “In less than 10 sentences, summarize the work related to the Technological Change in developing countries in economics, and include citations from published papers.”

Level 3: We used the word “explain” instead of “summarize” in the prompt, asking about a more specific topic related to the JEL category. For L we asked, “In less than 10 sentences, explain the change in the car industry with the rising supply of electric vehicles and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”
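For readers who want to try something similar, here is a minimal sketch of how the Level 1 prompt could be generated and how a share of real citations could be tallied. Only the Level 1 wording is quoted from the post; the function names and the idea of recording each citation check as True/False are assumptions for illustration, not the procedure used in the paper.

```python
from typing import Sequence

# Level 1 wording quoted from the post; {code} stands for a JEL category letter.
LEVEL_1_TEMPLATE = (
    "Please provide a summary of work in JEL category {code}, "
    "in less than 10 sentences, and include citations from published papers."
)


def level_1_prompt(code: str) -> str:
    """Fill the Level 1 template for one JEL category (hypothetical helper)."""
    return LEVEL_1_TEMPLATE.format(code=code)


def share_real(citation_checks: Sequence[bool]) -> float:
    """Proportion of citations marked as real after a manual check.

    Each entry is True if a human verified that the cited paper exists.
    This bookkeeping format is an assumption, not the paper's.
    """
    if not citation_checks:
        raise ValueError("no citations to score")
    return sum(citation_checks) / len(citation_checks)


print(level_1_prompt("A"))
```

The Level 2 and Level 3 prompts were written per topic, so they are not reduced to a template here.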

The paper is only 5 pages long, but we include over 30 pages in the appendix of the GPT responses to our prompts. If you are an economist who has not yet played with ChatGPT, then you might find it useful to scan this appendix and get a sense of what GPT “knows” about various fields of economics.

If SSRN isn’t working for you, here is also a Google Drive link to the working paper: https://drive.google.com/file/d/1Ly23RMBlim58a7CbmLwNL_odHSNRjC1L/view?usp=sharing

Previous iterations of this idea on EWED:

https://economistwritingeveryday.com/2023/04/17/chatgpt-as-intern/ Mike’s thoughts on what the critter is good for.

https://economistwritingeveryday.com/2023/01/21/chatgpt-cites-economics-papers-that-do-not-exist/  This is one of our top posts for traffic in 2023, since this is a topic of interest to the public.  That was January of 2023 and here we are in June today. It’s very possible that this problem will be fixed soon. We can log this bug now to serve as a benchmark of progress.

A check in and comparison with Bing:

Video of Joy Buchanan on Tech Jobs and Who Will Program

Here are some show notes for a keynote lecture to a general audience in Indiana. This was recorded in April 2023.

Minute  Topic
2:00  “SMET” vs STEM Education – Does Messaging Matter?
(Previous blog post on SMET)
5:00  Is Computer Programming a “Dirty Job”? Air conditioning, compensating differentials, and the nap pods of Silicon Valley
(post on the 1958 BLS report)
7:50  Wages and employment outlook for computer occupations
10:00  Presenting my experimental research paper “Willingness to be Paid: Who Trains for Tech Jobs?” in 23 minutes

Motivation and Background 10:00 – 15:30
Experimental Design         15:30 – 22:00
Results                    22:00 – 30:00
Discussion                 30:00 – 33:30
33:50  Drawbacks to tech jobs

See also my policy paper published by the CGO on tech jobs and employee satisfaction
35:30  The 2022 wave of layoffs in Big Tech and vibing TikTok Product Managers

I borrowed a graph on Tech-cession from Joey Politano and a blog point from Matt Yglesias, and of course reference the BLS.
39:00  Should You Learn to Code? (and the new implications of ChatGPT)

Ethan Mollick brought this Nature article to my attention.
Tweet credits to @karpathy and @emollick
48:00  Q&A with audience

Video: Joy Presents Two Experimental Papers to a Macro Class

Here are some show notes to a talk I gave in April 2023. I had the opportunity to talk to an undergraduate macroeconomics class at Indiana University East.

Minute  Topic
2:00  Research on Behavioral Economics and Macroeconomics
4:25  Labor Market Equilibrium Concepts and Incomplete Labor Contracts
6:50  The Gift Exchange Game and the Fair Wage-Effort Theory
13:00  Recessions and Downward Wage Rigidity
19:00  Presenting my Experimental Study “If Wages Fell During a Recession” in 13 minutes
32:00 – 33:00  How a question raised in “If Wages Fell During a Recession” pointed the way to the Reference Point paper
33:00 – 41:00  Presenting my Experimental Study “My Reference Point, Not Yours” in 8 minutes
41:00 – 44:00  Conclusion of “My Reference Point, Not Yours” and tying it back to macroeconomics

The “If Wages Fell…” paper directly inspired the “My Reference…” experiment. But I don’t cite “If Wages Fell…” in “My Reference…,” so you would never know how closely they are connected unless you listen to this talk.

Health Insurance and Wages: Compensating Differentials in Reverse?

One of the oldest theories in economics is the idea of compensating differentials. A job represents not just a certain amount of money per hour, but a whole package of positive and negative things. Jobs have more or less stability, flexibility, fun, room to grow, danger… and non-cash benefits like health insurance. The idea of compensating differentials is that, all else equal, jobs that are good on these other margins can pay lower cash wages and still attract workers (thus, the danger of doing what you love). On the other hand, jobs that are bad on these other margins need high wages if they want to hire anyone (thus, the deadliest catch).

I think this theory makes perfect sense, and we see evidence for it in many places. But when it comes to health insurance, everything looks backwards. A job that offers employer-provided health insurance is better to most employees than one that doesn’t, so by compensating differentials it should be able to offer lower wages. There’s just one problem: US data shows that jobs offering health insurance also offer significantly higher wages. The 2018 Current Population Survey shows that workers with employer-provided health insurance had average wages of $33/hr, compared to $24/hr for those without employer insurance.

All the economists are thinking now: that’s not a problem, compensating differentials is an “all else equal” claim, but not all else is equal here. The jobs with health insurance pay higher wages because they are trying to attract higher-skilled workers than the jobs that don’t offer insurance.

That’s what I thought too. It is true that jobs with insurance hire quite different workers on average:

Source: 2017 CPS analyzed here

The problem is, once we control for all the observable ways that insured workers differ, we still find that their wages are significantly higher than workers who don’t get employer-provided insurance. Like, 10-20% higher. That’s after controlling for: year, sex, education, age, race, marital status, state of residence, health, union membership, firm size, whether the firm offers a pension, whether the employee is paid hourly, and usual hours worked. I’ve thrown in every possibly-relevant control variable I can think of and employer-provided health insurance always still predicts significantly higher wages. Of course, there are limits to what we get to observe about people using surveys; I don’t get any direct measures of worker productivity. Possibly the workers who get insurance are more skilled in ways I don’t observe.
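To make the “unobserved skill” worry concrete, here is a small simulation sketch. Everything in it is made up for illustration: the data are simulated rather than drawn from the CPS, the variable names are hypothetical, and insurance is given a true wage effect of exactly zero by construction. The point is only that controlling for an observed trait can shrink the insurance coefficient without eliminating it when a hidden trait also drives both wages and insurance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated workers; none of these numbers come from the CPS.
education = rng.normal(size=n)      # observed by the researcher
hidden_skill = rng.normal(size=n)   # NOT observed by the researcher

# More skilled workers are more likely to hold jobs with insurance.
insured = (education + hidden_skill + rng.normal(size=n) > 0).astype(float)

# By construction, insurance itself has zero effect on log wages here.
log_wage = (3.0 + 0.2 * education + 0.2 * hidden_skill
            + rng.normal(scale=0.3, size=n))


def insurance_coefficient(*controls):
    """OLS coefficient on `insured`, with optional control columns."""
    X = np.column_stack([np.ones(n), insured, *controls])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return beta[1]


naive = insurance_coefficient()
controlled = insurance_coefficient(education)
oracle = insurance_coefficient(education, hidden_skill)

print(f"no controls:              {naive:.3f}")
print(f"controlling for education: {controlled:.3f}")
print(f"also controlling for hidden skill (infeasible in practice): {oracle:.3f}")
```

In this toy setup the coefficient falls once education is added but stays well above zero, and it only goes to roughly zero when the hidden skill is included, which a survey-based study cannot do.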

We can try to account for these unobserved differences by following the same person from one job to another. When someone switches jobs, they could have health insurance in both jobs, neither, only the new, or only the old. What happens to the wages of people in each of these situations? It turns out that gaining health insurance in a new job on average brings the biggest increase in wages:

What could be going on here? One possibility is that health insurance makes people healthier, which improves their productivity, which improves their wages. But we control for health status and still find this effect. The real mystery is that papers that study mandatory expansions of health insurance (like the ACA employer mandate and prior state-level mandates) tend to find that they lower wages. Why would employer-provided health insurance lower wages when it is broadly mandated, but raise wages for individuals who choose to switch to a job that offers it?

My current theory is that “efficiency benefits” are offered alongside “efficiency wages”. The idea of efficiency wages is that some firms pay above-market wages as a way of reducing turnover. Workers won’t want to leave if they know their current job pays above-market, and so the company saves money on hiring and training. But this only works if other firms aren’t doing it. The positive correlation of wages and insurance could be because the same firms that pay “efficiency wages” are more likely to pay “efficiency benefits”, offering unusually good benefits as a way to hold on to employees.

I still feel like these results are puzzling and that I haven’t fully solved the puzzle. This post summarizes a currently-unpublished paper that Anna Chorniy and I have been working on for a long time and that I’ll be presenting at WVU tomorrow. We welcome comments that could help solve this puzzle either on the empirical side (“just control for X”) or the theoretical side (“compensating differentials are being overwhelmed here by X”).

More Ideas Pages

I’ve written here about my ideas page of economics papers I’d like to see.

After that post I heard from others who maintain similar pages. David Friedman has a small page here with research ideas, along with larger pages of short story ideas and product ideas.

HiveReview is a site where one can post or comment on both completed papers and paper ideas. The site does many things at once, but one use case is to post ideas in search of collaborators or to search for projects where someone wants a collaborator for their idea.

I learned today that Gwern Branwen maintains a large page of “Questions”, some of which could be research ideas, mostly outside of economics. He also has pages of research ideas and startup ideas. Some examples of Questions:

Given the crucial role of trust and shared interests in success stories like Xerox PARC or the Apollo Project or creative collaborations in general, why are there so few extremely successful pairs of identical twins?

Nicotine alternatives or analogues: there seem to be none, but why not?

Nicotine is one of the best stimulants on the market: legal, cheap, effective, relatively safe, with a half-life less than 6 hours. It also affects one of the most important and well-studied receptors. Why are there no attempts to develop analogues or replacements for nicotine which improve on it eg. by making it somewhat longer-lasting or less blood-pressure-raising, when there are so many variants on other stimulants like amphetamines or modafinil or caffeine?

Why States Hate Nursing Homes

Medicaid is a health insurance program for those with low incomes, funded largely by states. Overall it accounts for less than 20% of US medical spending. But there is one area where it is the dominant payer: nursing homes. Nursing homes are expensive, and Medicare (the typical insurance for those over 65) won’t cover them after the first hundred days, so most nursing home residents end up paying out of pocket until they burn through all their savings and wind up on Medicaid. At which point, Medicaid pays about $100,000 per year to the nursing home for the rest of their life.

States are responsible for up to half of that cost, and so start looking for ways to save money. One idea they have is to make it harder to build nursing homes: if there aren’t beds available, potential nursing home patients will have to stay home instead, where they can’t rack up Medicaid spending the same way. In fact, some states go all the way to a complete moratorium on new nursing homes:

Source: Institute for Justice

Some other states allow new nursing homes, but only with a special permission slip called a Certificate of Need (CON). CON is often required for other types of health facilities as well, like hospitals or dialysis centers. Research by me and others has generally found that CON doesn’t work as a way to reduce spending, and in fact actually increases it. CON might reduce the number of facilities, but that reduction of supply and competition gives the remaining facilities more power to raise prices.

So which effect dominates: does the smaller number of facilities reduce total spending, or do the higher prices increase it? It depends on the elasticity of demand:

In health care demand is typically quite inelastic, so the price effect dominates, and spending goes up:
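A rough worked example of this logic, with purely illustrative numbers: spending is price times quantity, so if a supply restriction raises price by some percentage, the change in spending depends on how much quantity falls in response. The 10% price increase and the two elasticity values below are assumptions chosen for the arithmetic, not estimates from any study, and the function uses a simple linear approximation of the quantity response.

```python
def spending_change(price_change: float, elasticity: float) -> float:
    """Approximate proportional change in spending (price x quantity).

    price_change: proportional price increase, e.g. 0.10 for +10%.
    elasticity:   price elasticity of demand (negative), applied linearly.
    """
    quantity_change = elasticity * price_change
    return (1 + price_change) * (1 + quantity_change) - 1


# Illustrative values only.
inelastic = spending_change(price_change=0.10, elasticity=-0.2)
elastic = spending_change(price_change=0.10, elasticity=-1.5)

print(f"inelastic demand (-0.2): spending changes by {inelastic:.1%}")
print(f"elastic demand   (-1.5): spending changes by {elastic:.1%}")
```

With inelastic demand the price increase outweighs the small drop in quantity and spending rises; with elastic demand the drop in quantity outweighs the price increase and spending falls.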

But nursing homes could be an exception here. Elasticity of demand could be relatively high because of the number of potential substitutes- home care or assisted living for those with relatively low medical needs, hospitals for those with relatively high medical needs. Plus this is the one type of health care where Medicaid is the dominant payer. They could be especially resistant to price increases here, both due to their market power and their willingness to keep prices so low that facilities won’t take Medicaid patients (another way to save money!).

A new paper by Vitor Melo and Elijah Neilson finds that this is indeed the case. Indiana, Pennsylvania, and North Dakota repealed their nursing home CON requirements in the ’90s, and at least for IN and PA their Medicaid spending went way up. The paper uses a new “synthetic difference in difference” technique that seems appropriate, and creates figures that seem confusing at first but get a ton of information across:

They correctly note that they don’t evaluate the welfare effects of the policy; it’s possible that the extra nursing home beds following CON repeal bring huge benefits to seniors that are worth the higher spending. But nursing homes could be the exception to the general rule that CON fails to achieve the goals, like reduced spending, that advocates set for it.

Hospitals Just Got Easier to Build in West Virginia

West Virginia just repealed their Certificate of Need requirement for hospitals and birthing centers. Until now anyone wanting to open or expand a hospital needed to apply to a state board for permission. The process took time and money and could result in the board saying “no thanks, we don’t think the state needs another hospital”.

Now anyone wanting to open or expand a hospital or birthing center can skip this step and get to work. This means more facilities and more competition, which in turn leads to lower health care spending relative to trend.

Of course, the rest of West Virginia’s Certificate of Need requirements remain in place; if you want to open many other types of health care facilities, or purchase major equipment like an MRI, you must still get the state board to approve its “necessity”. In some cases, you shouldn’t even bother applying; West Virginia has a moratorium on opioid treatment programs. Ideally West Virginia would join its neighbor Pennsylvania in a complete repeal of Certificate of Need requirements.

But making it easier to build hospitals and birthing centers is a major step. Hospitals are the largest single component of health spending in the US, and improved facilities might help reduce West Virginia’s infant mortality from its current level as the 4th worst state.

Update 4/7/23: A knowledgeable correspondent suggests that the law may only allow existing hospitals to expand without CON (while totally new hospitals would still require one), citing this article. The text of the bill itself seems ambiguous to me. The section “Exemptions from certificate of need” adds “Hospital services performed at a hospital”. For birthing centers by contrast, new construction is clearly now allowed by right: exemptions from CON now include “Constructing, developing, acquiring, or establishing a birthing center”.

Discrepancy in Views about Music Pirating

It’s unusual for the expert opinions on an issue to range all the way from zero to 100%.

Economists using an instrumental variable approach found that digital piracy did not hurt record sales in the 2000’s. Hammond (2014) found, incredibly, that file-sharing increased record sales. The picture above is of an article critiquing the Oberholzer-Gee and Strumpf (2007) conclusion that was published by a top journal.

Liebowitz reports that music industry professionals believed that digital piracy was the primary or complete cause of the decline of record sales. One would think that industry insiders have accurate data on the problem and a decent mental model relating the variables together.

The estimated effect of music file-sharing ranged from helping music sales to completely eliminating them. Where else can we find so much disagreement on the answer to a narrow empirical question?

Regulatory Costs and Market Power

That’s the title of a blockbuster new paper by Shikhar Singla. The headline finding is that increased regulatory costs are responsible for over 30% of the increase in market power in the US since the 1990’s. That’s a big deal, but not what I found most interesting.

One big advance is simply the data on regulation. If you want to measure the effect of regulation on different industries, you need to come up with a way to measure how regulated they are. The crude, simple old approach is to count how many pages of regulation apply to a broad industry. The big advance of Mercatus’ RegData was to use machine learning to identify which specific industry is being discussed near “restrictive words” in the Code of Federal Regulations that indicate a regulatory restriction is being imposed. But not all regulatory words (even restrictive ones) are created equal; some impose very costly restrictions, most impose less costly restrictions, and some are even deregulatory. Singla’s solution is to take the government’s estimates of regulatory costs and apply machine learning there:

This paper uses machine learning on regulatory documents to construct a novel dataset on compliance costs to examine the effect of regulations on market power. The dataset is comprehensive and consists of all significant regulations at the 6-digit NAICS level from 1970-2018. We find that regulatory costs have increased by $1 trillion during this period.

The government’s estimates of the costs are of course imperfect, but almost certainly add information over a word-count based approach. Both approaches agree that regulation has increased dramatically over time. How does this affect businesses? Here’s what’s highlighted in the abstract:

We document that an increase in regulatory costs results in lower (higher) sales, employment, markups, and profitability for small (large) firms. Regulation driven increase in concentration is associated with lower elasticity of entry with respect to Tobin’s Q, lower productivity and investment after the late 1990s. We estimate that increased regulations can explain 31-37% of the rise in market power. Finally, we uncover the political economy of rulemaking. While large firms are opposed to regulations in general, they push for the passage of regulations that have an adverse impact on small firms.

More from the paper:

an average small firm faces an average of $9,093 per employee in our sample period compared to $5,246 for a large firm

a 100% increase in regulatory costs leads to a 1.2%, 1.4% and 1.9% increase in the number of establishments, employees and wages, respectively, for large firms, whereas it leads to 1.4%, 1.5% and 1.6% decrease in the number of establishments, employees and wages, respectively for small firms when compared within the state-industry-time groups. Results on employees and wages provide evidence that an increase in regulatory costs creates a competitive advantage for large firms. Large firms get larger and small firms get smaller.

The fact that large firms benefit while small firms are harmed is what drives the increase in concentration and market power.

What I like and dislike most about this paper is the same thing: it’s a much better version of what Diana Thomas and I tried to do in our 2017 Journal of Regulatory Economics paper. We used RegData restriction counts to measure how regulation affected the number of establishments and employees by industry, and how this differed by firm size. I wish I had thought of using published regulatory cost measures like Singla does, but realistically even if I had the idea I wouldn’t have had the machine learning chops to execute it. The push to quantify what “micro” estimates mean for economy-wide measures is also excellent. I hope and expect to see this published soon in a top-5 economics journal.

HT: Adam Ozimek

The ACA and Entrepreneurship: The Importance of Age

Thinking about one of my older papers today, since I just heard it won the Eckstein award for best paper in the Eastern Economic Journal in 2019 & 2020.

One big selling point of the Affordable Care Act was that by offering more non-employer-based options for health insurance, it would free people who felt locked into their jobs by the need for insurance. This would free people up to leave their jobs and do other things like start their own businesses. Did the ACA actually live up to this promise?

It did, at least for some people. The challenge when it comes to measuring the effect of the ACA is that it potentially affected everyone nationwide. If entrepreneurship rises following the implementation of the ACA in 2014, is it because of the ACA? Or just the general economic recovery? Ideally we want some sort of comparison group unaffected by the ACA. If that doesn’t really exist, we can use a comparison group that is less affected by it.

That’s what I did in a 2017 paper focused on younger adults. I compared those under age 26 (who benefit from the ACA’s dependent coverage mandate) to those just over age 26 (who don’t), but found no overall difference in how their self-employment rates changed following the ACA.

In the 2019 Eastern Economic Journal paper, Dhaval Dave and I instead consider the effect of the ACA on older adults. We compare entrepreneurship rates for people in their early 60’s (who might benefit from the availability of individual insurance through the ACA) with a “control group” of people in their late 60’s (who are eligible for Medicare and presumably less affected by the ACA). We find that the ACA led to a 3-4% increase in self-employment for people in their early 60’s.

Figure 1 from our 2019 EEJ paper
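The comparison-group logic described above is a difference-in-differences calculation: take the change in the more-affected group and subtract the change in the less-affected group. Here is a minimal sketch of that arithmetic. The self-employment rates are made-up placeholders, not numbers from the paper, and the actual estimates come from regressions rather than four group averages.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up self-employment rates, for illustration only (not from the paper).
effect = diff_in_diff(
    treated_pre=0.150, treated_post=0.156,  # early 60s: may use ACA coverage
    control_pre=0.200, control_post=0.201,  # late 60s: Medicare-eligible
)

print(f"difference-in-differences estimate: {effect:.3f}")
```

Subtracting the control group’s change is what strips out things like the general economic recovery, under the assumption that both groups would otherwise have followed the same trend.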

Why the big difference in findings across papers? My guess is that it’s about age, and what age means for health and health insurance. People in their 60s are old enough to have substantial average health costs and health insurance premiums, so they will factor health insurance into their decisions more strongly than younger people. In addition, the community rating provisions of the ACA generally reduced individual premiums for older people while raising them for younger people.

In sum, the ACA does seem to encourage entrepreneurship at least among older adults. At the same time, our other research finds that the employer-based health insurance system still leads Americans to stay in their jobs longer than they would otherwise choose to.