Do People Trust ChatGPT Writing?

My new working paper with Will Hickman is up on SSRN: Do People Trust Humans More Than ChatGPT?

We study whether people will pay for a fact-check on AI writing. ChatGPT can be very useful, but human readers should not trust every fact that it reports. Yesterday’s post was about ChatGPT writing false things that look real.

The reason participants in our experiment might pay for a fact-check is that they earn bonus payments based on whether they correctly identify errors in a paragraph. If participants believe that the paragraph does not contain any errors, they should not pay for a fact-check. However, if they have doubts, it is rational to pay for a fact-check and earn a smaller but certain bonus.
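The fact-check decision can be sketched as an expected-value comparison. The bonus and fact-check price below are hypothetical numbers chosen for illustration (they are not the parameters from the paper), and the sketch assumes a risk-neutral participant and a fact-check that always reveals the right answer.

```python
# Hypothetical payoffs for illustration only; not the parameters used in the paper.
BONUS = 2.00        # bonus for a correct accuracy rating (assumed)
CHECK_COST = 0.50   # price of a fact-check (assumed)


def expected_payoff_without_check(p_correct: float, bonus: float = BONUS) -> float:
    """Expected bonus when the participant relies on their own judgment."""
    return p_correct * bonus


def payoff_with_check(bonus: float = BONUS, cost: float = CHECK_COST) -> float:
    """Assumes a fact-check guarantees the right answer, so this payoff is certain."""
    return bonus - cost


def should_fact_check(p_correct: float) -> bool:
    """A risk-neutral participant pays only when the certain payoff beats the gamble."""
    return payoff_with_check() > expected_payoff_without_check(p_correct)


for p in (1.0, 0.9, 0.75, 0.6):
    print(f"confidence {p:.2f}: fact-check? {should_fact_check(p)}")
```

With these made-up numbers the break-even confidence is 1 - 0.50/2.00 = 0.75: a participant who is sure the paragraph is accurate never pays, while one with real doubts does. A risk-averse participant would be willing to pay at somewhat higher confidence levels than this.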

Abstract: We explore whether people trust the accuracy of statements produced by large language models (LLMs) versus those written by humans. While LLMs have showcased impressive capabilities in generating text, concerns have been raised regarding the potential for misinformation, bias, or false responses. In this experiment, participants rate the accuracy of statements under different information conditions. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT. However, when informed about authorship, participants show equal skepticism towards both human and AI writers. There is an increase in the rate of costly fact-checking by participants who are explicitly informed. These outcomes suggest that trust in AI-generated content is context-dependent.

Our original hypothesis was that people would be more trusting of human writers. That turned out to be only partially true. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT.

We presented information to participants in different ways. Sometimes we explicitly told them about authorship (informed treatment) and sometimes we asked them to guess about authorship (uninformed treatment).

This graph (figure 5 in our paper) shows that the overall rate of fact-checking increased when subjects were given more explicit information. Something about being told that a paragraph was written by a human might have aroused suspicion in our participants. (The kids today would say it is “sus.”) They became less confident in their own ability to rate accuracy and therefore more willing to pay for a fact-check. This effect is independent of whether participants trust humans more than AI.

Given our previous work on ChatGPT hallucinations, we think of fact-checking as generally a good thing. So one policy implication is that certain types of labels can prompt readers to think critically. For example, Twitter labels automated accounts so that readers know when content has been chosen or created by a bot.

Our working paper is currently trending on SSRN top ten lists such as this one.

Suggested Citation:
Buchanan, Joy and Hickman, William, Do People Trust Humans More Than ChatGPT? (November 16, 2023). GMU Working Paper in Economics No. 23-38, Available at SSRN: https://ssrn.com/abstract=4635674

GPT-4 Generates Fake Citations

I am happy to share my latest publication at The American Economist: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Citation: Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80-87. https://doi.org/10.1177/05694345231218454

Blog followers will know that we reported this issue earlier with the free version of ChatGPT using GPT-3.5 (covered in the WSJ). We have updated this new article by running the same prompts through the paid version using GPT-4. Did the problems go away with the more powerful LLM?

The error rate went down slightly, but our two main results held up. It matters that any fake citations at all are being presented as real. The proportion of nonexistent citations was over 30% with GPT-3.5, and it is over 20% in our trial of GPT-4 several months later. See figure 2 from our paper below for the average accuracy rates. The proportion of real citations is always under 90%. When asked about a very specific, narrow topic, GPT-4 hallucinates almost half of the citations (only 57% are real for level 3, as shown in the graph).

The second result from our study is that the error rate of the LLM increases significantly when the prompt is more specific. If you ask GPT-4 about a niche topic for which there is less training data, then a higher proportion of the citations it produces are false. (This has been replicated in different domains, such as knowledge of geography.)

What does Joy Buchanan really think? I expect that this problem with the fake citations will be solved quickly. It’s very brazen. When people understand this problem, they are shocked. Just… fake citations? Like… it printed out references for papers that do not actually exist? Yes, it really did that. We were the only ones who quantified and reported it, but the phenomenon was noticed by millions of researchers around the world who experimented with ChatGPT in 2023. These errors are so easy to catch that I expect ChatGPT will clean up its own mess on this particular issue quickly. However, that does not mean that the more general issue of hallucinations is going away.

Not only can ChatGPT make mistakes, as any human worker can, but it can also make a different kind of mistake without meaning to. Hallucinations are not intentional lies (which is not to say that an LLM cannot lie). This paper will serve as clear evidence that GPT can hallucinate in ways that detract from the quality of its output or even pose safety concerns in some use cases. This generalizes far beyond academic citations. The error rate might decrease to the point where hallucinations are less of a problem than the errors that humans are prone to make; however, the errors made by LLMs will always be of a different kind than the errors made by humans. A human research assistant would not cite papers that do not exist. LLM doctors are going to make types of mistakes that human doctors would not make. We should be on the lookout for those mistakes.

ChatGPT is great for some of the inputs to research, but it is not as helpful for original scientific writing. As prolific writer Noah Smith says, “I still can’t use ChatGPT for writing, even with GPT-4, because the risk of inserting even a small number of fake facts…”

Follow-Up Research: Will Hickman and I have an incentivized experiment on trust that you can read on SSRN: Do People Trust Humans More Than ChatGPT?

@IMurtazashvili has pointed me to a great resource for AI-era literature review work: “AI-Based Literature Review Tools” from Texas A&M.

The Greatest NBA Coach Is… Dan Issel?

Some economists love to write about sports because they love sports. Others love to write about sports because the data are so good compared to most other facets of the economy. What other industry constantly releases film of workers doing their jobs, and compiles and shares exhaustive statistics about worker performance?

This lets us fill the pages of the Journal of Sports Economics with articles on players’ performance and pay, and articles evaluating strategies that sometimes influence how sports are played in turn. But coaches always struck me as harder to evaluate than players or strategies. With players, the eye test often succeeds.

To take an extreme example, suppose an average high-school athlete got thrown into a professional football or basketball game; a fan asked to evaluate them could probably figure out within minutes that they don’t belong there, or perhaps just by glancing at them and seeing that they are severely undersized. But what if an average high-school coach were called up to coach at the professional level? How long would it take a casual observer to realize they don’t belong? You might see them mismanaging games within a few weeks, but people criticize professional coaches for this all the time too; I think you couldn’t be sure until you saw their record after a season or two. Even then, it is much less certain than for a player: was their bad record due to their coaching, or were they just handed a bad roster to work with?
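One rough way to see why a season or two may not be enough is to treat each game as an independent coin flip with a fixed win probability and compute the standard error of the observed win percentage. This is a simplifying assumption (real games are not independent coin flips, and it ignores the roster problem entirely); the 17-game NFL and 82-game NBA regular seasons are used only to illustrate the sample-size gap.

```python
import math


def win_pct_standard_error(games: int, true_win_prob: float = 0.5) -> float:
    """Standard error of an observed win percentage over `games` games.

    Assumes every game is an independent draw with the same win probability,
    which is a simplification used only for illustration.
    """
    return math.sqrt(true_win_prob * (1 - true_win_prob) / games)


# Regular-season lengths used for illustration.
samples = {
    "NFL, 1 season (17 games)": 17,
    "NFL, 3 seasons (51 games)": 51,
    "NBA, 1 season (82 games)": 82,
}

for label, n in samples.items():
    se = win_pct_standard_error(n)
    # Roughly 95% of outcomes should land within about 2 standard errors of the truth.
    print(f"{label}: SE = {se:.3f}, ~95% band = +/- {2 * se:.3f}")
```

Under these assumptions, a coach whose true win probability is .500 could plausibly finish a single 17-game season with anywhere from about 4 to 13 wins from luck alone, before even accounting for roster quality.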

The sports economics literature seems to confirm my intuition that coaches are difficult to evaluate. This is especially true in football, where teams generally play fewer than 20 games in a season; a general rule of thumb in statistics is that you need at least 20 to 25 observations for statistical tests to start to work. This accords with general practice in the NFL, where it is considered poor form to fire a coach without giving him at least one full season. One recent article evaluating NFL coaches only tries to evaluate those with at least 3 seasons. If the article is to be believed, it wasn’t until 2020 that anyone published a statistical evaluation of NFL defensive coordinators, despite this being considered a vital position that is often paid over a million dollars a year:

Continue reading

House Rich, House Richer

The third-quarter ‘All-Transactions’ housing price data was just released this week. These numbers are interesting for a few reasons. One reason is that home prices are a big component of our cost of living, and higher home prices make housing less affordable. This week’s release is especially interesting because it’s starting to look like the Fed might be pausing its 18-month streak of interest rate hikes. In case you don’t know, higher interest rates increase the cost of borrowing and decrease the price that buyers are willing to pay for a home. Nationally, we had only one quarter of falling home prices, in late 2022, but the recent national growth rate in home prices is much slower than it was from 2021 through mid-2022.
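Why higher rates lower what buyers will pay can be sketched with the standard fixed-rate mortgage (annuity) formula. This assumes a buyer with a fixed monthly budget for principal and interest, a 30-year fully amortizing loan, and no down payment, taxes, or insurance; the $2,000 budget and the 3%, 5%, and 7% rates are hypothetical.

```python
def max_loan(monthly_payment: float, annual_rate: float, years: int = 30) -> float:
    """Largest fixed-rate loan a given monthly payment can support (annuity formula)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return monthly_payment * n
    return monthly_payment * (1 - (1 + r) ** -n) / r


def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    """Monthly payment on a fixed-rate, fully amortizing loan."""
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


budget = 2_000  # hypothetical monthly budget for principal and interest
for rate in (0.03, 0.05, 0.07):
    print(f"{rate:.0%} mortgage: can borrow about ${max_loan(budget, rate):,.0f}")
```

With these made-up inputs, the same $2,000 payment supports roughly $474,000 of borrowing at 3% but only about $301,000 at 7%, so buyers constrained by monthly payments bid less for the same house.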

Do you remember when there were a bunch of stories about remote workers and early retirees fleeing urban centers in the wake of Covid? We stopped hearing that story so much once interest rates started rising. The inflection point in the data was in Q2 of 2022. After that, price growth started slowing, with the national average home price up 6.5%. But the national average masks some geographic diversity.

Continue reading

Are You Better Off Than You Were Four Years Ago?

In the October 1980 Presidential debate, Ronald Reagan famously asked American voters that question. His next sentence made it clear he was talking about the relationship between prices and wages, or what economists call real wages: “Is it easier for you to go and buy things in the stores than it was four years ago?”

Reagan was a master of political rhetoric, so it’s not surprising that many have tried to copy his question in the years since 1980. For example, Romney and Ryan tried to use this phrase in their 2012 campaign against Obama. But it’s a good question to ask! While the President may have less control over the economy than some observers think, the economy does seem to be a key factor in how voters decide (for example, Ray Fair has done a pretty good job of predicting election outcomes with a few major economic variables).

Voters in 2024 will probably be asking themselves a similar question, and both parties (at least for now) seem to be actively encouraging voters to make such a comparison. We still have 12 months of economic data to see before we can really ask the “4 years” question, but how would we answer it right now? Here’s probably the best approach to see if people are “better off” in terms of being able to “go and buy things in the stores”: inflation-adjusted wages. This chart presents average wages for nonsupervisory workers, with two different inflation adjustments, showing the change over a 4-year time period.
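The inflation adjustment itself is simple: divide each wage by the price level at the same date, then compare. The sketch below uses hypothetical wages and two hypothetical price indexes (labeled "index A" and "index B", not any official series) to show how the choice of index can change the answer.

```python
def real_wage_change(wage_start: float, wage_end: float,
                     price_start: float, price_end: float) -> float:
    """Percent change in the inflation-adjusted wage between two dates.

    Each wage is divided by the price index level at the same date, so the
    result measures purchasing power rather than dollars.
    """
    real_start = wage_start / price_start
    real_end = wage_end / price_end
    return (real_end / real_start - 1) * 100


# Hypothetical inputs: the same nominal wage path deflated by two different price indexes.
wage_then, wage_now = 25.00, 29.00
for index_name, (p_then, p_now) in {"index A": (100.0, 118.0), "index B": (100.0, 114.0)}.items():
    change = real_wage_change(wage_then, wage_now, p_then, p_now)
    print(f"{index_name}: real wage change over 4 years = {change:+.1f}%")
```

In this made-up case a 16% nominal raise is a real loss under one index and a real gain under the other, which is one reason it helps to show more than one inflation adjustment.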

Continue reading

Growth of the Transfer State

I’ve written about government spending before. But not all spending is the same. Building a bridge, buying a stapler, and taking from Peter to pay Paul are all different types of spending. I want to illustrate that last category. Any time the government gives money to someone without purchasing a good or service or making an interest payment, it’s called a ‘transfer’. People get excited about transfers. Social Security is a transfer, and so are unemployment insurance benefits. Those nice Covid checks? Also transfers.

Here I’ll focus on federal transfers, though the data on all transfers are very similar if you include the states in the analysis. Let’s start with the raw numbers. Below are data on GDP, federal spending, and federal transfers. Suffice it to say that they are all bigger than they used to be. They’ve all been growing geometrically, and they all exhibit bumps near recessions.

Continue reading

Let’s Be Thankful for Food Abundance

Despite recent increases in food prices, we should all still be very thankful this Thanksgiving for the abundance of affordable food available in the modern world. Looking back at my past few blog posts, I notice that I have been very food-centric in my choice of topics! Last week I also showed that the Thanksgiving meal this year will be the second cheapest ever (behind only 2019). While it’s absolutely true that food prices are up a lot over the past 2 and 4 years, they probably aren’t up as much as you have heard.

It’s always my preference to take as long-term a perspective as possible when thinking about economic progress. So here’s the best way I’ve come up with to show how cheap and abundant food is today: food as a share of household spending fell dramatically in the 20th century.

Most of the data in this chart comes from the BLS Consumer Expenditure Surveys. The survey was conducted occasionally starting in 1901, and annually since 1984. I also use BEA data to estimate personal taxes paid as a percent of spending (the CEX Surveys have some tax data, but it’s neither reliable nor consistent). I picked years as close to 30-year intervals as I could (with a preference for showing the earliest and latest years available), and I chose spending categories that make up 90-100% of total expenditures in most of these years. Keep in mind also that these are consumer expenditures. As a nation, we spend a lot more on healthcare and education than this chart suggests, but most of that spending does not come directly from households (though of course it comes from them indirectly). Think of this chart as an average household budget.

I hope the thing that jumps out at you is that the share of household expenditures going to food has fallen dramatically since 1901, from over 42 percent to under 13 percent. To be clear, this data includes spending on food both at home and at restaurants (after 1984 we can track them separately, and groceries are pretty consistently about 60 percent of food spending). You may also be wondering about very recent trends, such as how things compare with before the pandemic. In 2022, households spent slightly less of their budget on food than they did in 2019, with the share falling from 13.5% to 12.8%.

You may also notice that taxes have increased, though not much since 1960. Housing costs have been consistently high, and are a bit higher than in 1990, going from 27 percent to 33 percent in 2022. Housing is now the single largest budget category, but for most of the first half of the 20th century, food was the largest. And since people don’t change their housing situation more than once a year (if that), food would also have dominated weekly and monthly budget decisions and worries about price fluctuations.

This year there will be lots of complaining about prices around the Thanksgiving table. And much of that is warranted! But let’s also be thankful on this food-intensive holiday for how cheap the food is.

And if some smart-aleck youngster tries to tell you that they learned on TikTok that things were better during the Great Depression (yes, people are really saying this!), have them watch this video by Christopher Clarke. Or show them my chart above, in which an average family in the mid-1930s spent one-third of its budget on food, or how much labor it would have taken to buy that turkey in the 1930s (about 40 times as much time spent working as today).

Delinquency Data

I keep reading and hearing from people who are waiting for the other shoe to drop on the next recession. They see high interest rates and… well, that’s what they see. Employment is OK and NGDP is chugging along.

One indicator of economic trouble is the delinquency rate on debt. If people lose their jobs or discover that they are financially overextended, we would expect them to fail to meet their debt obligations, and delinquencies would rise. But the delinquency rate on all commercial bank loans is quiet. Not only is it quiet, it’s near historic lows in the data, at only 1.25% in 2023Q2. Banks can lend with confidence like never before.

But maybe that overall delinquency rate is obscuring what is happening in particular loan categories. After all, we know that many recessions begin with real-estate slowdowns. Below are the rates for commercial non-farmland loans, farmland loans, and residential mortgages. All are near historical lows, though there are hints that they might be on the rise. But one quarter doesn’t a recession make. I won’t show the graph for the sake of space, but business loan delinquency rates have also been practically flat for the past five years.

Continue reading

Replication Funding for Development Economics

The RWI – Leibniz Institute for Economic Research has funding for researchers to replicate papers in development economics:

RWI invites applications for several positions of Replicator on a self-employed basis to conduct a robustness replication of a published microeconomic study in the field of Development Economics. The successful applicant will work with us on the project “Robustness and Replicability in Economics (R2E)”, funded by the German Science Foundation (DFG) Priority Programme “Meta-Rep”….

The ultimate goal is to contribute to the ongoing debate about replicability and replication rates in economics. We collaborate closely with the Institute for Replication (I4R). All robustness replications will contribute to a meta-paper summarizing the collective findings. We plan to publish this meta-paper by the end of 2024, and all replication fellows will be co-authors….

The position starts as soon as possible and is limited to six months. The work can be done fully remotely. The applicant will receive compensation of 2,500 € gross in total, possible distributed in installments based upon predetermined deliverables. Additionally, replication fellows will be listed as co-authors on the meta-paper. At the conclusion of the project, it is foreseen to gather all fellows for a final workshop at RWI in Essen, Germany.

I don’t know the team here but I’m always happy to see more attempts to make economic research more reliable. The funding and the planned publication make this potentially a good deal for applied microeconomists, especially grad students. Full details are here (warning: PDF).

Food Prices Are Up, But Let’s not Overstate How Much

Last week I gave some advice on how to save money on food. Food prices are up a lot over the past 4 years, especially since the beginning of 2021. Over the 32 months since January 2021, grocery prices (according to the CPI) are up 20 percent (keep that number in mind). To give you an idea of how unusual that is, in the 32 months before the pandemic (up to January 2020), grocery prices rose only 2 percent. Perhaps even more astonishingly, October 2019 grocery prices were slightly lower on average than they were 4 years earlier, in October 2015. Grocery prices went from flat over 4 years to a 25 percent increase over the next 4 years. That’s a huge change for consumers.
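One way to compare cumulative changes like these is to convert them into compound annual rates. This sketch uses the CPI figures from the paragraph above and assumes steady month-to-month compounding, which smooths over the actual, bumpier path of prices.

```python
def annualized_rate(total_change: float, months: int) -> float:
    """Convert a cumulative price change over `months` into a compound annual rate.

    Assumes steady month-to-month compounding, which smooths over the actual path.
    """
    return (1 + total_change) ** (12 / months) - 1


periods = {
    "32 months since Jan 2021 (+20%)": (0.20, 32),
    "32 months before Jan 2020 (+2%)": (0.02, 32),
}
for label, (change, months) in periods.items():
    print(f"{label}: about {annualized_rate(change, months):.1%} per year")
```

That works out to roughly 7.1% per year versus well under 1% per year, close to ten times the pace of grocery inflation.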

But we also shouldn’t overstate the price increases. As you might guess, the best place to find overstatements is social media, and you can find plenty of them there. For example, in one very viral video a woman claims that her family’s grocery bill doubled (in fact, almost exactly doubled, to the penny, which is suspicious) in just one year, from August 2021 to August 2022. According to the CPI data, grocery prices were up 13.5 percent over that period, which, don’t get me wrong, is a lot! But it’s not 100 percent. I’ll focus on this one example, but I’m sure you will believe me when I say you can find dozens of examples like this on social media every single day (for example, yesterday someone claimed bread prices had tripled since 2019).

Let’s leave aside for a moment that in the viral video she claims to spend $1,500 per month on groceries. This would be a massive outlier for 2022. A family in the middle income quintile spent $460 per month on groceries in 2022, and $713 on all food including restaurants. So even if this family eats every single meal at home, they are still spending more than twice as much as a middle-income family spends on all food. Even a family with 5 or more people (the largest household size BLS uses in that report) spent $755 per month on groceries ($1,232 on all food). According to the Consumer Expenditure Survey, grocery spending went up 16% for the middle quintile and 19% for households of five or more from 2021 to 2022. Big increases, no doubt! But not 100%.
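The comparisons in this paragraph are simple ratios. A quick sketch using only the figures quoted above makes the gap explicit; the dictionary labels are informal shorthand, not BLS category names.

```python
def spending_ratio(claimed: float, benchmark: float) -> float:
    """How many times larger the claimed monthly spending is than a benchmark."""
    return claimed / benchmark


claimed_monthly_groceries = 1_500  # figure claimed in the viral video

# 2022 monthly spending figures quoted in the paragraph above (Consumer Expenditure Survey).
benchmarks = {
    "middle quintile, groceries": 460,
    "middle quintile, all food": 713,
    "5+ person household, groceries": 755,
    "5+ person household, all food": 1_232,
}

for label, monthly in benchmarks.items():
    ratio = spending_ratio(claimed_monthly_groceries, monthly)
    print(f"{label}: claimed spending is {ratio:.1f}x this benchmark")
```

Even against the largest-household grocery figure, the claimed bill is about twice typical spending, and it exceeds what those large households spend on all food combined.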

So whom should we believe? Have prices roughly doubled since 2021? Or are they up about 20 percent? People are sometimes skeptical of the consumer price index, so let’s look at the actual price data that goes into the index. BLS has data on hundreds of individual food items, but here’s a summary chart with eight common food items, showing the change in their prices since January 2021:

Continue reading