Do People Trust ChatGPT Writing?

My new working paper with Will Hickman is up on SSRN: Do People Trust Humans More Than ChatGPT?

We study whether people will pay for a fact-check on AI writing. ChatGPT can be very useful, but human readers should not trust every fact that it reports. Yesterday’s post was about ChatGPT writing false things that look real.

The reason participants in our experiment might pay for a fact-check is that they earn bonus payments based on whether they correctly identify errors in a paragraph. If participants believe that the paragraph does not contain any errors, they should not pay for a fact-check. However, if they have doubts, it is rational to pay for a fact-check and earn a smaller bonus for certain.
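The decision participants face can be sketched as a simple expected-value comparison. The payoff numbers below are hypothetical stand-ins, not the actual experimental parameters:

```python
# Hypothetical payoffs for illustration (the real experiment's bonus
# amounts differ): guessing pays a full bonus only if the rating is
# correct, while a fact-check pays a smaller bonus with certainty.

def should_fact_check(p_correct, bonus=1.00, certain_bonus=0.60):
    """True when the certain fact-check payoff beats the expected
    payoff of guessing, given probability p_correct of being right."""
    return certain_bonus > p_correct * bonus

print(should_fact_check(0.9))  # False: confident raters should guess
print(should_fact_check(0.5))  # True: doubtful raters should pay
```

A participant's willingness to pay therefore reveals something about their confidence, which is what makes the fact-check rate a useful measure of trust.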

Abstract: We explore whether people trust the accuracy of statements produced by large language models (LLMs) versus those written by humans. While LLMs have showcased impressive capabilities in generating text, concerns have been raised regarding the potential for misinformation, bias, or false responses. In this experiment, participants rate the accuracy of statements under different information conditions. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT. However, when informed about authorship, participants show equal skepticism towards both human and AI writers. There is an increase in the rate of costly fact-checking by participants who are explicitly informed. These outcomes suggest that trust in AI-generated content is context-dependent.

Our original hypothesis was that people would be more trusting of human writers. That turned out to be only partially true. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT.

We presented information to participants in different ways. Sometimes we explicitly told them about authorship (informed treatment) and sometimes we asked them to guess about authorship (uninformed treatment).

This graph (figure 5 in our paper) shows that the overall rate of fact-checking increased when subjects were given more explicit information. Something about being told that a paragraph was written by a human might have aroused suspicion in our participants. (The kids today would say it is “sus.”) They became less confident in their own ability to rate accuracy and therefore more willing to pay for a fact-check. This effect is independent of whether participants trust humans more than AI.

In the context of our previous work on ChatGPT hallucinations, we think of fact-checking as often a good thing. So, one policy implication is that certain types of labels can cause readers to think critically. For example, Twitter labels automated accounts so that readers know when content has been chosen or created by a bot.

Our working paper is currently trending on SSRN top ten lists such as this one.

Suggested Citation:
Buchanan, Joy and Hickman, William, Do People Trust Humans More Than ChatGPT? (November 16, 2023). GMU Working Paper in Economics No. 23-38, Available at SSRN: https://ssrn.com/abstract=4635674

GPT-4 Generates Fake Citations

I am happy to share my latest publication at The American Economist: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Citation: Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80-87. https://doi.org/10.1177/05694345231218454

Blog followers will know that we reported this issue earlier with the free version of ChatGPT using GPT-3.5 (covered in the WSJ). We have updated this new article by running the same prompts through the paid version using GPT-4. Did the problems go away with the more powerful LLM?

The error rate went down slightly, but our two main results held up. It’s important that any fake citations at all are being presented as real. The proportion of nonexistent citations was over 30% with GPT-3.5, and it is over 20% with our trial of GPT-4 several months later. See figure 2 from our paper below for the average accuracy rates. The proportion of real citations is always under 90%. GPT-4, when asked about a very specific narrow topic, hallucinates almost half of the citations (57% are real for level 3, as shown in the graph).

The second result from our study is that the error rate of the LLM increases significantly when the prompt is more specific. If you ask GPT-4 about a niche topic for which there is less training data, then a higher proportion of the citations it produces are false. (This has been replicated in different domains, such as knowledge of geography.)

What does Joy Buchanan really think?: I expect that this problem with the fake citations will be solved quickly. It’s very brazen. When people understand this problem, they are shocked. Just… fake citations? Like… it printed out references for papers that do not actually exist? Yes, it really did that. We were the only ones who quantified and reported it, but the phenomenon was noticed by millions of researchers around the world who experimented with ChatGPT in 2023. These errors are so easy to catch that I expect ChatGPT will clean up its own mess on this particular issue quickly. However, that does not mean that the more general issue of hallucinations is going away.

Not only can ChatGPT make mistakes, as any human worker can, but it can make a different kind of mistake without meaning to. Hallucinations are not intentional lies (which is not to say that an LLM cannot lie). This paper will serve as clear evidence that GPT can hallucinate in ways that detract from the quality of the output or even pose safety concerns in some use cases. This generalizes far beyond academic citations. The error rate might decrease to the point where hallucinations are less of a problem than the errors that humans are prone to make; however, the errors made by LLMs will always be of a different quality than the errors made by a human. A human research assistant would not invent nonexistent citations. LLM doctors are going to make a type of mistake that would not be made by human doctors. We should be on the lookout for those mistakes.

ChatGPT is great for some of the inputs to research, but it is not as helpful for original scientific writing. As prolific writer Noah Smith says, “I still can’t use ChatGPT for writing, even with GPT-4, because the risk of inserting even a small number of fake facts… “

Follow-Up Research: Will Hickman and I have an incentivized experiment on trust that you can read on SSRN: Do People Trust Humans More Than ChatGPT?

@IMurtazashvili has pointed me to a great resource for AI-era literature review work. “AI-Based Literature Review Tools” from Texas A&M

Wrapping Up & Sneak Peeks

I’m wrapping up grading for the semester, so this one is super short. What will I be writing about in the upcoming weeks? Here’s a sneak peek:

  1. I will read the course evaluations and let you know how my Game Theory Course changes fared.
  2. I’ll discuss a little bit of the new DID Stata methods. I’ll keep it short and sweet and provide an example.
  3. I want to share some thoughts on objectivity, unreasonable academic charity, and our ability to interpret evidence using multiple models.
  4. Squeezing out more time efficiencies in your home life (Especially for parents)
  5. There are too many A’s in my Principles of Macroeconomics class.

That’s what’s on the horizon. I’ll link back here to stay on track. Have a great weekend!

The Greatest NBA Coach Is… Dan Issel?

Some economists love to write about sports because they love sports. Others love to write about sports because the data are so good compared to most other facets of the economy. What other industry constantly releases film of workers doing their jobs, and compiles and shares exhaustive statistics about worker performance?

This lets us fill the pages of the Journal of Sports Economics with articles on players’ performance and pay, and articles evaluating strategies that sometimes influence how sports are played in turn. But coaches always struck me as harder to evaluate than players or strategies. With players, the eye test often succeeds.

To take an extreme example, suppose an average high-school athlete got thrown into a professional football or basketball game; a fan asked to evaluate them could probably figure out that they don’t belong there within minutes, or perhaps even just by glancing at them and seeing they are severely undersized. But what if an average high school coach were called up to coach at the professional level? How long would it take for a casual observer to realize they don’t belong? You might be able to observe them mismanaging games within a few weeks, but people criticize professional coaches for this all the time too; I think you couldn’t be sure until you saw their record after a season or two. Even then it is much less certain than for a player: was their bad record due to their coaching, or were they just handed a bad roster to work with?

The sports economics literature seems to confirm my intuition that coaches are difficult to evaluate. This is especially true in football, where teams generally play fewer than 20 games in a season; a general rule of thumb in statistics is that you need at least 20 to 25 observations for statistical tests to start to work. This accords with general practice in the NFL, where it is considered poor form to fire a coach without giving him at least one full season. One recent article evaluating NFL coaches only tries to evaluate those with at least 3 seasons. If the article is to be believed, it wasn’t until 2020 that anyone published a statistical evaluation of NFL defensive coordinators, despite this being considered a vital position that is often paid over a million dollars a year:

Continue reading

Where Can You Still Buy an Affordable Home in the US?

A few months ago I looked at the richest and poorest MSAs in the US, including adjusting for the cost of living in each MSA. One big thing I found was that the list doesn’t change that much when you adjust for the cost of living: San Jose, San Francisco, Bridgeport (CT), Boston, and Seattle are still the highest income MSAs even after accounting for the fact that they are also high-cost-of-living places to live. The gap shrinks, but they are still in the lead.

But that was adjusting for all the factors in the cost of living. What if we just looked at one important aspect of it: housing? And since the cost-of-living adjustments (BEA’s RPP) that I was using are from 2021, what if we tried to bring the data as close to the present as possible? We know that housing prices have increased a lot since 2021, but also that the cost of borrowing has risen dramatically too. What would this show us about the cost of living for different MSAs?

A tool from the Harvard Joint Center for Housing Studies allows us to make some pretty up-to-date comparisons. Their interactive map shows data for the 179 largest MSAs (about half of the total MSAs in the US) on the median price of each home for the second quarter of 2023 and uses interest rates from that quarter to show the rough principal and interest cost (assuming a 3.5% down payment). Taxes and insurance costs for each MSA are also estimated.

Based on those assumptions, their tool provides the minimum income you would need to purchase a home in that area, assuming a 31% debt-to-income ratio for the mortgage. And the income levels needed vary quite widely across MSAs, from a low of $44,000 in Cumberland, Maryland, to a high of over $500,000 in San Jose, CA. That’s a huge difference.

Of course, we know that incomes also vary across MSAs. But they don’t vary that much. The JCHS tool doesn’t provide this data (though a JCHS map from 2017 did compare house prices to incomes), but we can look up median family income for each MSA from Census. Doing so we see that San Jose is indeed unaffordable based on the current (2022) median income, which is “only” about $170,000. A nice income compared to the national median, but only about 1/3 of the $500,000 you would need to afford a home in San Jose. Cumberland looks much better though: median family income is over $77,000 there, about 76% more than you would need to buy a home!

What if we did a similar calculation for all MSAs in the JCHS data? The following map is my attempt to do so. Sorry, but my graphics skills are not the best, so this map isn’t as pretty as it could be (I started with the JCHS map, and just shaded in the colors I wanted to use). But I think it conveys the general idea.

Green-shaded MSAs are the most affordable: places like Cumberland, Maryland, where median family income is well above (at least 20% above, my arbitrary threshold) the amount JCHS says you need to buy a home. There are 27 Green-shaded MSAs. Blue-shaded MSAs are affordable too, and median income is between 100% and 120% of the amount needed to afford a home on the JCHS standard. There are 41 of these, making 68 total MSAs out of these 179 that are affordable. Red-shaded MSAs are less than 100%, and thus unaffordable (though as I will discuss below, some are much closer to affordable than others).
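The shading rule can be written as a short sketch. The thresholds follow the text; the Cumberland and San Jose numbers come from the figures quoted above, while the third example uses made-up numbers to illustrate the blue band:

```python
# Classify an MSA by the ratio of median family income to the income
# the JCHS tool says is needed to buy the median home (31% DTI standard).
# The 120% green threshold is the arbitrary cushion described above.

def shade(median_income, income_needed):
    ratio = median_income / income_needed
    if ratio >= 1.2:
        return "green"  # comfortably affordable: 20%+ income cushion
    elif ratio >= 1.0:
        return "blue"   # affordable, but with little margin
    else:
        return "red"    # unaffordable at the JCHS standard

print(shade(77_000, 44_000))    # Cumberland, MD -> green
print(shade(170_000, 500_000))  # San Jose, CA -> red
print(shade(50_000, 46_000))    # hypothetical MSA -> blue
```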

Continue reading

Charlie Munger’s Rule for a Happy Life

A big piece of news in the investment world has been the passing of Charlie Munger on Nov 28 at age 99. He was vice chair of Berkshire Hathaway, and Warren Buffett’s right-hand man there.

Munger grew up in Omaha, Nebraska, which is Warren Buffett’s hometown as well. They met at a dinner party there in 1959, and hit it off with one another personally. Munger was a really smart guy. After joining the US Army Air Corps in 1943, he scored highly on an intelligence test and was sent to study meteorology at Caltech. After the war he was accepted into Harvard Law School despite lacking a formal undergraduate degree, and graduated summa cum laude.

In his 50s, Munger lost his left eye after cataract surgery failed. A doctor warned he could lose his right eye too, so he began learning braille, but the condition improved.

He entered law practice, and eventually started his own firm, but he became more interested in investing. He racked up 19.8% annual returns investing on his own between 1962 and 1975. Buffett convinced Munger to give up law and join him as vice-chairman of Berkshire Hathaway in 1978.

Perhaps Buffett’s most famous investing saying is “It’s far better to buy a wonderful company at a fair price than a fair company at a wonderful price”. He credits this approach to Munger: “Charlie understood this early – I was a slow learner.”  Before being influenced here by Munger, Buffett had been more inclined to buy very low-priced shares in mediocre companies.  

Munger was heavily involved with Buffett’s decisions. “Berkshire Hathaway could not have been built to its present status without Charlie’s inspiration, wisdom and participation,” Buffett said following Munger’s death. That tribute is no overstatement: from the time Munger joined Berkshire Hathaway in 1978 till now, shares of the company soared 396,182% (i.e.,  $100 invested in Berkshire Hathaway in 1978 is worth $396,282 today). This performance dwarfs the 16,427% appreciation of the S&P 500 over the same time period. When he died, Munger was personally worth $2.6 billion.
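Those growth figures can be sanity-checked with a little arithmetic: a 396,182% gain means $100 becomes $100 × (1 + 3,961.82) = $396,282, and assuming a 45-year span (1978 to 2023), the implied compound annual growth rate works out to roughly 20% per year, against roughly 12% for the S&P 500:

```python
# Back-of-the-envelope annualized returns implied by the total gains
# quoted above, assuming a 45-year holding period (1978 to 2023).

def cagr(total_multiple, years):
    """Compound annual growth rate implied by a total value multiple."""
    return total_multiple ** (1 / years) - 1

berkshire = cagr(1 + 396_182 / 100, 45)  # value multiple of ~3,962.8x
sp500 = cagr(1 + 16_427 / 100, 45)       # value multiple of ~165.3x

print(f"Berkshire: {berkshire:.1%} per year")  # roughly 20% annually
print(f"S&P 500:   {sp500:.1%} per year")      # roughly 12% annually
```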

(See more on Berkshire Hathaway’s formula for success at: Warren Buffett’s Secret Sauce: Investing the Insurance “Float” )

Quotations From Vice-Chairman Charlie

The internet is rife with sites displaying memorable or useful quotes from Charlie Munger. For example, “I never allow myself to have an opinion on anything that I don’t know the other side’s argument better than they do”;   and three rules for a career: “1) Don’t sell anything [to others] you wouldn’t buy yourself; 2) Don’t work for anyone you don’t respect and admire; and 3) Work only with people you enjoy.”

Some of these quote lists focus on sayings which provide guidance to individual investors, such as this from CNBC:

“I think you would understand any presentation using the word EBITDA, if every time you saw that word you just substituted the phrase ‘bull---- earnings.’”

The 2003 Berkshire shareholder meeting was one of the many occasions Munger called out what he saw as shady accounting practices, in this case EBITDA — a measure of corporate profitability short for earnings before interest, taxes, depreciation and amortization.

In short, Munger felt that companies often highlighted convoluted profitability metrics to obscure the fact that they were severely indebted or producing very little cash.

“There are two kinds of businesses: The first earns 12%, and you can take it out at the end of the year. The second earns 12%, but all the excess cash must be reinvested — there’s never any cash,” Munger said at the same meeting. “It reminds me of the guy who looks at all of his equipment and says, ‘There’s all of my profit.’ We hate that kind of business.”

To invest like Munger and Buffett, don’t fall for the flashiest numbers in the firms’ investor presentations. Instead, dig into a company’s fundamentals in their totality. The more a company or an investment advisor tries to win you over with esoteric terms, the more skeptical you should likely be.

As Buffett put it in his 2008 letter to shareholders: “Beware of geeks bearing formulas.”

Munger’s Secret to Happiness

Out of all these witty and helpful quotes, I’ll conclude by zeroing in on what Charlie Munger thought was the single most important factor in achieving personal happiness. He said it a number of different ways:

The secret to happiness is to lower your expectations. …that is what you compare your experience with. If your expectations and standards are very high and only allow yourself to be happy when things are exquisite, you’ll never be happy and grateful. There will always be some flaw. But compare your experience with lower expectations, especially something not as good, and you’ll find much in your experience of the world to love, cherish and enjoy, every single moment.

and

The world is not going to give you extra return just because you want it. You have to be very shrewd and hard working to get a little extra. It’s so much easier to reduce your wants. There are a lot of smart people and a lot of them cheat, so it’s not easy to win.

and finally:

A happy life is very simple. The first rule of a happy life is low expectations. That’s one you can easily arrange. And if you have unrealistic expectations, you’re going to be miserable all your life. I was good at having low expectations and that helped me. And also, when you [experience] reversals, if you just suck it in and cope, that helps if you don’t just stew yourself into a lot of misery.

Why I think we’ve hit peak pessimism

The key to successful public forecasting is to choose a subject that is too costly for your critics to formally measure. In keeping with such a spirit of low risk public posturing, I am hereby calling it: peak pessimism is now behind us. Which is not to say that people think things are fine, but rather that the gap between how things actually are (pretty good!) and how people think they are (kinda bad) is much smaller than the gap was six months ago (historically bad, even though they were pretty good then too!). The gloom of sunny days benighted by the goth-tinted glasses of an anxiety-serving media amplified by the terminally online is finally breaking.

For me, the real bellwether was the general non-response to a NYT article and Siena poll that said Biden was likely to lose to Trump head-to-head next November. Six months ago this would have received breathless coverage, with non-stop amplification on social media. What I observed instead was a lot of hand-waving and dismissal of an attempt at political panic clickbait.

So what’s my reasoning? In a nutshell, rational pessimism.

I’m a big believer in ecological rationality, i.e., a lot of our seemingly irrational biases are actually relatively optimal behaviors when viewed in the long term for individual survival or cultural/group selection. Pessimism is an expressed preference for fewer negative surprises. From a household’s perspective, being surprised by a negative shock is far more dangerous to economic survival than being surprised by or even missing out on positive shocks. Choosing to rent instead of buying a house in 2000 was, in hindsight, problematic, but not nearly so dangerous to your economic survival as buying a house in December of 2007. Not to get too Lamarckian on you, but it’s not crazy to say that the pandemic was such a (Knightian/Black Swan) shock to a lot of people that they updated their entire model of the economy to include the possibility of an entirely new kind of negative economic shock and, as a result, their new strategy is far more pessimistic. They very badly don’t want to be surprised again.

But that doesn’t mean they are done updating. At some point the good news is just too good to ignore. Employment is too good, wages are too good. New vaccines are too good. Climate data is…well that’s still pretty bad, but hey look, solar is happening! Good news, however, is an erosive force running against a freshly built wall of pessimism designed for the express purpose of protecting a household from the next negative shock. We shouldn’t be surprised if it takes a lot of good news a long time to break it down.

But it will break down. I’m not saying when it will break down, but the cracks are finally starting to show. Pessimism may be ecologically rational, but optimism always has an irresistible allure for those who don’t want to miss out. We’re starting to get the good news because people are starting to want it, even if only just a little bit. And media customers always get what they want.*


* Which is not to say that Fox News and similar outlets won’t remain consistently negative. Political and age-demographic demands for “everything is going to hell” aren’t going to change any time soon. They will also keep getting what they want.

Joy’s Fashion Globalization Article with Cato

I am published by Cato this week:

Fast Fashion, Global Trade, and Sustainable Abundance

This is part of a 10-part series called “Defending Globalization: Society and Culture.”

Imagine trying to explain the world today to a person who time traveled forward from 400 years ago. How could someone who lived in France in the year 1600 understand our modern problems?

Person from the Past: So, how is it with 8 billion people?

Me Today: It’s bad. We have too many clothes.

PftP: Right. With 8 billion you wouldn’t have enough clothes for everyone.

MT: Too many.

PftP: Not enough?

MT: I said we have TOO MANY clothes. Not even the poorest people in the world want them. Shirts pile up on the beaches and pollute the ocean.

PftP: …

My article highlights the fact that we live in an era of unprecedented clothing abundance. First, that was not always true.

Most of human history has been characterized by privation and low-productivity toil. As one American sharecropper exclaimed in John Steinbeck’s Depression-era novel The Grapes of Wrath, “We got no clothes, torn an’ ragged. If all the neighbors weren’t the same, we’d be ashamed to go to meeting.”

https://www.cato.org/publications/globalization-fashion

Secondly, not everyone is celebrating.

The United Nations Economic Commission for Europe called the fashion industry an “environmental and social emergency” because clothing production has roughly doubled since the year 2000. Their main concerns are fast fashion’s environmental impact and working conditions. 

Some of my article is a response to the critics of modern low-cost mass production.

Thirdly, I explain how we could keep most of the benefits of cheap clothes with less litter in the environment. The item I am most optimistic about is using our new artificial intelligence tools to re-sort the world’s junk. We would produce and throw away fewer clothes if we had a better system for rearranging the stock of goods that we already have. The problem I see today is that I have “perfectly good” clothes in my house that I don’t really want; however, attention and time are so scarce that no one will pay me for them. Even if I donate them, I worry that half will end up in the trash. Someone on this earth could use them, but identifying that someone and making the trade still has prohibitively high transaction costs. Very smart AI could come to my house, scan my stuff, and pay me for it, because it could get each item to someone who values it.

If you’d like to see a trail of blogs that I wrote while in the research phase for this article, use https://economistwritingeveryday.com/?s=fashion

Lastly, we thank Tyler for the Marginal Revolution link.

House Rich, House Richer

The third quarter ‘All Transaction’ housing price data was just released this week. These numbers are interesting for a few reasons. One reason is that home prices are a big component of our cost of living. Higher home prices are relevant to housing affordability. This week’s release is especially interesting because it’s starting to look like the Fed might be pausing its 18-month streak of interest rate hikes. In case you don’t know, higher interest rates increase the cost of borrowing and decrease the price that buyers are willing to pay for a home. Nationally, we only had one quarter of falling home prices in late 2022, but the recent national growth rate in home prices is much slower than it was in 2021 through mid-2022.

Do you remember when there were a bunch of stories about remote workers and early retirees fleeing urban centers in the wake of Covid? We stopped hearing that story so much once interest rates started rising. The inflection point in the data was in Q2 of 2022. After that, price growth started slowing with the national average home price up 6.5%. But the national average masks some geographic diversity.  

Continue reading

OpenAI, IZA, and The Limits of Formal Power

Companies and non-profit organizations tend to be managed day-to-day by a CEO, but are officially run by a board with the legal power to replace the CEO and make all manner of changes to the company. But last week saw two striking demonstrations that corporate boards’ actual power can be much weaker than it is on paper.

The big headlines, as well as our coverage, focused on the bizarre episode where OpenAI, one of the hottest companies (technically, non-profits) of the year, fired their CEO Sam Altman. They said it was because he was not “consistently candid with the board”, but refused to elaborate on what they meant by this; they said a few things it was not, but still not what really motivated them.

Technically it is their call and they don’t have to convince anyone else, but in practice their workers and other partners can all walk away if they dislike the board’s decisions enough, leaving the board in charge of an empty shell. This was starting to happen, with the vast majority of workers threatening to walk out if the board didn’t reverse their decision, and their partner Microsoft ready to poach Sam Altman and anyone else who left.

After burning through two interim CEOs who lasted two days each, the board brought back ousted CEO Sam Altman. Formally, the big change was board member Ilya Sutskever switching sides, but the blowback was enough to get several board members to resign and agree to being replaced by new members more favored by the workers (including, oddly, economist Larry Summers).

A similar story played out at IZA last week, though it mostly went under the radar outside of economics circles. IZA (aka the Institute for Labor Economics) is a German non-profit that runs the world’s largest organization of labor economists. While they have a few dozen direct employees, what makes them stand out is their network of affiliated researchers around the world, which I had hoped to join someday:

Our global research network is the largest in labor economics. It consists of more than 2,000 experienced Research Fellows and young Research Affiliates from more than 450 research institutions in the field.

But as with OpenAI, the IZA board decided to get rid of their well-liked CEO. Here at least some of their reasons were clear: they lost their major funding source and so decided to merge IZA with another German research institute, briq. Their big misstep was choosing for the combined entity to be run by the much-disliked head of the smaller, newer merger partner briq (Armin Falk), instead of the well-liked head of the larger partner IZA (Simon Jaeger). Like with OpenAI, hundreds of members of the organization (though in this case external affiliates not employees, and not a majority) threatened to quit if the board went through with their decision. Like with OpenAI, this informal power won out as Armin Falk backed off of his plan to become IZA CEO.

Each story has many important details I won’t go into, and many potential lessons. But I see three common lessons between them. First is the limits to formal power; the board rules the company, but a company is nothing without its people, and they can leave if they dislike the board enough. Second, and following directly from this, is that having a good board is important. Finally, workers can organize very rapidly in the internet age. At OpenAI nearly all its employees signed onto the resignation threat within two days, because the organizers could simply email everyone a Google Doc with the letter. Organizers of the IZA letter were able to get hundreds of affiliates to sign on the same way despite the affiliates being scattered all across the world. In both cases there was no formal union threatening a strike; it was the simple but powerful use of informal power: the voice and threatened exit of the people, organized and amplified through the internet.