Trusting ChatGPT at JBEE

You can find my paper with Will Hickman, “Do people trust humans more than ChatGPT?”, online at the Journal of Behavioral and Experimental Economics (JBEE), and you can download it for free before July 30, 2024 (temporarily ungated*).

*Find a previous ungated draft at SSRN.

Did we find that people trust humans more than the bots? It’s complicated. Or, as we say in the paper, it’s context-dependent.

When participants saw labels informing them about authorship (e.g., “The following paragraph was written by a human.”), they were more likely to purchase a fact-check (the orange bar).

Informed subjects were not more trusting of human authors than of ChatGPT (so, in that sense, we could not reject the null hypothesis about trusting humans). However, Informed subjects were significantly less likely to trust their own judgment of the factual accuracy of the paragraph, relative to readers who saw no authorship labels.

Some regulations would make the internet more like our Informed treatment. The EU may mandate that ChatGPT comply with the obligation of “Disclosing that the content was generated by AI.” Our results indicate that this policy would affect behavior, because people read differently when they are forced to think up front about how the text was generated.

Inspiration for this article on trust began with observing the serious errors that LLMs can produce (e.g., making up fake citations). Our hypothesis was that readers are more trusting of human authors because of these known mistakes by ChatGPT. This graph shows that participants trust statements *believed* to have been written by a human (left blue bar = “High Trust”), so, in that sense, our main hypothesis has some confirmation.

Conversely, in the Informed treatment, readers are equally uncertain about text written either by humans or bots. Informed readers are suspicious, so they buy a fact-check. “High Trust” (the blue bar) is the option that maximizes expected value if the reader thinks the author has not made factual errors.
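To see the expected-value logic behind that claim, here is a stylized sketch. The notation is illustrative shorthand on my part, not the exact parameters of our design: let $V$ be the payoff from assessing the paragraph correctly, $c$ the price of the fact-check (which I assume reveals the truth), and $p$ the reader’s believed probability that the paragraph contains no factual errors. Then

$$\mathbb{E}[\text{High Trust}] = pV, \qquad \mathbb{E}[\text{Fact-check}] = V - c, \qquad \text{so High Trust wins when } p > 1 - \tfrac{c}{V}.$$

A reader who is confident the author made no mistakes ($p$ near 1) should pick “High Trust”; enough suspicion to push $p$ below $1 - c/V$ makes buying the fact-check the better bet.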

So, in conclusion, we find that human readers can be made more suspicious by framing. In this case, we think of being cautious and buying a fact-check as a good thing. The reason is that, increasingly, society’s new texts are being written by LLMs. Evidence of this has been presented by Andrew Gray in a 2024 working paper: “ChatGPT ‘contamination’: estimating the prevalence of LLMs in the scholarly literature.” Note that this is the scholarly literature, not just the sports blogs or the Harry Potter – Taylor Swift crossover fanfics.

What about medical doctors? What is the authority on whether or not you get surgery? The medical literature shows the same pattern. See: “Delving into PubMed Records: Some Terms in Medical Writing Have Drastically Changed after the Arrival of ChatGPT.”

“Delve” was briefly a joke on Twitter (X), and Will Hickman has a nice chart showing how “delve” crept into the economics literature as soon as LLM tools became available:

https://x.com/williamhickman_/status/1774097806224920852

I think most people would like to know if a medical journal article was written primarily (or entirely) by an LLM. Our experiment indicates that people will respond differently to text when it is labeled by authorship.

Something I’ve been meaning to write a separate post on for a while but don’t have time: The problem in the world is not a lack of good writers. It’s a lack of good readers. Related: Academic economists are overcommitted

Lastly, like my other paper on ChatGPT, this one will be a product of its time, the year 2023. I’m interested to see how the results of an experiment like this would evolve in the year 2033.

A prospective co-author of mine has told me this year that he is very concerned about writing a dated LLM paper. He’s afraid the tools will evolve fast enough to make findings gathered in 2024 obsolete. I think we should write the papers anyway. They will serve as useful benchmarks if indeed the technology evolves quickly.
