Sorry, you caught me between critical masses

I’m on Bluesky. I’m on Twitter/X. I’m not happy with either right now. I wasn’t particularly happy on Twitter before, either, but that was before it became much worse, so now I wish it could go back to the way it was, back when I was also complaining, because it turns out the counterfactual universe where things are different is actually worse. So here we are.

The decline in my personal portfolio of social media largely comes down to critical mass. The decline in Twitter usage has reduced its value to me (and to most of its users). Only a tiny fraction of this loss in Twitter value is offset by the value I receive from Bluesky, for the simple reason that it doesn’t have enough users. Even if 100% of Twitter exits had led to Bluesky entrants, it would still be a value loss, because the marginal user currently offers more value at Twitter. Standard network goods, returns to scale, power law mechanics, yada yada yada.
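If the "network goods" shorthand is too compressed, here is a toy sketch of the mechanics. It assumes a Metcalfe-style value function (value growing with the square of the user count), which is an illustrative assumption on my part, not a measured fact about either platform:

```python
# Toy model (illustrative assumption): a network's value grows roughly
# with the square of its user count, so splitting the same users across
# two platforms destroys value even though nobody actually left.

def network_value(users: float) -> float:
    """Metcalfe-style value: proportional to users squared."""
    return users ** 2

one_big_network = network_value(100)                       # 10,000 value units
two_half_networks = network_value(50) + network_value(50)  # 5,000 value units

print(one_big_network, two_half_networks)
```

Under any convex value function like this, the half-committed equilibrium is the worst of both worlds: each platform sits below the value it would generate if everyone picked one.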

Now, to be clear, Twitter is still well above the minimum critical mass threshold for significant value-add, but the good itself has also been damaged by Elon’s managerial buffoonery. Bluesky, depending on your point of view and consumer niche, hasn’t achieved a self-sustaining critical mass (e.g. econsky hasn’t quite cracked it, unfortunately). The result is a decent number of people half-committing to both, which only undermines the consumer value being generated across the entire “microblogging” social media space.

The problem, simply put, is that Twitter still has too much option value to leave entirely. If (when) Elon gets the mother-of-all-margin-calls, he’ll likely have to sell Twitter or a large chunk of his Tesla holdings. If he’s smart and doesn’t cave in to the sunk cost fallacy (a non-trivial “if”), he’ll sell Twitter. If new ownership successfully returns Twitter to a suitable facsimile of its previous form, people will come flooding back, Bluesky will turn off or wholly adapt into a new consumer paradigm, and everyone will be thrilled to have squatted on their previous accounts.

If Twitter retains its current form, then it will probably die, though not at the direct hands of Bluesky. More likely it will be displaced by some new product most of us don’t yet see coming, as the next generation departs Twitter the way millennials departed Facebook for Snapchat and, eventually, TikTok. Perhaps counterintuitively, this outcome is actually excellent for Bluesky, because the absence of Twitter will send the 3% of “professional” Twitter users (economists, journalists, think-tank wonks, policy makers, etc.) to Bluesky, where they will achieve niche critical mass and live happily ever after (at least as happy as one can be whilst immersed in a sea of status-obsessed try-hards).

But for the moment, we’re all a little stuck trying to make do with finding fulfillment in the complex personal lives, loving families, transcendent art, and multidimensional experiences that remain confined to meatspace. We can only do our best and remain strong during such trying times.

Chapman Economists Revise Forecast

Back in June, I watched the livestream of the Chapman Economic Forecast with Dr. Jim Doti (who was president when I was a student at Chapman). Typically this is a valuable, informative event, and the team has an excellent track record, often outdoing other forecasters.

That is why I feel a little bad for making this post in the summer and tweeting out Doti’s prediction that we would have a recession by now.

To be fair to Doti, there has been a lot of uproar over this issue. Lots of people thought the economy would be bad. And lots of people feel like the economy is bad (the “vibecession”) even though it objectively is not. Many a tweet has been devoted to it.

Doti opened by saying his prediction had turned out to be wrong. He had an explanation for it (pictured below). You can watch it for free here (recorded on Dec 14).

Doti said that he had expected a large fiscal stimulus in the form of deficit spending; however, he had not expected the deficit to be so large. Debt-financed spending propped up an economy that was otherwise poised to contract. At least, that is a plausible story.

Looking forward, Doti does not predict a recession next year, but he does predict weak growth and possibly one quarter of GDP decline (not two).

The next part of the talk was about the long-term consequences of deficit spending. Nothing is free. TANSTAAFL.

In addition to vibecession, anyone following economics in 2023 needs to know what a “soft landing” is.

Update on Game Theory Teaching

I wrote at the end of the summer about some changes that I would make to my Game Theory course. You can go back and read the post. Here, I’m going to evaluate the effectiveness of the changes.

First, some history.

I’ve taught GT a total of five times. Below are my average student course evaluations for “I would recommend this class to others” and “I would consider this instructor excellent”. Although the general trend has been improvement, both in the ratings and in the course itself, some more context would be helpful. In 2019, my expectations for math were too high. Shame on me. It was also my first time teaching GT, so I had a shaky start. In 2020, I smoothed out a lot of the wrinkles, but I hadn’t yet made it a great class.

In 2021, I had a stellar crop of students. There was not a single student who failed to learn. The class dynamic was perfect and I administered the course even more smoothly. They were comfortable with one another, and we applied the ideas openly. In 2022, things went south. There were too many students enrolled in the section, too many students who weren’t prepared for the course, and too many students who skated by without learning the content. Finally, in 2023, the year of my changes, I had a small class with a nice symmetrical set of student abilities.  

Historically, I would often advertise this class, but after the disappointing 2022 performance, and given that I knew I would be making changes, I didn’t advertise the 2023 section. That part worked out perfectly. Clearly, there is a lot of random stuff that happens that I can’t control. But my job is to get students to learn, to help the capable students excel, and to not make students *too* miserable in the process – no matter who is sitting in front of me.


National Health Expenditure Accounts Historical State Data: Cleaned, Merged, Inflation Adjusted

The government continues to be great at collecting data but not so good at sharing it in easy-to-use ways. That’s why I’ve been on a quest to highlight when independent researchers clean up government datasets and make them easier to use, and to clean up such datasets myself when I see no one else doing it; see previous posts on State Life Expectancy Data and the Behavioral Risk Factor Surveillance System.

Today I want to share an improved version of the National Health Expenditure Accounts Historical State Data.

National Health Expenditure Accounts Historical State Data: The original data from the Centers for Medicare and Medicaid Services on health spending by state and type of provider are actually pretty good as government datasets go: all years (1980-2020) come together in a reasonable format (CSV). But the data are split across separate files for overall spending, Medicare spending, and Medicaid spending; I merge the variables from all three into a single file, transform it from a “wide format” to a “long format” that is easier to analyze in Stata, and in the “enhanced” version I offer inflation-adjusted versions of all spending variables. Excel and Stata versions of these files, together with the code I used to generate them, are here.
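For anyone who wants to replicate that pipeline in another tool, here is a minimal sketch in Python/pandas. To be clear, the actual processing was done in Stata, and the file names and column layout below are placeholder assumptions, not the real CMS file structure:

```python
import pandas as pd

# Hypothetical file names; the real CMS downloads are named differently.
overall = pd.read_csv("nhea_overall.csv")
medicare = pd.read_csv("nhea_medicare.csv")
medicaid = pd.read_csv("nhea_medicaid.csv")

# Merge the three spending series on state and provider type.
df = (overall
      .merge(medicare, on=["state", "provider_type"])
      .merge(medicaid, on=["state", "provider_type"]))

# Reshape from wide (one column per year, e.g. y1980 ... y2020) to long
# (one row per state-provider-year), which is easier to analyze.
year_cols = [c for c in df.columns if c.startswith("y")]
long = df.melt(id_vars=["state", "provider_type"],
               value_vars=year_cols,
               var_name="year", value_name="spending")
long["year"] = long["year"].str.lstrip("y").astype(int)

# Inflation-adjust against a price index (assumed columns: year, cpi),
# deflating everything to 2020 dollars.
cpi = pd.read_csv("cpi_annual.csv")
base = cpi.loc[cpi["year"] == 2020, "cpi"].iloc[0]
long = long.merge(cpi, on="year")
long["spending_real"] = long["spending"] * base / long["cpi"]
```

The three steps (merge, reshape long, deflate) mirror what the posted Stata code does; only the syntax differs.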

A warning to everyone using the data, since it messed me up for a while: in the documentation provided by CMS, Table 3 provides incorrect codes for most variables. I emailed them about this, but who knows when it will get fixed. My version of the data should be correct now, but please let me know if you find otherwise. You can find several other improved datasets, from myself and others, on my data page.

State Tax Revenue is Down a Lot in 2023 (but really just back to normal levels)

State tax revenue is down a lot since last year. The latest comparable data from Census’s QTAX survey is for the 2nd quarter of 2023, and it shows a massive hit: state tax revenue was down 14% from the same quarter in 2022, which is about $66 billion. Almost all of that decline is from income tax revenue, specifically individual income tax revenue, which is down over 30% (almost $60 billion). General sales tax revenue, the other workhorse of state budgets, is essentially flat over the year.

That’s a huge revenue decline! So, what’s going on? In some states, there has been an attempt to blame recent tax cuts. It’s not a bad place to start, since half of US states have reduced income taxes in the past 3 years, mostly reducing top marginal tax rates. But that can’t be the full explanation, since almost every state saw a reduction in revenue: just 3 states had individual income tax revenue increases (Louisiana, Mississippi, and New Hampshire) from 2022q2 to 2023q2, and they were among the half of states that reduced rates!

To get some perspective, let’s look at long-run trends. This chart shows total state individual income tax revenue for all 50 states (sorry, DC) going back to 1993. I use a 4-quarter total, since tax receipts are seasonal (and because states sometimes move tax deadlines due to things like disasters, a specific quarter can sometimes look weird). And importantly, this data is not inflation adjusted. Don’t worry, I will do an adjustment later in this post, but for starters let’s just look at the nominal dollars, because nominal dollars are how states receive money!
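If you want to build the same smoothed series yourself, the computation is just a rolling sum. Here is a minimal sketch in Python/pandas; the file and column names are placeholder assumptions for a quarterly QTAX extract, not the actual Census file layout:

```python
import pandas as pd

# Assumed input: one row per quarter with total state individual income
# tax revenue (column names are my own invention).
qtax = pd.read_csv("qtax_individual_income.csv", parse_dates=["quarter"])
qtax = qtax.sort_values("quarter")

# A 4-quarter rolling total smooths out seasonality and shifted deadlines.
qtax["rev_4q"] = qtax["revenue"].rolling(window=4).sum()

# A deflation step along the lines of the adjustment later in the post:
# merge in a quarterly price index and deflate to the latest quarter's dollars.
cpi = pd.read_csv("cpi_quarterly.csv", parse_dates=["quarter"])
merged = qtax.merge(cpi, on="quarter")
base = merged["cpi"].iloc[-1]
merged["rev_4q_real"] = merged["rev_4q"] * base / merged["cpi"]
```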


Former Treasury Official Defends Decision to Issue Short-Term Debt for Pandemic; I’m Not Buying It

We noted earlier (see “The Biggest Blunder in The History of The Treasury”: Yellen’s Failure to Issue Longer-Term Treasury Debt When Rates Were Low), along with many other observers, that it seemed like a mistake for the Treasury to have issued lots of short-term (e.g. 1-2 year) bonds to finance the sudden multi-trillion dollar budget deficit from the pandemic-related spending surge in 2020-2021. Rates were near-zero (thanks to the almighty Fed) back then.

Now, driven by that spending surge, inflation has also surged, and thus the Fed has been obliged to raise interest rates. And so now, in addition to the enormous current deficit spending, that tsunami of short-term debt from 2020-2021 is coming due, to be refinanced at much higher rates. This high interest expense will add further to the growing government debt.
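To get a feel for the magnitudes involved, here is a stylized back-of-the-envelope calculation. The debt figure and today’s refinancing rate are illustrative assumptions on my part, not actual Treasury numbers; the 0.70% ten-year yield is the one Druckenmiller cites below:

```python
# Back-of-the-envelope sketch (all inputs are illustrative assumptions):
# what rolling over short-term pandemic debt at today's rates costs
# versus having locked in the 2020-2021 ten-year rate.

short_debt = 3.0e12        # assume $3 trillion of short-term issuance
rate_10yr_2021 = 0.0070    # ~0.70% ten-year yield, per the quote below
rate_refi_now = 0.05       # assume ~5% refinancing rate today

interest_if_locked = short_debt * rate_10yr_2021   # ~$21 billion per year
interest_after_refi = short_debt * rate_refi_now   # ~$150 billion per year

extra = interest_after_refi - interest_if_locked
print(f"Extra annual interest expense: ${extra / 1e9:.0f} billion")
```

Even with these rough inputs, the difference is on the order of $100+ billion per year for as long as the debt is rolled at the higher rate.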

Hedge fund manager Stanley Druckenmiller commented in an interview:

“When rates were practically zero, every Tom, Dick and Harry in the U.S. refinanced their mortgage… corporations extended [their debt],” he said. “Unfortunately, we had one entity that did not: the U.S. Treasury….

“Janet Yellen, I guess because [of] political myopia or whatever, was issuing 2-years at 15 basis points [0.15%] when she could have issued 10-years at 70 basis points [0.70%] or 30-years at 180 basis points [1.80%],” he said. “I literally think if you go back to Alexander Hamilton, it is the biggest blunder in the history of the Treasury. I have no idea why she has not been called out on this. She has no right to still be in that job.”

Unsurprisingly, Yellen pushed back on this charge (unconvincingly). More recently, former Treasury official Amar Reganti has issued a more detailed defense. Here are some excerpts of his points:

(1) …The Treasury’s functions are intimately tied to the dollar’s role as a reserve currency. It is simply not possible to have a reserve currency without a massive supply of short-duration fixed income securities that carry no credit risk.

(2) …For the Treasury to transition the bulk of its issuance primarily to the long end of the yield curve would be self-defeating since it would most likely destabilise fixed income markets. Why? The demand for long end duration simply does not amount to trillions of dollars each year. This is a key reason why the Treasury decided not to issue ultralong bonds at the 50-year or 100-year maturities. Simply put, it did not expect deep continued investor demand at these points on the curve.

(3) …The Treasury has well over $23tn of marketable debt. Typically, in a given year, anywhere from 28% to 40% of that debt comes due…so as not to disturb broader market functioning, it would take the Treasury years to noticeably shift its weighted average maturity even longer.

(4) …The Treasury does not face rollover risk like private sector issuers.
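Point (3) is the most mechanical of these claims, and it is easy to sanity-check with a toy simulation. Everything below is my own stylized assumption (a flat 30% annual rollover, a 5-year starting weighted average maturity, all new issuance at 10 years); it is not Treasury data:

```python
# Stylized check of Reganti's point (3): even if every rolled-over bond
# were re-issued at a 10-year maturity, the weighted average maturity
# (WAM) of the stock climbs only slowly.

wam = 5.0           # assumed starting WAM, in years
roll_share = 0.30   # assumed share of the stock refinanced each year
new_maturity = 10.0

for year in range(1, 6):
    # Surviving bonds age by one year; the rolled share resets to 10 years.
    wam = (1 - roll_share) * (wam - 1) + roll_share * new_maturity
    print(f"year {year}: WAM = {wam:.1f} years")
# Climbs roughly 5.8 -> 6.4 -> 6.8 -> 7.0 -> 7.2: years, not months.
```

So the arithmetic behind point (3) holds up in a stylized way; my disagreement, below, is about what it leaves out.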

Here is my reaction:

What Reganti says would be generally valid if the trillions of excess T-bond issuance in 2020-2021 were sold into the general public credit market. In that case, yes, it would have been bad to overwhelm the market with more long-term bonds than were desired. But that is simply not what happened. It was the Fed that vacuumed up nearly all those Treasuries, not the markets. The markets were desperate for cash, and hence the Fed was madly buying any and every kind of fixed income security, public and corporate and mortgage (even junk bonds that probably violated the Fed’s bylaws), and exchanging them mainly for cash. Sure, the markets wanted some short-term Treasuries as liquid, safe collateral, but again, most of what the Treasury issued ended up housed in the Fed’s digital vaults.

So, I remain unconvinced that the issuance of mainly long-term debt (say 10-year and some 30-year; no need to muddy the waters, as Reganti does, by harping on 50-100-year bonds) would have been a problem. So much fixed-income debt was vomited forth from the Treasury that making even a minor portion of it short-term would, I believe, have satisfied market needs. The Fed could have concentrated on buying and holding the longer-term bonds, rolling them over eventually as needed, without disturbing the markets. That would have bought the country a decade or so of respite before the real interest rate effects of the pandemic debt issuance began to bite.

But nobody asked my opinion at the time.

Godzilla Minus One is fantastic

Did you know you could make a Godzilla movie, maybe the best one at that, for $15 million (or 3 minutes of Chris Pratt in “Endgame”, if you’d prefer that numeraire)? This film, in which Godzilla is basically the demon baby of Jason Voorhees and the shark from Jaws, deftly explores concepts of guilt, shame, redemption, forgiveness, and family. I cried at the end. I repeat, I cried at the end of a Godzilla movie.

In the last month I’ve watched a Godzilla movie that is specifically constructed to recreate the feeling of a 1950s monster movie, a flawed but admirable attempt to make a modern Charlie Chaplin movie (Fool’s Paradise), and a watchable if uneven and wholly debauched variation on “Singin’ in the Rain” (Babylon). I don’t think this is a coincidence. I think this is a response to VFX and superhero (not comic book) movie fatigue. One way to respond is to go backwards, not necessarily in subject matter or setting, but in story composition and construction. The performances in all three films felt more stage than screen. Texture was emphasized over shock and awe. Emotional crescendos felt earned rather than manipulated. I’m not saying these three films are perfect or even necessarily good. What I’m saying is that they felt like a return to an older form of film as a medium.

For the last 15 years we’ve had a lot of “remakes” that attempted to modernize old films. Don’t be surprised if we see the inverse going forward: new, original stories filmed in a manner that feels older. “The Thing” but it’s a sea alien on an oil platform, everything wet and on fire. “All the President’s Men” but it’s a coverup in local Iowa government, with scratchy sunken sofas and life-changing smoke breaks. “Working Girl” but it’s Zendaya and Scarlett Johansson in a fully modern context, where a misread text subverts an expected plot turn on a broken iPhone screen. Not for a love of classic cinema, mind you, or even art, but because making 10 to 1 on winners and losing next to nothing on flops is a business proposition more than a few studios are likely to find enticing.

Go see “Godzilla Minus One”.

Do People Trust ChatGPT Writing?

My new working paper with Will Hickman is up on SSRN: Do People Trust Humans More Than ChatGPT?

We study whether people will pay for a fact-check on AI writing. ChatGPT can be very useful, but human readers should not trust every fact that it reports. Yesterday’s post was about ChatGPT writing false things that look real.

The reason participants in our experiment might pay for a fact-check is that they earn bonus payments based on whether they correctly identify errors in a paragraph. If participants believe that the paragraph does not contain any errors, they should not pay for a fact-check. However, if they have doubts, it is rational to pay for the fact-check and accept a smaller but certain bonus.
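The incentive works roughly like the sketch below. The payoff numbers here are made-up placeholders for illustration, not our actual experimental parameters:

```python
# Illustrative decision rule (payoffs are invented, not the experiment's):
# a risk-neutral participant buys the fact-check whenever the guaranteed
# smaller bonus beats the expected value of guessing unaided.

full_bonus = 1.00      # bonus for a correct unaided judgment
checked_bonus = 0.75   # smaller, but guaranteed, bonus after a fact-check

def should_fact_check(p_correct: float) -> bool:
    """Buy the check when the sure payoff beats the expected gamble."""
    expected_gamble = p_correct * full_bonus  # incorrect judgments pay zero
    return checked_bonus > expected_gamble

print(should_fact_check(0.9))  # False: confident readers skip the check
print(should_fact_check(0.5))  # True: doubtful readers rationally pay
```

Under this logic, the rate of costly fact-checking is itself a measure of doubt, which is what lets us read trust off of participants’ choices.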

Abstract: We explore whether people trust the accuracy of statements produced by large language models (LLMs) versus those written by humans. While LLMs have showcased impressive capabilities in generating text, concerns have been raised regarding the potential for misinformation, bias, or false responses. In this experiment, participants rate the accuracy of statements under different information conditions. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT. However, when informed about authorship, participants show equal skepticism towards both human and AI writers. There is an increase in the rate of costly fact-checking by participants who are explicitly informed. These outcomes suggest that trust in AI-generated content is context-dependent.

Our original hypothesis was that people would be more trusting of human writers. That turned out to be only partially true. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT.

We presented information to participants in different ways. Sometimes we explicitly told them about authorship (informed treatment) and sometimes we asked them to guess about authorship (uninformed treatment).

This graph (figure 5 in our paper) shows that the overall rate of fact-checking increased when subjects were given more explicit information. Something about being told that a paragraph was written by a human might have aroused suspicion in our participants. (The kids today would say it is “sus.”) They became less confident in their own ability to rate accuracy and therefore more willing to pay for a fact-check. This effect is independent of whether participants trust humans more than AI.

In the context of our previous work on ChatGPT hallucinations, we think of fact-checking as generally a good thing. So, one policy implication is that certain types of labels can cause readers to think critically. For example, Twitter labels automated accounts so that readers know when content has been chosen or created by a bot.

Our working paper is currently trending on SSRN top ten lists such as this one.

Suggested Citation:
Buchanan, Joy and Hickman, William, Do People Trust Humans More Than ChatGPT? (November 16, 2023). GMU Working Paper in Economics No. 23-38, Available at SSRN: https://ssrn.com/abstract=4635674

GPT-4 Generates Fake Citations

I am happy to share my latest publication at The American Economist: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Citation: Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80-87. https://doi.org/10.1177/05694345231218454

Blog followers will know that we reported this issue earlier with the free version of ChatGPT using GPT-3.5 (covered in the WSJ). We have updated this new article by running the same prompts through the paid version using GPT-4. Did the problems go away with the more powerful LLM?

The error rate went down slightly, but our two main results held up. The important point is that any fake citations at all are presented as real. The proportion of nonexistent citations was over 30% with GPT-3.5, and it is over 20% with our trial of GPT-4 several months later. See figure 2 from our paper below for the average accuracy rates. The proportion of real citations is always under 90%. GPT-4, when asked about a very specific narrow topic, hallucinates almost half of the citations (57% are real for level 3, as shown in the graph).

The second result from our study is that the error rate of the LLM increases significantly when the prompt is more specific. If you ask GPT-4 about a niche topic for which there is less training data, then a higher proportion of the citations it produces are false. (This has been replicated in different domains, such as knowledge of geography.)

What does Joy Buchanan really think?: I expect that this problem with the fake citations will be solved quickly. It’s very brazen. When people understand this problem, they are shocked. Just… fake citations? Like… it printed out references for papers that do not actually exist? Yes, it really did that. We were the only ones who quantified and reported it, but the phenomenon was noticed by millions of researchers around the world who experimented with ChatGPT in 2023. These errors are so easy to catch that I expect ChatGPT will clean up its own mess on this particular issue quickly. However, that does not mean that the more general issue of hallucinations is going away.

Not only can ChatGPT make mistakes, as any human worker can, but it makes a different kind of mistake without meaning to. Hallucinations are not intentional lies (which is not to say that an LLM cannot lie). This paper will serve as bright, clear evidence that GPT can hallucinate in ways that detract from the quality of the output or even pose safety concerns in some use cases. This generalizes far beyond academic citations. The error rate might decrease to the point where hallucinations are less of a problem than the errors that humans are prone to make; however, the errors made by LLMs will always be of a different quality than the errors made by a human. A human research assistant would not cite papers that do not exist. LLM doctors are going to make a type of mistake that would not be made by human doctors. We should be on the lookout for those mistakes.

ChatGPT is great for some of the inputs to research, but it is not as helpful for original scientific writing. As prolific writer Noah Smith says, “I still can’t use ChatGPT for writing, even with GPT-4, because the risk of inserting even a small number of fake facts… “

Follow-Up Research: Will Hickman and I have an incentivized experiment on trust that you can read on SSRN: Do People Trust Humans More Than ChatGPT?

@IMurtazashvili has pointed me to a great resource for AI-era literature review work. “AI-Based Literature Review Tools” from Texas A&M

Wrapping Up & Sneak Peeks

I’m wrapping up grading for the semester. So this one is super short. What will I be writing about in the upcoming weeks? Here’s a sneak peek:

  1. I will read the course evaluations and let you know how my Game Theory Course changes fared.
  2. I’ll discuss a little bit of the new DID Stata methods. I’ll keep it short and sweet and provide an example.
  3. I want to share some thoughts on objectivity, unreasonable academic charity, and our ability to interpret evidence using multiple models.
  4. Squeezing out more time efficiencies in your home life (especially for parents).
  5. There are too many A’s in my Principles of Macroeconomics class.

That’s what’s on the horizon. I’ll link back here to stay on track. Have a great weekend!