Counting Hallucinations by Web-Enabled LLMs

In 2023, we gathered the data for what became “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In the Spring of 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.

Our new report on the state of citations is available at SSRN: “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models.”

What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like, “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence.”

What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. This means it can answer questions about current events, verify facts, and provide live data—something traditional models, which are limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement that with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.

At least one third of the references provided by GPT-4o WS were not real! Performance has not significantly improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower quality sources like Investopedia even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.

In closing, consider this quote from an interview with Jack Clark, co-founder of Anthropic:

The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.

US State Growth Statistics 2005-2024

In macroeconomics we have basic tools to help us talk about economic growth, which is simply the percent change in RGDP per capita. What causes growth? Lots of things. All else constant, if more people are employed, then more will be produced. But the productivity of those workers matters too. That’s why we calculate average labor productivity (ALP), which is the GDP per worker. This tells us how much each worker produces. All else constant, more ALP means more GDP.*
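The relationship between these pieces can be written as a simple identity. The symbols are my own shorthand, not notation from the post: $Y$ is RGDP, $L$ is the number of employed workers, and $N$ is population, so $Y/L$ is ALP and $L/N$ is the employment-population ratio.

```latex
\[
\underbrace{\frac{Y}{N}}_{\text{RGDP per capita}}
  \;=\;
\underbrace{\frac{Y}{L}}_{\text{ALP}}
  \times
\underbrace{\frac{L}{N}}_{\text{employment-population ratio}}
\]
% For small changes, growth rates are approximately additive:
\[
g_{Y/N} \;\approx\; g_{Y/L} + g_{L/N}
\]
```

Read this way, growth in RGDP per capita comes from more output per worker, a larger share of the population working, or both.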

What affects ALP? Nearly everything: Technology, demographics, health, culture, and public policy. Most of these have long-term effects. So, it’s better to think in terms of regimes. After all, incurring debt now can result in a lot of investment and production, but there’s no guarantee that it can be sustained year after year. This is why I don’t get terribly excited about individual good or bad policies at any moment. There’s a lot of ruin in a nation. I care more about the long-run policy regime that is fostered over time.

Given the variety of inputs to economic growth, there’s always plenty of room for complaint about policy – even if the economy is doing well. In this post, I’m inspired by a YouTube video that a student shared with me. The OP laments poor policy in Massachusetts. But compared to some other nearby states, MA is doing just fine economically. This is not the same as saying that the OP is wrong about poor policies. Rather, a regime of policy, technology, interests, etc. is built over time and there can be a lot wrong in growing economies.

In the interest of being comprehensive, this post includes basic growth stats for all states from 2005 through 2024 (the years of FRED-state GDP).** First, let’s start with the basic building blocks of population, employment, and RGDP. Institutions matter. Policy affects whether people migrate to/from the state, fertility, how many people are employed, and what they can produce.

People like to talk about migration and the flocking to Texas & Florida. But that fails to catch the people who choose to stay in their state. Utah is 43% more populous than it was 20 years ago. But you don’t hear much clamoring for their state policies. Idaho and Nevada also beat Florida in terms of percent change. Where are the calls to be like Idaho? Employment largely tracks population, though not perfectly. The RGDP numbers can change quickly with commodity prices, reflected in the performance of North Dakota. But remember, these numbers cover a 20-year span. So, any one blockbuster or dour year won’t move the rankings much.

Of course, these figures just set the stage. What about the employment-population ratio, ALP, and RGDP per capita? Read on.


Excluding “Non-Excludable” Goods

Intro microeconomics classes teach that some goods are “non-excludable”, meaning that people who don’t pay for them can’t be stopped from using them. This can lead to a “tragedy of the commons”, where the good gets overused because people don’t personally bear the cost of using it and don’t care about the costs they impose on others. Overgrazing land and overfishing the seas are classic examples.

Source: Microeconomics, by Michael Parkin

Students sometimes get the impression that “excludability” is an inherent property of a good. But in fact, which goods are excludable is a function of laws, customs, and technologies, and these can change over time. Land might be legally non-excludable (and so over-grazed) when it is held in common, but become excludable when the land is privatized or when barbed wire makes enclosing it cheap. Over time, such changes have turned over-grazing into a relatively minor issue.

Overfishing remains a major problem, but this could be starting to change. Legal and technological changes have allowed for enclosed, private aquaculture on some coasts, which provides a large and growing share of all fish eaten by humans. Permitting systems put limits on catches in many countries’ waters, though the high seas remain a true tragedy of the commons for now.

While countries have tried to enforce limits on catches in their national waters, monitoring how many fish every boat is taking has been challenging, so illegal overfishing has remained widespread. But technology is in the process of changing this. For instance, ThayerMahan is developing hydrophone arrays that use sound to track boats:

Technologies like hydrophones and satellites, if used well, will increasingly make public waters more “excludable” and reduce “tragedy of the commons” overfishing.

Saving Money by Ordering Car Parts from Amazon or eBay

Here is a personal economic anecdote from this week. A medium-sized dead branch fell from a tall tree and ripped off the driver-side mirror on my old Honda. My local repair shop said it would cost around $600 to replace it. That is a significant percentage of what the old clunker is worth. Ouch.

They kindly noted that most of that cost was for ordering a replacement mirror assembly from Honda, which would cost over $400 and take several days to arrive. I asked if I could try to get a mirror from a junkyard, to save money. The repair guy said they would be willing to install a part I brought in, but suggested eBay or Amazon instead.

Back 20 years ago, before online commerce was so established, my local repair shop would routinely save us money by getting used parts from some sort of junkyard network.
So, I started looking into that route. First, junkyards are not junkyards anymore, they are “salvage yards.” Second, it turns out that removing a side mirror from a Honda is not a simple matter. You have to remove the whole inside plastic door panel to get at the mirror mounting screws, and removing that panel has some complications. Also, I could not find a clear online resource for locating parts at regional salvage yards. It looks like you have to drive to a salvage yard, and perhaps have them search some sort of database to find a comparable vehicle somewhere that might have the part you want.


All this seemed like a lot of hassle, so I went to eBay, and found a promising-looking new replacement part there for about $56, including shipping. It would take about a week to get here (probably being direct shipped from China). On Amazon, I found essentially the same part for about $63 that would get here the next day. For the small difference in price, I went the Amazon route, partly for the no-hassle returns if the part turned out to be defective and partly because I get 5% back on my Amazon credit card there.

I just got the car back from the repair shop with the replacement mirror, and it works fine. The total cost, with labor, was about $230, which is much better than the original $600+ estimate.


I’m not sure how broadly to generalize this experience. Some further observations:

( 1 ) For a really critical car part, I’d have to consider carefully if the Chinese knock-off would perform appreciably worse than some name-brand part, although I believe many repair shops often use parts that are not strictly original parts.

( 2 ) Commonly replaced parts like oil and air filters are typically cheaper to buy online than from your local AutoZone or other local merchant. I like supporting local shops, so sometimes I eat the few extra $$ and shopping time, and buy from brick-and-mortar stores.

( 3 ) Some repair shops make significant money on their markup on parts, and so they might not be happy about you bringing in your own parts. They also might decline to warrant the operation of that part. And many big box franchise repair shops may simply refuse to install customer-supplied parts.

( 4 ) For a newish car, still under warranty, the manufacturer warranty might be affected by using non-original parts.

( 5 ) Back to junk/salvage yards: there are some car parts, so-called hard parts, that are expected to last the life of the car. Things like the mounting brackets for engine parts. Typically, no spares of these are manufactured. So, if one of those parts gets dinged up in an accident, your only option may be used parts taken from a junker.

Salty SALT in the OBBB

The Republicans hold a majority in both chambers of Congress and they are the party of the president. They want to use that opportunity to pass substantial legislation that addresses their priorities. Hence, the One, Big, Beautiful Bill (OBBB). But, just like the Democratic Party, Republican congressmen are a coalition with various and sometimes divergent policy agendas. There are ‘Trump’ Republicans, who want tariffs, executive orders, and deportations. There are more liberal members who want more free markets. You can also find the odd ‘crypto bro’, blue-state representatives, and deficit hawks. Given the slim majority in the House of Representatives, they all have to get something out of the legislation. Put them together, and what have you got?* You get a signature piece of legislation that no one is happy about but everyone touts.

One example of such compromise is the State and Local Tax federal income tax deduction, or SALT deduction. The idea behind it is that income shouldn’t be taxed twice. If you pay a part of your income to your state government in the form of taxes, then the argument goes that you shouldn’t be taxed on that part of your income because you never actually saw it in your bank account. The state took it and effectively lowered your income. The state and local taxes get deducted from the taxable income that you report to the federal government.  The reasoning is that you shouldn’t need to pay taxes on your taxes.

Paying taxes on your taxes sounds bad. And plenty of people don’t like one tax, much less two. The Tax Foundation has done a lot of good work to cut through the chaff and has published many pieces on the SALT deduction over the years.**

Cut-and-Dried SALT Deduction Facts:

  • It’s a tax cut
  • It reduces federal tax revenue
  • It adds tax code complication
  • It is used by people who itemize rather than take the standard income tax deduction
  • Prior to the 2017 Tax Cuts and Jobs Act, there was no limit on the SALT deduction. After, the limit was $10k.
  • The current OBBB increases the SALT deduction limit.
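To make the mechanics in the list above concrete, here is a minimal sketch of how a cap on the deduction changes federal taxable income for an itemizer. The household figures and the flat 30% federal rate are hypothetical numbers chosen for illustration; real returns involve brackets, other deductions, and the standard-deduction comparison, none of which are modeled here.

```python
def federal_taxable_income(income, salt_paid, salt_cap=None):
    """Taxable income for an itemizer, counting only the SALT deduction.

    salt_cap=None means no limit (the pre-2017 rule); otherwise the
    deduction is limited to the cap.
    """
    deductible = salt_paid if salt_cap is None else min(salt_paid, salt_cap)
    return max(income - deductible, 0.0)


# Hypothetical household (assumed numbers, not from any data source).
income = 400_000
salt_paid = 40_000
flat_rate = 0.30  # assumed flat federal rate, for illustration only

uncapped = federal_taxable_income(income, salt_paid)
capped = federal_taxable_income(income, salt_paid, salt_cap=10_000)

print(f"Taxable income, no cap:    ${uncapped:,.0f}")
print(f"Taxable income, $10k cap:  ${capped:,.0f}")
print(f"Extra federal tax from cap: ${(capped - uncapped) * flat_rate:,.0f}")
```

A household paying less than the cap in state and local taxes sees no difference between the two cases, which is why the cap matters mostly to high earners in high-tax places.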

Those are the basics. Everything else is analysis. The Grover Norquist Republicans never see a tax cut that looks bad, so they’d like to see the SALT limit raised or disappear. Tax think tanks that like simplicity don’t like the SALT deduction because it adds complication. Plenty of others say they don’t like complication, but often change their mind when it comes to the details (much like cutting government waste). Think tanks tend to be a bit lonely on this point.

People mostly care about the SALT deduction due to the distributional effects. Who ends up benefiting from the deduction? The short answer is people who 1) itemize & 2) have heavy state and local tax bills. Who is that? Rich people of course! They have high incomes and lots of wealth and real estate – on which they pay taxes. But not all rich people pay loads of state taxes. So the SALT deduction is a tax cut that primarily benefits rich people who live in high tax districts. Where’s that? See below.


LIFE Survey Comes Alive

Last year I posted that the Philly Fed had started a new quarterly survey on Labor, Income, Finances, and Expectations (LIFE). I thought it looked promising but had yet to achieve its potential:

It will be interesting to see if this ends up taking a place in the set of Fed surveys that are always driving economic discussions, like the Survey of Consumer Finances and the Survey of Professional Forecasters. If they keep it up and start putting out some graphics to summarize it, I think it will. My quick impression (not yet having spoken to Fed people about it) is that it will be the “quick hit” version of the Survey of Consumer Finances. It asks a smaller set of questions on somewhat similar topics, but is released quickly after each quarter instead of slowly after each year. If they stick with the survey it will get more useful over time, as there is more of a baseline to compare to.

But a year later the survey now has what I hoped for: a solid baseline for comparisons, and pre-made graphics to summarize the results. It continues to show complex and mixed economic performance in the US. People think the economy is getting worse:

They are cutting discretionary (but not necessity) spending at record levels:

They are worried about losing their jobs at record levels:

But key areas like housing, childcare, and transportation are stabilizing:

Overall I think we can synthesize these seemingly contradictory pictures by saying that Americans’ finances are fine now, but they are quite worried that things are about to get worse, perhaps due to the tariffs taking effect. You can find the rest of the LIFE survey results (including all the non-record-setting ones) here.

Household Formation and Generational Wealth

Last week I tried to address whether rising wealth for younger generations was primarily driven by rising home values. My analysis suggested that it was a cause, but not the only cause. Here’s another chart on that topic, showing median net worth excluding home equity for recent generations:

Two things are notable in the chart. For millennials, even excluding home equity they are well ahead of past generations, though of course their net worth is much smaller excluding this category of wealth (the total median net worth for millennials in 2022 was $93,800). But for Gen X in 2022 (last data in that chart), they are slightly behind Boomers, never having recovered from the decline in wealth after 2007 (primarily from the stock market decline, since we’re excluding housing).

But today I want to address another general objection to the wealth data found in the Fed’s SCF and DFA programs. That objection has to do with household formation. Specifically, these surveys are calculated for households, and the age/generation indicators are for the household head (or “householder” as it is now called). And we know that household formation has been declining over time, as more young people live with parents, with roommates, etc. So the Millennial data we see in the chart above is excluding any Millennials that have not yet formed their own household.

Here’s a general picture of the decline, which has been happening gradually since about 1980. Note: I use the age group 26-41, because this is the age of Millennials in 2022 (the most recent SCF survey year). The highlighted years on the chart are when the Silent, Baby Boomer, Gen X, and Millennial generations were about the same age (26-41).

What this means is that when we are looking at households in these wealth surveys (or any survey that focuses on households) we aren’t quite comparing apples to apples. Does this mean the surveys are worthless? No! With the microdata in the SCF, we can look at not only the median value, but the entire distribution. Since the household formation rate has fallen by about 11 percentage points between Boomers in 1989 and Millennials in 2022, one solution is to look up or down the distribution for a rough comparison.

For example, if we assume all of the 11 percent of non-householders among Millennials have wealth below the median, we can make a rough correction by looking at the 39th percentile for Millennials: the 39th percentile would be the median if you included all of those 11 percent of non-householders as households. Similarly, for Gen X we would move down 5 percentage points in the distribution to the 45th percentile in 2007.

The household-formation-adjusted chart does paint a more pessimistic picture than just looking at the median for each generation: the 39th percentile Millennial has about 20% less wealth than the median Boomer did at roughly the same age. Seems like generational decline! Is there any silver lining?

First, you should interpret the chart above as a worst case scenario for Millennial wealth. It assumes all non-householders have low wealth. But likely not all of them do. If instead we use the 43rd percentile of Millennials in 2022, their net worth is $61,000, slightly above Boomers at the same age. (The household formation problem isn’t going away anytime soon as generations age — even if we look at Gen Xers, with a median age of 50 in 2022, their household formation is still 6 percentage points behind Boomers at that age.)

Second, my worst case scenario almost certainly overstates the problem. If all of the Millennials in that 11 percentage point gap were to marry other Millennials, they would add only half that many households to the aggregate distribution (when two non-householders get married, they become one household). So instead of moving down 11 percentage points to the 39th percentile, we should only move down 5 or 6 percentage points. The 44th percentile of Millennial net worth in 2022 was $63,060; again, compare this to Boomers in the chart above.
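The rough percentile adjustments used in this post can be sketched in a few lines. This is only the back-of-the-envelope heuristic described above, not a formal reweighting of the SCF microdata, and the function and parameter names are my own.

```python
def adjusted_percentile(formation_gap_pp, share_below_median=1.0,
                        people_per_new_household=1):
    """Percentile of the householder distribution to compare with an
    earlier generation's median.

    formation_gap_pp: fall in the household formation rate, in percentage points.
    share_below_median: assumed share of the "missing" households whose
        wealth would sit below the median (1.0 is the worst case).
    people_per_new_household: 1 if each non-householder would form a
        household alone, 2 if they would pair up with each other.
    """
    shift = formation_gap_pp * share_below_median / people_per_new_household
    return 50 - shift


print(adjusted_percentile(11))                              # Millennials, worst case
print(adjusted_percentile(11, people_per_new_household=2))  # Millennials, pairing up
print(adjusted_percentile(5))                               # Gen X, worst case
```

The pairing-up case lands between the 44th and 45th percentiles, which is why moving down "5 or 6" points is the natural rounding.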

Finally, if we combine both of the adjustments discussed in this post, looking at wealth excluding home equity and also adjusting for the decline in household formation, we get the following chart (here I once again use the 39th percentile for Millennials and the 45th percentile for Gen X, i.e., the worst case scenario):

With this final adjustment, we get a slightly different picture. The wealth of these three generations is roughly the same at the same age. No increase in wealth, but no decline either. You could read this as pessimistic, if your assumption is that wealth should rise over time, but the general vibes out there are that young people are worse off than in the past. This wealth data suggests, once again, that the kids are doing all right.

Retiring for $100 per Month

Everybody follows a different path. Sometimes that path includes a late start on saving for retirement. Say that you have $0 in your retirement account right now. Is it too late? What can you get as a result of contributing $100 per month? Maybe more than you think.

Let’s start with an annuity equation that tells us our balance at retirement with some assumptions baked in. Let’s assume that we have zero dollars saved and contribute $100 per month. What rate of return do we earn? The S&P earns an average of 10% per year, which may not keep happening. We can conservatively assume 7.5%, but there are other concerns. Taxes and inflation will both eat away at that. Let’s subtract 2.5% for inflation with the Fisher approximation, leaving a real rate of return of 5%. We’ll chop off 20% due to taxes*. Below is the annuity equation that tells us the balance at retirement, depending on how many years from now you retire.
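For reference, a standard future-value-of-an-annuity formula consistent with these assumptions looks like the following. The notation is my own, and two conventions are assumed rather than stated in the text: monthly compounding, and the 20% tax applied to the 5% real return (giving 4%).

```latex
\[
B(T) \;=\; P \cdot \frac{(1+i)^{n} - 1}{i},
\qquad
i = \frac{r}{12},
\quad
n = 12\,T
\]
% P = 100            monthly contribution in dollars
% T                  years until retirement
% r = 0.05 (1 - 0.20) = 0.04   assumed real, after-tax annual return
% B(T)               real balance at retirement
```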

Assuming that you retire at 65 years of age, the graph below describes your balance at retirement depending on the age at which you started saving $100 per month. Of course, it’s not the balance that most people are worried about. Rather, we care about the implied monthly retirement check. The graph describes that on the right axis too, assuming that constant real payments will be made forever as perpetuity payments. We can see that getting started early matters a lot. But starting at age 40 still gets you real monthly retirement payments that are just shy of $200. That’s not too shabby.
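A short sketch of that calculation is below. It assumes monthly compounding, a 4% real after-tax return (5% real less 20% for taxes), and perpetuity payments equal to the monthly return on the final balance. Those conventions are my assumptions, so the output need not match the graph exactly.

```python
def balance_at_retirement(monthly_contribution, annual_rate, years):
    """Future value of level monthly contributions (ordinary annuity)."""
    i = annual_rate / 12          # monthly rate
    n = years * 12                # number of monthly contributions
    if i == 0:
        return monthly_contribution * n
    return monthly_contribution * ((1 + i) ** n - 1) / i


def perpetuity_monthly_payment(balance, annual_rate):
    """Constant monthly payment that leaves the balance intact forever."""
    return balance * annual_rate / 12


real_rate = 0.075 - 0.025               # Fisher approximation: nominal minus inflation
after_tax_rate = real_rate * (1 - 0.20)  # assumed: tax takes 20% of the real return
retirement_age = 65

for start_age in (25, 30, 40, 50):
    years = retirement_age - start_age
    balance = balance_at_retirement(100, after_tax_rate, years)
    payment = perpetuity_monthly_payment(balance, after_tax_rate)
    print(f"Start at {start_age}: balance ${balance:,.0f}, "
          f"monthly payment ${payment:,.0f}")
```

Because the perpetuity pays out only the return, the balance is never drawn down; a finite payout period would support larger monthly checks.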

Of course, nobody receives all of the perpetuity payments.


The End of Easy Student Loans

The Senate Health, Education, Labor and Pensions Committee is proposing to cut off student loans for programs whose graduates earn less than the median high school graduate. The House proposed a risk-sharing model where colleges would partly pay back the federal government when their students fail to pay back loans themselves. Both the House and Senate propose to cap how much students can borrow for graduate loans. Both would reduce federal spending on higher ed by about $30-$35 billion per year, cutting the size of the $700 billion higher ed sector by 4-5%. I expected that something like this would happen eventually, especially after the student loan forgiveness proposals of 2022:

While we aren’t getting real reform now, I do think forgiveness makes it more likely that we’ll see reform in the next few years. What could that look like?

The Department of Education should raise its standards and stop offering loans to programs with high default rates or bad student outcomes. This should include not just fly-by-night colleges, but sketchy masters degree programs at prestigious schools.

Colleges should also share responsibility when they consistently saddle students with debt but don’t actually improve students’ prospects enough to be able to pay it back. Economists have put a lot of thought into how to do this in a manner that doesn’t penalize colleges simply for trying to teach less-prepared students.

I’d bet that some reform along these lines happens in the 2020’s, just like the bank bailouts of 2008 led to the Dodd-Frank reform of 2010 to try to prevent future bailouts. The big question is, will this be a pragmatic bipartisan reform to curb the worst offenders, or a Republican effort to substantially reduce the amount of money flowing to a higher ed sector they increasingly dislike?

Of course, there is a lot riding on the details. How exactly do you calculate the income of graduates of a program compared to high school grads? The Senate proposal explains their approach starting on page 58. They want to compare the median income of working students 4 years after leaving their program (whether they graduated or dropped out, but exempting those in grad school) to the median income of those with only a high school diploma who are age 25-34, working, and not in school.

Nationally I calculate that this would make for a floor of $31,000. That is, the median student who is 4 years out from your program and is working should be earning at least $31k. In practice the bill would implement a different number for each state. This seems like a low bar in general, though you could certainly quibble with it. For instance, those 4 years out from a program may be closer to age 25 than age 34, but income typically rises with age during those years. If you compare them to 26-year-old high school grads, the national bar would be just $28k.

What sorts of programs have graduates making less than $31k per year?


The Growth in Wealth is Not Primarily Driven by Rising Home Prices

As I have discussed in many previous blog posts, young people today have a lot more wealth than past generations at the same point in their life. But we also know that housing prices have increased dramatically in recent years, and that for most families their home is their largest source of wealth.

Does this imply that the increase in wealth young Americans have seen is primarily driven by increased housing prices? If so, this would paint a less optimistic picture of the wealth of young people today, since the value of your home is something that you usually can’t easily convert into other consumption.

If we look at the past 5 years (2019Q4 to 2024Q4), the total wealth of US households under the age of 40 increased by $5 trillion, in nominal terms. That’s not adjusted for inflation, but we don’t need to do so because we can look at how much each asset class increased in nominal terms as well. The total value of assets for households under age 40 increased by $5.86 trillion.

Here’s how the various classes of assets have increased since 2019Q4:
