You Read It Here First

The subjects of two of our posts from 2023 are suddenly big stories.

First, here’s how I summed up New Orleans’ recovery from Hurricane Katrina then:

Large institutions (university medical centers, the VA, the airport, museums, major hotels) have been driving this phase of the recovery. The neighborhoods are also recovering, but more slowly, particularly small business. Population is still well below 2005 levels. I generally think inequality has been overrated in national discussions of the last 15 years relative to concerns about poverty and overall prosperity, but even to me New Orleans is a strikingly unequal city; there’s so much wealth alongside so many people seeming to get very little benefit from it. The most persistent problems are the ones that remain from before Katrina: the roads, the schools, and the crime; taken together, the dysfunctional public sector.

The New York Times had a similar take yesterday:

Today, New Orleans is smaller, poorer and more unequal than before the storm. It hasn’t rebuilt a durable middle class, and lacks basic services and a major economic engine outside of its storied tourism industry…. New Orleans now ranks as the most income-unequal major city in America…. In areas that attracted investment — the French Quarter, the Bywater and the shiny biomedical corridor — there are few outward signs of the hurricane’s impact. But travel to places like Pontchartrain Park, Milneburg and New Orleans East that were once home to a vibrant Black middle class, and there are abandoned homes and broken streets — entire communities that never regained their pre-Katrina luster…. Meanwhile, basic city functions remain unreliable.

I wrote in 2023 about a then-new Philadelphia Fed working paper claiming that mortgage fraud is widespread:

The fraud is that investors are buying properties to flip or rent out, but claim they are buying them to live there in order to get cheaper mortgages…. One third of all investors is a lot of fraud!… such widespread fraud is concerning, and I hope lenders (especially the subsidized GSEs) find a way to crack down on it…. This mortgage fraud paper seems like a bombshell to me and I’m surprised it seems to have received no media attention; journalists take note. For everyone else, I suppose you read obscure econ blogs precisely to find out about the things that haven’t yet made the papers.

Well, that paper has now gotten its fair share of attention from the media and the GSEs. Bill Pulte, director of the Federal Housing Finance Agency and chairman of Fannie Mae and Freddie Mac, has been going after Biden-appointed Federal Reserve Governor Lisa Cook over allegations that she misstated her primary residence on a mortgage application:

Pulte has written many dozens of tweets about this, at least one of which cited the Philly Fed paper:

Now President Trump is trying to fire Cook. Federal Reserve Governors can only be fired “for cause” and none ever have been, but Trump is using this alleged mortgage fraud to try to make Cook the first.

The Trump administration seems to have made the same realization as Xi Jinping did back in 2012: that when corruption is sufficiently widespread, some of your political opponents have likely engaged in it and so can be legally targeted in an anti-corruption crackdown (while corruption by your friends is overlooked).

I’m one of the few people hoping for the Fed to be run by the most competent technocrats with a minimum of political interference:

But I’m not expecting it.

Remember, you read it here first.

Parental Job Lock

The Affordable Care Act was supposed to make it easier for American workers to switch jobs by making it easier to get health insurance from sources other than their current employer. Mostly it didn’t work out that way. But a new paper finds that one piece of the ACA actually made people less likely to switch jobs.

The ACA Dependent Coverage Mandate required family health insurance plans to cover young adults through age 26; prior to the ACA’s passage in 2010, many had to leave the family plan at age 18 or 19. I thought these newly covered young adults would be more likely to switch jobs or start businesses, but there turned out to be absolutely no effect on job switching, and no overall increase in business formation (though the mandate did seem to increase the number of disabled young adults starting businesses, and other parts of the ACA increased business formation among older adults).

But while the Dependent Coverage Mandate seems not to have reduced job lock for young adults, it increased job lock among their parents. That is the finding of a new paper in the Journal of Public Economics by Hannah Bae, Katherine Mackel, and Maggie Shi. Using MarketScan, a large dataset with exact months of age and coverage, they can estimate precise effects:

We find that dependents just to the right of the December 1985/January 1986 cutoff—those eligible for longer coverage—are more likely to enroll and remain covered for longer once the mandate is in effect. Dependent enrollment increases by 1.8 percentage points at the cutoff, an increase of 9.2% over the enrollment rate for dependents born in December 1985. In addition, the enrollment duration increases by 9.7 days (14.6%). Turning to their parents, we find that parental job retention likelihood increases by 1.0 percentage point (1.8%) and job duration increases by 5.8 days (1.6%) to the right of the cutoff. When scaled by the estimated share of dependents on end of year plans, our findings imply that 12 additional months of dependent coverage correspond to a 7.7% increase in job retention likelihood and a 7.0% increase in retention duration.

Source: Figure 2 of Bae, Mackel and Shi 2025
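
To make “at the cutoff” concrete, here is a minimal regression discontinuity sketch in the spirit of their design, run on simulated data. The ~19.5% baseline and the 1.8 percentage point jump come from the quoted numbers; nothing here is the authors’ code or their MarketScan extract.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 50_000

# Running variable: birth month relative to the December 1985 / January 1986
# cutoff. Dependents born at or after the cutoff get longer mandated coverage.
m = rng.integers(-24, 24, n)
eligible = (m >= 0).astype(int)

# Simulate enrollment with a ~19.5% baseline, a gentle trend in birth month,
# and a true 1.8 percentage point jump at the cutoff (numbers from the quote).
enroll = (rng.random(n) < 0.195 + 0.001 * m + 0.018 * eligible).astype(int)

df = pd.DataFrame({"enroll": enroll, "eligible": eligible, "m": m})

# Local linear RD: separate slopes on each side of the cutoff; the coefficient
# on `eligible` estimates the discontinuity in enrollment.
fit = smf.ols("enroll ~ eligible + m + eligible:m", data=df).fit()
print(f"estimated jump at cutoff: {fit.params['eligible']:.3f}")  # close to 0.018
```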

I believe in this parental job lock effect partly because of their data and econometric analysis, and partly through introspection: I plan to work for years after I have enough money to retire, in order to keep benefits for my kids, though personally I’m more interested in tuition remission than health insurance.

Working longer is not the only cost, though: benefits like these also enable employers to pay parents lower money wages. A 2022 Labour Economics paper by Seonghoon Kim and Kanghyock Koh found that the Dependent Coverage Mandate “reduced parents’ annual wages by about $2600 without significant reductions in the probability of employment and working hours.” But at least their kids are better off for it.

Is A Music Major Worth It?

Our new paper concludes that the answer is a resounding “It Depends”.

It depends on your answer to the following questions:

  1. If you didn’t major in music, would you major in something else, or not finish college?
  2. How dead set are you on a career in music?
Source: Figure 1 of Bailey and Smith (2025)

We found that

  1. Music majors earn more than people who didn’t graduate from college, even if they don’t end up working as musicians
  2. Among musicians, music majors earn more than other majors
  3. But among non-musicians, other majors earn much more than music majors

So on average a music major means higher income if you would be a musician anyway, or if you wouldn’t have gone to college for another major, but lower income than if you had majored in something else and worked outside of music. The exact amounts depend on what you control for; that gets complex, but this table gives the basic averages before controls:

Source: Table 2 of Bailey and Smith (2025), showing wage plus business income for respondents to the 2018-2022 American Community Survey

For better or worse, a music major also means you are much more likely to be a musician: 113 times more likely, in fact (this is just the correlation; we’re not randomizing people into the major). Despite that incredible correlation, only 9.8% of music majors report being professional musicians, and only 22.3% of working musicians were music majors.
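
As a rough consistency check, those three numbers imply a base rate via Bayes’ rule. This back-of-the-envelope treats the 113x as comparing music majors to non-music majors, which is an assumption for illustration rather than a number reported this way:

```python
# Back-of-the-envelope from the three stats in the text, treating the "113x"
# as comparing music majors to non-music majors (an assumption).
p_mus_given_major = 0.098    # 9.8% of music majors work as musicians
rel_risk = 113               # music majors are 113x more likely to be musicians
p_major_given_mus = 0.223    # 22.3% of working musicians were music majors

p_mus_given_other = p_mus_given_major / rel_risk
print(f"implied musician rate among other majors: {p_mus_given_other:.4%}")  # ~0.09%

# Bayes' rule: c = a*p / (a*p + b*(1-p)); solve for p, the implied
# share of music majors among graduates.
a, b, c = p_mus_given_major, p_mus_given_other, p_major_given_mus
p = b * c / (a - c * (a - b))
print(f"implied share of music majors: {p:.4%}")  # roughly a quarter of a percent
```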

Sean Smith had the idea for this paper and wrote the first draft in my Economics Senior Capstone class in 2024. After he graduated I joined the paper as a coauthor to get it ready for journals, and it was accepted at SN Social Sciences last week. We share the data and code for the paper here.

Continue reading

The Simple Utility Function Vs. Socialism

I’m a big fan of Friedrich Hayek. I first read his work in an academic setting. But many people first encounter him via The Road to Serfdom, his book that outlines the political and social consequences of state economic controls. I always meant to go back and read it, but it usually took a back seat to other works. Now, I’m slowly making my way through.

A lovely snippet has Hayek explaining the popular sentiment that “it’s only money”, or that money-related concerns are base or superficial. Such an attitude is especially common when people recount their childhood or family life during times of financial difficulty; the story often goes “times were hard, but we had each other”. Similarly, a common derisive trope is that economists ‘only care about money’ [rather than the more important things].

Continue reading

Writing Humanity’s Last Exam

When every frontier AI model can pass your tests, how do you figure out which model is best? You write a harder test.

That was the idea behind Humanity’s Last Exam, an effort by Scale AI and the Center for AI Safety to develop a large database of PhD-level questions that the best AI models still get wrong.

The effort has proven popular: the paper summarizing it has already been cited 91 times since its release on March 31st, and the main AI labs have been testing their new models on the exam. xAI announced today that its new Grok 4 model has the highest score yet on the exam, 44.4%.

Current leaderboard on the Humanity’s Last Exam site, not yet showing Grok 4

The process of creating the dataset is a fascinating example of a distributed academic mega-project, a growing trend that has also been important in efforts to replicate previous research. The organizers of Humanity’s Last Exam let anyone submit a question for their dataset, offering co-authorship to anyone whose question they accepted and cash prizes to those who had the best questions accepted. In the end they wound up with just over 1000 coauthors on the paper (including yours truly as one very minor contributor), and gave out $500,000 to contributors of the very best questions (not me), which seemed incredibly generous until Scale AI sold a 49% stake in their company to Meta for $14.8 billion in June.

Source: Figure 4 of the paper

Here’s what I learned in the process of trying to stump the AIs and get questions accepted into this dataset:

  1. The AIs were harder to stump than I expected, because the exam used frontier models rather than the free-tier models I was used to using on my own. If you think AI can’t answer your question, try a newer model
  2. It was common for me to try a question that several models would get wrong, but at least one would still get right. For me this was annoying, because questions could only be accepted if every model got them wrong. But if what you want is a correct answer, this means trying more models is good, even if they are all in the same tier. If you can’t tell what a correct answer looks like and your question is important, make sure to try several models and see if they give different answers
  3. Top models are now quite good at interpreting regression results, even when you try to give them unusually tricky tables
  4. AI still has weird weaknesses and blind spots; it can outperform PhDs in the relevant field on one question, then do worse than 3rd graders on the next. This exam wanted PhD-level questions, where a typical undergrad not only couldn’t answer the question but probably couldn’t even understand what was being asked. But it specifically excluded “simple trick questions”, “straightforward calculation/computation questions”, and questions “easily answerable by everyday people”, even if all the AIs got them wrong. My son had the idea to ask them to calculate hyperfactorials (see the sketch after this list); we found some relatively low numbers that stumped all the AI models, but the human judges ruled that our question was too simple to count. On a question I did get accepted, I included an explanation for the human judges of why I thought it wasn’t too simple.
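
For the curious: the hyperfactorial is H(n) = 1^1 · 2^2 · ⋯ · n^n. A few lines of Python (for illustration; this was not part of our submitted question) make the definition concrete and show how quickly it blows up:

```python
def hyperfactorial(n: int) -> int:
    """Return H(n) = 1^1 * 2^2 * ... * n^n."""
    result = 1
    for k in range(1, n + 1):
        result *= k**k
    return result

# The values explode quickly even at small n:
print([hyperfactorial(n) for n in range(1, 6)])  # [1, 4, 108, 27648, 86400000]
```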

I found this to be a great opportunity to observe the strengths and weaknesses of frontier models, and to get my name on an important paper. While the AI field is being driven primarily by the people with the chops to code frontier models, economists still have a lot we can contribute here, as Joy has shown. Any economist looking for the next way to contribute should check out Anthropic’s new Economic Futures Program.

Counting Hallucinations by Web-Enabled LLMs

In 2023, we gathered the data for what became “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In the spring of 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.

Our new report on the state of citations is available at SSRN: “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models.”

What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence.”

What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. This means it can answer questions about current events, verify facts, and provide live data—something traditional models, which are limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement that with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.
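
For illustration, a web-enabled query of this kind can be issued through OpenAI’s Python SDK. This is a sketch, not our exact setup: it assumes the Responses API with its web_search_preview tool as documented in 2025, and the topic string is a stand-in rather than one of our JEL-based prompts.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask a web-enabled model for an economics summary with explicit
# citation-formatting instructions, mirroring the prompt style in the paper.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # enables live web retrieval
    input=(
        "Summarize research on price rigidity. Include citations from "
        "published papers. Provide the citations in a separate list, with "
        "author, year in parentheses, and journal for each citation."
    ),
)
print(response.output_text)
```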

At least one third of the references provided by GPT-4o WS were not real! Performance has not improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower-quality sources like Investopedia even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.
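
Checking whether a returned citation actually exists is largely mechanical. Here is a rough sketch, not the verification pipeline from our paper, that screens a free-form citation against the CrossRef index using an admittedly crude word-overlap heuristic:

```python
import requests

def citation_seems_real(citation: str, min_overlap: float = 0.8) -> bool:
    """Query CrossRef for a free-form citation and check whether the
    top-ranked indexed work plausibly matches it."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return False
    title_words = (items[0].get("title") or [""])[0].lower().split()
    if not title_words:
        return False
    # Crude heuristic: share of the returned title's words that also appear
    # in the citation string; genuine matches score near 1.0.
    overlap = sum(w in citation.lower() for w in title_words) / len(title_words)
    return overlap >= min_overlap

print(citation_seems_real(
    "Korinek (2023). Generative AI for Economic Research: Use Cases and "
    "Implications for Economists. Journal of Economic Literature."
))  # expected True; a hallucinated reference should usually come back False
```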

In closing, consider this quote from an interview with Jack Clark, co-founder of Anthropic:

The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.

Papers about Economists Using LLMs

1. The most recent (published in 2025) is this piece about doing data analytics that would have been too difficult or costly before. Link and title: Deep Learning for Economists

Considering how much of frontier economics revolves around getting new data, this could be important. On the other hand, people have been doing computer-aided data mining for a while. So it’s more of a progression than a revolution, in my expectation.

2. Using LLMs to actually generate original data and/or test hypotheses like experimenters: Large language models as economic agents: what can we learn from homo silicus? and Automated Social Science: Language Models as Scientist and Subjects

3. Generative AI for Economic Research: Use Cases and Implications for Economists

Korinek has a supplemental update, current as of December 2024: LLMs Learn to Collaborate and Reason: December 2024 Update to “Generative AI for Economic Research: Use Cases and Implications for Economists,” Published in the Journal of Economic Literature 61 (4)

4. For being comprehensive and early: How to Learn and Teach Economics with Large Language Models, Including GPT

5. For giving people proof of a phenomenon that many people had noticed and wanted to discuss: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Alert: We will soon have an update for current web-enabled models! It would seem that hallucination rates are going down but the problem is not going away.

6. This was published back in 2023. “ChatGPT ranked in the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course.” (note the “compared to”): ChatGPT has Aced the Test of Understanding in College Economics: Now What?

References

Buchanan, J., Hill, S., & Shapoval, O. (2023). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80–87. https://doi.org/10.1177/05694345231218454

Cowen, T., & Tabarrok, A. T. (2023). How to Learn and Teach Economics with Large Language Models, Including GPT (GMU Working Paper in Economics No. 23-18). https://ssrn.com/abstract=4391863

Dell, M. (2025). Deep Learning for Economists. Journal of Economic Literature, 63(1), 5–58. https://doi.org/10.1257/jel.20241733

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has Aced the Test of Understanding in College Economics: Now What? The American Economist, 68(2), 233–245. https://doi.org/10.1177/05694345231169654

Horton, J. J. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv Preprint arXiv:2301.07543.

Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for Economists. Journal of Economic Literature, 61(4), 1281–1317. https://doi.org/10.1257/jel.20231736

Manning, B. S., Zhu, K., & Horton, J. J. (2024). Automated Social Science: Language Models as Scientist and Subjects (Working Paper No. 32381). National Bureau of Economic Research. https://doi.org/10.3386/w32381

Queens 2060: Where Upzoning Matters Most

Most US cities make it hard for housing supply to meet demand because of rules that prevent large apartment buildings. Usually cities do this with zoning rules that limit the number of homes per parcel, often to as low as one. New York City relies more on rules capping the Floor Area Ratio (the ratio of a building’s floor area to the area of its parcel). But how binding are these rules? If we relaxed or repealed them, how much new construction would we see, and where would we see it?
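
To see how the constraint binds, here is a tiny illustration with made-up numbers:

```python
def max_floor_area(parcel_sqft: float, far_cap: float) -> float:
    """Maximum buildable floor area allowed under a Floor Area Ratio cap."""
    return parcel_sqft * far_cap

# A FAR cap of 2.0 on a 5,000 sq ft lot allows 10,000 sq ft of floor space:
# e.g., two stories covering the whole lot, or four stories covering half.
print(max_floor_area(5_000, 2.0))  # 10000.0
```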

MIT PhD student Vincent Rollet has calculated this for New York City:

I build a dynamic general equilibrium model of the supply and demand of floorspace in a city, which I estimate using a novel parcel-level panel dataset of land use and zoning in New York City. I validate the model using quasi-experimental variation from recent zoning reforms and use it to simulate the effects of zoning changes on construction and prices.

He finds that eliminating these rules in NYC would lead to a construction boom, with a 79% increase in the amount of floor space available by 2060. This would allow many more people to live in New York, with a 52% increase in population; but many of the benefits would go to existing NYC residents, with more floor space per person and modestly lower rents leading to higher wellbeing:

Where exactly would we see the building boom? Not Manhattan, but Brooklyn and Queens. The intuition is that zoning is most binding in places where housing prices are currently high but where the buildings are currently small; this is where there is the biggest incentive to tear down existing buildings and build taller if you are allowed to.

The Most Regulated States

The Mercatus Center has put together a page of “Snapshots of State Regulation” using data from their State RegData project. Their latest data suggests that population is still a big predictor of state-level regulation, on top of the red/blue dynamics people expect:

They also made pages with much more detail on each state, like what the most regulated industries in each state are and how each one compares to the national average:

You can find your state here.

“How Can the US Manufacture More” Is a Reasonable Question That Deserves Reasonable Answers

Many regular Americans and policymakers say they want the US to manufacture more things domestically. But when they ask economists how to accomplish this, I find that our most common response is to question their premise: to say the US already manufactures plenty, or that there is nothing special about manufacturing. It’s easy for people to round off this answer to ‘your question is dumb and you are dumb’, then go ask someone else who will give them a real answer, even if that real answer is wrong.

Economists tell our students in intro classes that we focus on positive economics, not normative: we won’t tell you what your goals should be, just how best to accomplish them. But then we seem to forget all that when it comes to manufacturing. Normally we would take even unreasonable questions seriously, and I think wondering how to increase manufacturing output is a reasonable question given the national defense externalities.

So if you had to increase the value of total US manufacturing output- if you were going to be paid based on a fraction of real US manufacturing output 10 years from now- how would you do it?

I haven’t made a deep study of this, but here are my thoughts, with the better ideas at the top and the ‘costly but would increase manufacturing output’ ideas at the bottom:

Continue reading