Papers about Economists Using LLMs

1. The most recent (published in 2025) is this piece on doing data analytics that would previously have been too difficult or costly. Link and title: Deep Learning for Economists

Considering how much of frontier economics revolves around getting new data, this could be important. On the other hand, people have been doing computer-aided data mining for a while. So it’s more of a progression than a revolution, in my expectation.

2. Using LLMs to actually generate original data and/or test hypotheses like experimenters: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? and Automated Social Science: Language Models as Scientist and Subjects

3. Generative AI for Economic Research: Use Cases and Implications for Economists

Korinek has a new supplemental update, current as of December 2024: LLMs Learn to Collaborate and Reason: December 2024 Update to “Generative AI for Economic Research: Use Cases and Implications for Economists,” Published in the Journal of Economic Literature 61 (4)

4. For being comprehensive and early: How to Learn and Teach Economics with Large Language Models, Including GPT

5. For giving people proof of a phenomenon that many people had noticed and wanted to discuss: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Alert: We will soon have an update covering current web-enabled models! It appears that hallucination rates are going down, but the problem is not going away.

6. This was published back in 2023. “ChatGPT ranked in the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course.” (note the “compared to”): ChatGPT has Aced the Test of Understanding in College Economics: Now What?

References

Buchanan, J., Hill, S., & Shapoval, O. (2023). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80–87. https://doi.org/10.1177/05694345231218454

Cowen, Tyler and Tabarrok, Alexander T., How to Learn and Teach Economics with Large Language Models, Including GPT (March 17, 2023). GMU Working Paper in Economics No. 23-18, Available at SSRN: https://ssrn.com/abstract=4391863 or http://dx.doi.org/10.2139/ssrn.4391863

Dell, M. (2025). Deep Learning for Economists. Journal of Economic Literature, 63(1), 5–58. https://doi.org/10.1257/jel.20241733

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has Aced the Test of Understanding in College Economics: Now What? The American Economist, 68(2), 233–245. https://doi.org/10.1177/05694345231169654

Horton, J. J. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv preprint arXiv:2301.07543.

Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for Economists. Journal of Economic Literature, 61(4), 1281–1317. https://doi.org/10.1257/jel.20231736

Manning, B. S., Zhu, K., & Horton, J. J. (2024). Automated Social Science: Language Models as Scientist and Subjects (Working Paper No. 32381). National Bureau of Economic Research. https://doi.org/10.3386/w32381

Per Capita Consumption: 1990 Vs 2024

This is an update to a previous post on per-capita real consumption in 1990 vs 2021. As of 2021, it was still unclear which pandemic-era changes were transitory and which were structural, and whether incomes would keep up with inflation. We now have three more years of data, through 2024. News flash: we’re even richer.

I like to use the BEA real quantity indices, which track the volumes actually consumed rather than deflating total spending by price indices. Dividing by population gives the real quantities of goods and services that people actually consumed per capita.
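As a quick illustration of the arithmetic (with made-up index and population numbers, not the actual BEA or Census figures), the per-capita growth in a real quantity index works out like this:

```python
# Per-capita growth in a real quantity index between two years.
# The numbers below are illustrative placeholders, not actual BEA
# or Census data.
def per_capita_change(index_start, index_end, pop_start, pop_end):
    """Percent change in real consumption per capita."""
    return ((index_end / pop_end) / (index_start / pop_start) - 1) * 100

# A quantity index up 40% while population grows 10%:
growth = per_capita_change(100.0, 140.0, 250.0, 275.0)  # population in millions
```

Because the quantity index is already a real (inflation-adjusted) measure, dividing by population means the result is net of both inflation and population growth.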

Even after the pandemic policies have settled down, we are still SO MUCH RICHER – and even richer than we were with all of the pandemic-related stimulus. The worst consumption category since the pandemic has been food and beverage for off-premise consumption, and that is *up* 4.6% since 2020, and up 31% since 1990. So, while I understand that people can’t enjoy the low prices of yesteryear, we are still better off in that category than pre-pandemic. In the other categories, everything is awesome.

Since 1990, our consumption of communication services has risen 332%, our houses are 254% better furnished, and we have 118% greater quality-adjusted clothing consumption. All of this is already adjusted for inflation and is per capita. Since the pandemic, these numbers are still up by 20.4%, 9.8%, and 31.1%, respectively. People didn’t like the post-pandemic inflation. I get that. But these improvements in average consumption are mind-boggling.

Continue reading

The Average Teaching Load of US Professors

“One of the closest guarded secrets in American higher education is the average teaching loads of faculty.” -Richard Vedder

I saw this quote in a recent piece arguing that US professors should teach more. It sounded extreme, but as I looked into it, I found that data on teaching loads really is surprisingly hard to find compared to other things like salaries:

Since 1996, for instance, the University of Delaware has administered the annual National Study of Instructional Costs and Productivity, surveying faculty and teaching assistants about course loads and enrollment. The data, though, are “only available to four-year, non-profit institutions of higher education.”[7] This secrecy, needless to say, is not the norm for surveys collected by publicly supported institutions. Tellingly, this study is being discontinued because the number of participating institutions “has slowly declined to unsustainable levels.”[8] *

There are some decent older studies that are public, like this 2005 survey of top liberal arts colleges showing that almost all have teaching loads between 4 and 6 courses per year. But in terms of recent, publicly available data, the best I’ve found is the Faculty Survey of Student Engagement. It still isn’t great, since their 2024 survey covers only 54 of the 2,000+ bachelor’s-degree-granting colleges in the US, and their tables show that these 54 aren’t especially representative. They make nice graphics though:

The graphics show exact percentages if you hover over them on the original Tableau site. Doing this shows that the median professor teaches 4 undergraduate courses per year. Knowing the full distribution would require the underlying data they don’t share, but from these graphics we can at least compute a rough average (rounding 4+ graduate courses to 4 and 9+ undergraduate courses to 9).

This shows that the average professor teaches 4.43 undergraduate courses and 0.75 graduate courses, for a total of 5.18 courses per academic year. If I restrict the data to full-time tenured or tenure-track professors, they teach an average of 4.72 undergraduate courses and 0.91 graduate courses, for a total of 5.63 courses per academic year.
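The rough-average calculation from the binned survey responses can be sketched like this; the bin shares below are hypothetical placeholders, not FSSE’s actual percentages:

```python
# Average courses taught per year from a binned distribution. Top-coded
# bins ("9+" undergraduate, "4+" graduate) are rounded down to 9 and 4,
# as described above. The shares here are hypothetical.
def binned_average(shares, topcode):
    """shares maps courses-per-year -> fraction of professors;
    counts at or above `topcode` are treated as exactly `topcode`."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9
    return sum(min(k, topcode) * v for k, v in shares.items())

undergrad_shares = {0: 0.05, 1: 0.05, 2: 0.10, 3: 0.15, 4: 0.25,
                    5: 0.15, 6: 0.10, 7: 0.05, 8: 0.05, 9: 0.05}
avg_undergrad = binned_average(undergrad_shares, topcode=9)
```

Because the top bins are rounded down, an average computed this way slightly understates the true load.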

Overall these loads are higher than I expected, especially since the survey sample is skewed towards research schools. But it’s still lower than the standard 3-3 load at my own institution, and low enough that it makes for a great job, especially compared to teaching K-12.

Overall, though, I don’t know why we need to rely on one-off surveys for data on teaching loads; it seems like something the US Department of Education should collect from all accredited schools and share publicly.

*The Delaware Cost Study is not just discontinuing new surveys; they plan to pull down existing data by December 15, 2025. Only schools that participate in their survey get access, so I can’t get the data, but perhaps some of you can.

Change in Homicide Rates from Pre-Pandemic in Large US Cities

We all know that homicides spiked in the US in 2020, and we all (hopefully) know that homicides have been falling dramatically across most of the country since the end of 2021. But have homicides gotten back to, or even below, pre-pandemic levels? Or has the decline merely reversed the 2020 increases?

The answer depends on the city and the pre-pandemic baseline! The chart below shows the 10 largest cities (with Fort Worth instead of Jacksonville, because the Real-Time Crime Index doesn’t include the latter) in the US, using a base of either January 2018 (the first month in the RTCI) or December 2019 (just before the pandemic, and murders had fallen nationally between these two dates):

The murder data comes from the Real-Time Crime Index, and it is a 12-month total so we shouldn’t have to worry about seasonality even though the months are different. I use Census annual city population estimates to calculate the rates (and estimate 2025 based on the growth from 2023-24).
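The calculation behind the chart can be sketched as follows; the murder counts and populations here are made up for illustration, not real city data:

```python
# Homicide rate per 100,000 residents, and percent change from a base
# period. The 2025 population is extrapolated from 2023->2024 growth,
# as described above. All numbers are illustrative placeholders.
def homicide_rate(murders_12mo, population):
    return murders_12mo / population * 100_000

def pct_change(rate_now, rate_base):
    return (rate_now / rate_base - 1) * 100

pop_2023, pop_2024 = 1_000_000, 1_010_000
pop_2025 = pop_2024 * (pop_2024 / pop_2023)  # extrapolated estimate

change = pct_change(
    homicide_rate(180, pop_2025),    # trailing 12 months ending March 2025
    homicide_rate(200, 1_005_000),   # trailing 12 months ending December 2019
)
```

Using trailing 12-month totals on both sides is what makes the comparison insensitive to which calendar month serves as the base.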

As you can see, depending on the base timeframe used, about half of the cities saw declines, a few were roughly flat, and some definitely saw increases. New York, Houston, and Fort Worth are definitely still elevated. Los Angeles, Philly, Phoenix, and San Diego are definitely down. The others are either close to even or mixed depending on your baseline.

Keep in mind these data are only through March 2025. As both Billy Binion at Reason and Jeff Asher have recently emphasized, if we use the most recent data for many cities, it’s entirely possible that 2025 will end up having some of the lowest homicide rates ever recorded for many US cities. The declines in early 2025 have definitely been big, but mostly they are just a continuation of the post-2021 decline.

Again, for clarification, all of these cities are down from their 2020-21 peaks: using September 2021 as the base (when the national murder rate roughly peaked), these 10 cities are down between 31% and 58%. Big improvements!

Corporate Debt by Industry Sector

A reporter recently told me she thought there is a national trend toward hospitals issuing more bonds. I tried to verify this and found it surprisingly hard to do with publicly available data. But since I had to spend an hour digging through private Compustat data to find the answer, I figured I should share some results. Here’s the average debt (in millions of dollars) of companies by sector:

Source: My graph made from Compustat North American Fundamentals Annual data collapsed by Standard Industrial Classification code into the Fama-French 10 sectors

This shows that health care is actually the least-indebted sector, and telecommunications the most indebted, followed by utilities and “other” (a broad category that actually covers most firms in the Fama-French 10). But are health care firms really more conservative about debt, or are they just smaller? Let’s scale the debt by showing it as a share of revenue:

My graph made from Compustat North American Fundamentals Annual data collapsed by SIC code into the Fama-French 10 sectors (dltt/revt).

It appears that health care firms are the most indebted relative to revenue since 2023. But which parts of health care are driving this?

Hospitals in 2023, followed by specialty outpatient care in 2024. However, seeing how much the numbers bounce around from year to year, I suspect they are driven by small numbers of outlier firms. This could be because Compustat North America covers only publicly traded firms, while many sectors of health care are dominated by private corporations or non-profits.
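For anyone who wants to replicate the breakdowns, the aggregation can be sketched as below. The column names dltt (long-term debt), revt (revenue), sic, and fyear follow Compustat’s mnemonics, but the SIC-to-sector mapping is only a stub covering a few ranges; the actual Fama-French 10 definition spans many more SIC codes.

```python
import pandas as pd

def ff10_sector(sic):
    """Stub mapping of SIC codes to a few Fama-French-10-style sectors."""
    if 2830 <= sic <= 2839 or 3840 <= sic <= 3859 or 8000 <= sic <= 8099:
        return "Healthcare"
    if 4800 <= sic <= 4899:
        return "Telecom"
    if 4900 <= sic <= 4949:
        return "Utilities"
    return "Other"

def debt_by_sector(df):
    """Average long-term debt and aggregate debt-to-revenue by fiscal year
    and sector (one of several reasonable ways to scale debt by revenue)."""
    df = df.assign(sector=df["sic"].map(ff10_sector))
    out = df.groupby(["fyear", "sector"]).agg(
        avg_debt=("dltt", "mean"),
        total_debt=("dltt", "sum"),
        total_rev=("revt", "sum"),
    )
    out["debt_to_rev"] = out["total_debt"] / out["total_rev"]
    return out[["avg_debt", "debt_to_rev"]]
```

Note that summing debt and revenue before dividing down-weights small firms; averaging firm-level ratios instead would make the outlier problem noted above even worse.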

I welcome suggestions for datasets on the bond-market side of things that are able to do industry splits including private companies, or suggestions for other breakdowns you’d like to see me do with Compustat.

Manufacturing Jobs of the Past

This post is co-written with John Olis, History major at Ave Maria University.

There is a popular myth that manufacturing jobs of the past provided a leg-up to young people. The myth goes like this. Manufacturing jobs had low barriers to entry so anyone could join. Once there, the job paid well and provided opportunities for fostering skills and a path toward long-term economic success. There is more to the myth, but let’s stop there for the moment. Is the myth true?

One of my students, John Olis, did a case study on Connecticut in 1920-1930 using cross-sectional IPUMS data on white working-age individuals to evaluate the ‘Manufacturing Myth’. We are not talking causal inference here, but the weight of the evidence is non-zero. The story above makes some predictions, if not outright theoretical assertions.

  1. Manufacturing jobs paid better than non-manufacturing jobs for people with less human capital.
  2. Manufacturing jobs yielded faster income growth than non-manufacturing jobs.
  3. Implicitly, manufacturing jobs provided faster income growth for people with less human capital.

Using only one state and two decades of data obviously makes the analysis highly specific. Expanding the breadth or the timescale could confirm or falsify the results. But historical Connecticut is a particularly useful population because 1) it had a large manufacturing sector, 2) it existed prior to the post-WWII boom in manufacturing that resulted from the destruction of European capacity, and 3) it had large identifiable populations with different levels of human capital.

Who had less human capital on average? There are two groups who are easy to identify: 1) immigrants and 2) illiterate people. Immigrants at the time often couldn’t speak English with native proficiency or lacked the social norms that eased commercial transactions in their new country (on average, not always). Illiterate people couldn’t read or write. Since both groups therefore had a comparative advantage in manual labor, we’d expect them to be well served by manufacturing employment versus the alternative.

Being cross-sectional, the data doesn’t link individuals over time, so we can’t say what happened to particular people. But we can say how people differed by their time and characteristics. Interaction variables help drill down to the relevant comparisons. There are two specifications for explaining income*: one that interacts manufacturing employment with immigrant status, and one that interacts it with illiteracy status. The baseline case is a 1920 non-operative native or literate person. Let’s start with the snapshot of 1920 below. The term used in the data is ‘operative’ rather than ‘manufacturer’, referring to people who operate machines of one sort or another. So it’s often the same as manufacturing, but can also be manufacturing-adjacent. The charts below illustrate the effect of lower human capital in pink and the additional subpopulation impacts of manufacturing in blue.

In the left-hand specification, native operatives made 2.2% less than the baseline population. That is, being an operative was slightly harmful to individual earnings. Being an immigrant lowered earnings a substantial 16.8%, but being an operative recovered most of the gap so that immigrant operatives made only 6.1pp less than the baseline population and only 3.9pp less than native operatives. In the right-hand specification, unsurprisingly, being illiterate was terrible for one’s earnings to the tune of 23.4pp. And while being an operative resulted in a 1.2% earnings boost among natives, being an operative entirely eliminated the harm that illiteracy imposed on earnings.

Both graphs show that manufacturing had tiny effects for a typical native or literate individual. But manufacturing mattered hugely for people who had less human capital. So, prediction 1) above is borne out by the data: Manufacturing is great for people with less-than-average human capital.
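The interaction specifications above can be sketched with a simple OLS on simulated data. The variable names and effect magnitudes below are my own illustrative choices, not the study’s actual columns or estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
operative = rng.integers(0, 2, n)   # works in a manufacturing-type job
immigrant = rng.integers(0, 2, n)   # lower-average-human-capital group

# Simulated "truth": immigrants earn less, but the operative-immigrant
# interaction recovers most of the gap (made-up magnitudes).
log_income = (3.00 - 0.02 * operative - 0.17 * immigrant
              + 0.11 * operative * immigrant
              + rng.normal(0, 0.3, n))

# OLS with an interaction term: y = b0 + b1*op + b2*imm + b3*op*imm
X = np.column_stack([np.ones(n), operative, immigrant, operative * immigrant])
beta, *_ = np.linalg.lstsq(X, log_income, rcond=None)
```

The interaction coefficient (beta[3]) is what carries the prediction that manufacturing mattered more for people with less human capital.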

But what about earnings *growth*? See below.

Continue reading

The Most Regulated States

The Mercatus Center has put together a page of “Snapshots of State Regulation” using data from their State RegData project. Their latest data suggests that population is still a big predictor of state-level regulation, on top of the red/blue dynamics people expect:

They also made pages with much more detail on each state, like what the most regulated industries in each state are and how each one compares to the national average:

You can find your state here.

Spending on Necessities Has Declined Dramatically in the United States

Has it gotten easier or harder for Americans to afford the basic necessities of life? Part of the answer depends on how you define “basic necessities,” but using the common triad of food, clothing, and housing seems like a reasonable definition, since these constituted over 80% of household spending in the United States in 1901.

If we use that definition of necessities, here is what the progress has looked like in the US since 1901:

The data comes from various surveys that the Bureau of Labor Statistics has collected over the years, collectively known as the Consumer Expenditure Surveys. The surveys were conducted about once every 1-2 decades from 1901 until the 1980s, and then annually starting in 1984. Some of these are multi-year averages, but to simplify the chart I’ll just state one year (e.g., “1919” is for 1918 and 1919). The categories are fairly comprehensive: “food” includes both groceries and spending at restaurants; “housing” includes either mortgage or rent, plus things like utilities and maintenance; and “clothing” includes not only the cost of the clothes themselves, but services associated with them such as repairs or alterations (much more important in the past).

We can see in the chart that, taken as a group, the share of spending on these three categories has declined dramatically over time. Housing is the exception: it has been fairly stable, mostly staying between 22% and 29% of income (the Great Depression aside). There are two periods when these costs rose: the Great Depression and the late 1970s/early 1980s. Both are widely recognized as bad economic times, but they are aberrations. The jump in necessity spending from 1973 to 1985 was fully offset by 2003, and today spending on necessities is well below its 1973 level, even though housing alone is a few percentage points higher.

A chart like this shows great progress over time, but it will inevitably raise many questions. Let me try to answer a few of them in advance.

Continue reading

What is truth? The Bayesian Dawid-Skene Method

I just learned about the Bayesian Dawid-Skene method. This is a summary.

Some things are confidently measurable. Other things are harder to perceive or interpret. An expert researcher might think that they know an answer. But there are two big challenges: 1) the researcher is human and can err, and 2) the researcher is finite, with limited time and resources. Even artificial intelligence has imperfect perception and reasoning. What do we do?

A perfectly sensible answer is to ask someone else what they think. They might make a mistake too. But if their answer is formed independently, then we can hopefully get closer to the truth with enough iterations. Of course, nothing is perfectly independent. We all share the same globe, and often the same culture or language. So we might end up with a biased answer. We can try to correct for bias once we have an answer, so accepting some bias at the outset is a reasonable place to start.

The Bayesian Dawid-Skene (henceforth DS) method helps to aggregate opinions and find the truth of a matter given very weak assumptions ex ante. Here I’ll provide an example of how the method works.

Let’s start with a very simple question, one that requires very little thought and logic. It may require some context and social awareness, but that’s hard to avoid. Say that we have a list of n=100 images. Each image has one of two words written on it: “pass” or “fail”. If typed, there would be little room for ambiguity; typed text is relatively clear even when the image is substantially corrupted. But these words are handwritten, maybe with a variety of pens, by a variety of hands, and stored under a variety of conditions. Therefore, we might be a little less trusting of what a computer would spit out using optical character recognition (OCR). Given our own potential for errors and limited time, we might lean on some other people to help interpret the scripts.
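To make the mechanics concrete, here is a minimal sketch of Dawid-Skene aggregation on simulated “pass”/“fail” annotations. Note this is the classic EM (maximum-likelihood) variant rather than a fully Bayesian version with priors and posterior sampling, and the annotator accuracies below are invented:

```python
import numpy as np

def dawid_skene(labels, n_classes=2, n_iter=50):
    """labels: (n_items, n_annotators) integer array of observed labels.
    Returns a (n_items, n_classes) array of estimated class probabilities."""
    n_items, n_annot = labels.shape
    # Initialize each item's class probabilities with its vote shares.
    post = np.stack([(labels == k).mean(axis=1) for k in range(n_classes)], axis=1)
    for _ in range(n_iter):
        # M-step: class prior and each annotator's confusion matrix,
        # conf[a, t, k] = P(annotator a reports k | true class is t).
        prior = post.mean(axis=0)
        conf = np.zeros((n_annot, n_classes, n_classes))
        for a in range(n_annot):
            for k in range(n_classes):
                conf[a, :, k] = post.T @ (labels[:, a] == k).astype(float)
        conf /= conf.sum(axis=2, keepdims=True) + 1e-12
        # E-step: recompute item probabilities given prior and confusions.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for a in range(n_annot):
            log_post += np.log(conf[a][:, labels[:, a]].T + 1e-12)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post

# Simulate three annotators with accuracies 0.9, 0.85, and 0.6 on 200 items.
rng = np.random.default_rng(1)
truth = rng.integers(0, 2, 200)
labels = np.column_stack([
    np.where(rng.random(200) < acc, truth, 1 - truth)
    for acc in (0.9, 0.85, 0.6)
])
pred = dawid_skene(labels).argmax(axis=1)
```

The payoff shows up in the estimated confusion matrices: the unreliable third annotator gets down-weighted automatically, so the aggregate can beat a simple majority vote, especially when annotator quality varies widely.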

Continue reading