Is A Music Major Worth It?

Our new paper concludes that the answer is a resounding “It Depends”.

It depends on your answer to the following questions:

  1. If you didn’t major in music, would you major in something else, or not finish college?
  2. How dead set are you on a career in music?
Source: Figure 1 of Bailey and Smith (2025)

We found that

  1. Music majors earn more than people who didn’t graduate from college, even if they don’t end up working as musicians
  2. Among musicians, music majors earn more than other majors
  3. But among non-musicians, other majors earn much more than music majors

So, on average, a music major means higher income if you would be a musician anyway, or if you otherwise wouldn’t have finished college. But it means lower income than majoring in something else and working outside of music. The exact amounts depend on what you control for. This gets complex, but this table gives the basic averages before controls:

Source: Table 2 of Bailey and Smith (2025), showing wage plus business income for respondents to the 2018-2022 American Community Survey

For better or worse, a music major also means you are much more likely to be a musician: 113 times more likely, in fact (this is just a correlation; we’re not randomizing people into the major). Despite that striking correlation, only 9.8% of music majors report being professional musicians, and only 22.3% of working musicians were music majors.
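The post doesn’t say whether “113 times more likely” is a ratio of probabilities or of odds. Assuming it is a relative risk (a ratio of probabilities), and assuming the 9.8% and 22.3% figures come from the same sample, a few lines of back-of-the-envelope arithmetic show what the three numbers jointly imply. This is an illustrative sketch, not a calculation from the paper:

```python
# Figures quoted in the post.
P_MUSICIAN_GIVEN_MUSIC_MAJOR = 0.098   # share of music majors who are professional musicians
P_MUSIC_MAJOR_GIVEN_MUSICIAN = 0.223   # share of working musicians who were music majors
RELATIVE_RISK = 113                    # ASSUMPTION: treated as P(S|M) / P(S|not M)


def implied_baseline(p_given_major: float, relative_risk: float) -> float:
    """Share of non-music-majors who are musicians, if RR = P(S|M) / P(S|not M)."""
    return p_given_major / relative_risk


def music_majors_per_musician(p_s_given_m: float, p_m_given_s: float) -> float:
    """Bayes' rule: P(M|S) * P(S) = P(S|M) * P(M), so P(M) / P(S) = P(M|S) / P(S|M)."""
    return p_m_given_s / p_s_given_m


baseline = implied_baseline(P_MUSICIAN_GIVEN_MUSIC_MAJOR, RELATIVE_RISK)
ratio = music_majors_per_musician(P_MUSICIAN_GIVEN_MUSIC_MAJOR, P_MUSIC_MAJOR_GIVEN_MUSICIAN)
print(f"Implied share of everyone else who are musicians: {baseline:.3%}")
print(f"Music majors per working musician: {ratio:.2f}")
```

Under those assumptions, fewer than 0.1% of everyone else work as professional musicians. That is how a 9.8% rate can still be over a hundred times higher. The sketch also implies there are roughly 2.3 music majors for every working musician, so most music majors necessarily end up in other jobs.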

Sean Smith had the idea for this paper and wrote the first draft in my Economics Senior Capstone class in 2024. After he graduated I joined the paper as a coauthor to get it ready for journals, and it was accepted at SN Social Sciences last week. We share the data and code for the paper here.


Freedom for Freestanding Birth Centers

Iowa recently joined the growing list of states where midwives or obstetricians can open a freestanding birth center without needing to convince a state board that it is economically necessary. The Des Moines Register provides an excellent summary:

A Des Moines midwife who sued the state for permission to open a new birthing center may have lost a battle in court, but ultimately, she has won the war.

Caitlin Hainley of the Des Moines Midwife Collective sought to open a standalone birthing center in Des Moines, essentially a single-family home repurposed with birthing tubs and other equipment needed to give birth in a comfortable, home-like environment.

To do so, the collective alleged in its 2023 lawsuit, would have required going through a lengthy, expensive regulatory process that would give already established maternity facilities, such as local hospitals, the chance to argue against granting what is known as a certificate of need for the new facility, essentially vetoing competition.

A federal district judge ruled in November that Iowa’s certificate-of-need law is constitutional, finding that legislators had a rational interest in protecting existing hospitals and health care providers.

But while losing the first round in court, the collective’s cause was winning support in a more important venue: the Iowa Capitol. Iowa legislators in their 2025 session passed a bill, which Gov. Kim Reynolds signed on May 1, removing birth centers from the definition of health facilities covered by the certificate-of-need law. The law will formally take effect July 1.

I’m honored to have played a small part in this as the expert witness in the lawsuit.

If you’d like to help make sure birth options are available in your state, a great place to start would be to attend the Zoom seminar Roadmap For Reform: Advancing Birth Freedom on July 23rd. It is hosted by the Pacific Legal Foundation, which represented the midwives pro bono in the Iowa case.

There is strong momentum here, with Connecticut, Kentucky, Michigan, Vermont, and West Virginia also recently repealing Certificate of Need requirements for birth centers, but a variety of other barriers remain. States often require freestanding birth centers to obtain a transfer agreement with a nearby hospital before opening, to ensure that the hospital will take their emergency cases, even though hospitals are legally required to take all emergency cases. The problem is that hospitals provide both complementary services (emergency care) and substitute services (labor and delivery), and they often choose not to sign transfer agreements in order to prevent competition from a partial substitute. This whole area would benefit from both more academic study and more investigation by antitrust enforcers.

But for today, congratulations to Caitlin Hainley and to Iowa on their victory.

Writing Humanity’s Last Exam

When every frontier AI model can pass your tests, how do you figure out which model is best? You write a harder test.

That was the idea behind Humanity’s Last Exam, an effort by Scale AI and the Center for AI Safety to develop a large database of PhD-level questions that the best AI models still get wrong.

The effort has proven popular: the paper summarizing it has already been cited 91 times since its release on March 31st, and the main AI labs have been testing their new models on the exam. xAI announced today that its new Grok 4 model has the highest score yet on the exam, 44.4%.

Current leaderboard on the Humanity’s Last Exam site, not yet showing Grok 4

The process of creating the dataset is a fascinating example of a distributed academic mega-project. This is a growing trend that has also been important in efforts to replicate previous research. The organizers of Humanity’s Last Exam let anyone submit a question for their dataset. They offered co-authorship to anyone whose question they accepted, and cash prizes to those who had the best questions accepted. In the end they wound up with just over 1,000 coauthors on the paper (including yours truly as one very minor contributor). They also gave out $500,000 to contributors of the very best questions (not me). That seemed incredibly generous until Scale AI sold a 49% stake in the company to Meta for $14.8 billion in June.

Source: Figure 4 of the paper

Here’s what I learned in the process of trying to stump the AIs and get questions accepted into this dataset:

  1. The AIs were harder to stump than I expected, because the organizers tested questions against frontier models rather than the free-tier models I was used to using on my own. If you think AI can’t answer your question, try a newer model
  2. It was common for me to try a question that several models would get wrong, but at least one would still get right. For me this was annoying because questions could only be accepted if every model got them wrong. But of course if you want to get a correct answer, this means trying more models is good, even if they are all in the same tier. If you can’t tell what a correct answer looks like and your question is important, make sure to try several models and see if they give different answers
  3. Top models are now quite good at interpreting regression results, even when you try to give them unusually tricky tables
  4. AI still has weird weaknesses and blind spots; it can outperform PhDs in the relevant field on one question, then do worse than 3rd graders on the next. This exam specifically wanted PhD-level questions, where a typical undergrad not only couldn’t answer the question, but probably couldn’t even understand what was being asked. But it explicitly excluded “simple trick questions”, “straightforward calculation/computation questions”, and questions “easily answerable by everyday people”, even if all the AIs got them wrong. My son had the idea to ask them to calculate hyperfactorials; we found some relatively low numbers that stumped all the AI models, but the human judges ruled that our question was too simple to count. On a question I did get accepted, I included an explanation for the human judges of why I thought it wasn’t too simple.
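For readers unfamiliar with them, the hyperfactorial of n is H(n) = 1^1 × 2^2 × ... × n^n. The post doesn’t say which numbers stumped the models, so none are assumed here. The sketch below does show why the judges could reasonably call this a “straightforward calculation”. A few lines of Python compute it exactly, because Python integers have arbitrary precision:

```python
def hyperfactorial(n: int) -> int:
    """Return H(n) = 1**1 * 2**2 * ... * n**n as an exact integer."""
    if n < 0:
        raise ValueError("hyperfactorial is defined here only for n >= 0")
    result = 1  # empty product, so H(0) = 1
    for k in range(1, n + 1):
        result *= k ** k
    return result


for n in range(1, 6):
    print(n, hyperfactorial(n))
# 1 1
# 2 4
# 3 108
# 4 27648
# 5 86400000
```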

I found this to be a great opportunity to observe the strengths and weaknesses of frontier models, and to get my name on an important paper. The AI field is being driven primarily by people with the chops to code frontier models. Even so, economists still have a lot to contribute, as Joy has shown. Any economist looking for the next way to contribute should check out Anthropic’s new Economic Futures Program.

The Ugly Gray Rhino Gathers Speed

A black swan is a crisis that comes out of nowhere. A gray rhino, by contrast, is a problem we have known about for a long time, but can’t or won’t stop, that will at some point crash into a full-blown crisis.

The US national debt is a classic gray rhino. The problem has slowly been getting worse for 25 years, but the crisis still seems far enough off that almost no one wants to incur real costs today to solve the problem. During the 2007-2009 financial crisis and the 2020-2021 Covid pandemic we had good reasons to run deficits. But we’ve ignored the Keynesian solution of paying back the deficits incurred in bad times with surpluses in good times.

We are currently in reasonably good economic times, but about to pass a mega-spending bill that blows the deficit up from its already too-high levels. At a time when we should be running a surplus, we are instead running a deficit of around 6% of GDP:

Source: Congressional Budget Office

Our ‘primary deficit’ (the deficit excluding interest payments) is lower, a more manageable 3% of GDP. But the total deficit could spiral higher rapidly if interest rates go higher. That could happen for structural reasons, or because of a loss of confidence in the US government’s willingness to pay its debts. The CBO optimistically assumed that the interest rate on 10-year Treasuries will fall below 4% in the 2030s, from 4.3% today:

Source: Congressional Budget Office

But the CBO’s scoring of H.R. 1 (“One Big Beautiful Bill Act”) shows it adding $3 trillion to the debt over the next 10 years, increasing the deficit by roughly 1% of GDP per year.
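The spiral mechanism follows from the standard debt-accumulation identity. Next year’s debt-to-GDP ratio equals this year’s ratio scaled by (1 + r) / (1 + g), plus the primary deficit as a share of GDP. Here r is the average interest rate on the debt and g is nominal GDP growth. The sketch below simply iterates that identity. The 3% primary deficit and the roughly 1% of GDP added by H.R. 1 come from the post. The starting debt ratio, the growth rate, and the two interest-rate scenarios are illustrative assumptions, not CBO projections:

```python
def project_debt_ratio(debt_ratio, primary_deficit, interest_rate,
                       nominal_growth, years, extra_deficit=0.0):
    """Iterate d[t+1] = d[t] * (1 + r) / (1 + g) + primary deficit, all as shares of GDP.

    Returns the path of debt-to-GDP ratios, starting with the initial value.
    """
    path = [debt_ratio]
    for _ in range(years):
        debt_ratio = (debt_ratio * (1 + interest_rate) / (1 + nominal_growth)
                      + primary_deficit + extra_deficit)
        path.append(debt_ratio)
    return path


# ASSUMPTIONS (illustrative only, not CBO figures): debt starts near 100% of GDP,
# nominal GDP grows 4% a year, and the average rate on the debt is 3.5% or 5.5%.
START, GROWTH, YEARS = 1.00, 0.04, 10
PRIMARY = 0.03    # primary deficit of about 3% of GDP, from the post
HR1_EXTRA = 0.01  # about 1% of GDP per year added by H.R. 1, from the post

scenarios = {
    "r = 3.5%": (0.035, 0.0),
    "r = 5.5%": (0.055, 0.0),
    "r = 5.5%, plus H.R. 1": (0.055, HR1_EXTRA),
}
for name, (rate, extra) in scenarios.items():
    path = project_debt_ratio(START, PRIMARY, rate, GROWTH, YEARS, extra)
    interest = path[-1] * rate  # rough annual interest bill, as a share of GDP
    print(f"{name:22} debt {path[-1]:.0%} of GDP after {YEARS} years, "
          f"interest about {interest:.1%} of GDP")
```

The (1 + r) / (1 + g) factor is the crux. While the interest rate stays below growth, a steady primary deficit pushes the ratio toward a finite (if high) level. Once r exceeds g, the factor compounds on an ever-larger base, and the ratio keeps growing even if the primary deficit never changes.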

I already suspected this gray rhino would eventually cause a crisis, but this bill and the milieu that produced it turn that into a near guarantee: nothing stops the deficit train until we hit a full-blown crisis. That crisis is no longer just a long-term issue for your kids and grandkids to worry about; you will see it in 7 years or so. Unfortunately, that is still far enough away that current politicians have no incentive to take costly steps to avoid it. In fact, deficits will probably make the economy stronger for a year or two before they start making things worse. That is convenient for all the Congresspeople up for election in less than 2 years.

Here are the ways I see this playing out, from most to least likely:

  1. By around 2032, either the slowly aging population or a sudden spike in interest rates forces the government to touch at least one of the third rails of American politics: cut Social Security, cut Medicare, or substantially raise taxes on the middle class (explicitly or through inflation).
  2. We get bailed out again by God’s Special Providence for fools, drunks, and the United States of America. AI brings productivity miracles bigger than those of computers and the internet, letting GDP grow faster than our debts.
  3. We default on the national debt (but this is a risky option because we will still want to run big deficits, and lenders will only lend if they expect to get paid back).
  4. We do all the smart policy reforms that economists recommend in time to head off the crisis and stop the rhino. Medical spending falls without important services being cut thanks to supply-side reforms or cheap miracle drugs (GLP-1s going off patent?).

I’m hoping of course for numbers 2 and 4, but after this bill I’m expecting the rhino.

Excluding “Non-Excludable” Goods

Intro microeconomics classes teach that some goods are “non-excludable”, meaning that people who don’t pay for them can’t be stopped from using them. This can lead to a “tragedy of the commons”, where the good gets overused because people don’t personally bear the cost of using it and don’t care about the costs they impose on others. Overgrazing land and overfishing the seas are classic examples.

Source: Microeconomics, by Michael Parkin

Students sometimes get the impression that “excludability” is an inherent property of a good. But in fact, which goods are excludable is a function of laws, customs, and technologies, and these can change over time. Land might be legally non-excludable (and so over-grazed) when it is held in common, but become excludable when the land is privatized or when barbed wire makes enclosing it cheap. Over time, such changes have turned over-grazing into a relatively minor issue.
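The overgrazing logic can be made concrete with the textbook common-pasture model. This sketch is a standard illustration, not taken from the Parkin textbook. Each animal is worth a − b·N when N animals graze in total, where a and b are positive constants and grazing is otherwise free. Each of n herders picks a herd size to maximize their own profit. Working through the first-order conditions gives a combined equilibrium herd of a·n / (b·(n + 1)). The herd that maximizes the pasture’s total value is a / (2b). With a single owner (n = 1), that is, once the land is excludable, the two coincide. The parameter values below are arbitrary:

```python
def equilibrium_herd(a: float, b: float, n_herders: int) -> float:
    """Total animals when each of n herders maximizes own profit n_i * (a - b*N).

    First-order condition for herder i: a - b*N - b*n_i = 0. Imposing symmetry
    (n_i = N / n) gives N = a*n / (b*(n + 1)).
    """
    return a * n_herders / (b * (n_herders + 1))


def optimal_herd(a: float, b: float) -> float:
    """Herd size that maximizes total value N * (a - b*N): N* = a / (2b)."""
    return a / (2 * b)


def total_value(a: float, b: float, herd: float) -> float:
    """Combined value of all animals when `herd` animals share the pasture."""
    return herd * (a - b * herd)


a, b = 100.0, 1.0  # arbitrary illustrative parameters
best = optimal_herd(a, b)
print(f"optimum: {best:.1f} animals, total value {total_value(a, b, best):.1f}")
for n in (1, 2, 10, 100):
    herd = equilibrium_herd(a, b, n)
    print(f"{n:>3} herders: {herd:.1f} animals, total value {total_value(a, b, herd):.1f}")
```

As the number of herders grows, the herd approaches a / b. At that point the marginal animal is worth nothing, and the pasture’s total value is driven toward zero.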

Overfishing remains a major problem, but this could be starting to change. Legal and technological changes have allowed for enclosed, private aquaculture on some coasts, which provides a large and growing share of all fish eaten by humans. Permitting systems put limits on catches in many countries’ waters, though the high seas remain a true tragedy of the commons for now.

While countries have tried to enforce limits on catches in their national waters, monitoring how many fish every boat is taking has been challenging, so illegal overfishing has remained widespread. But technology is in the process of changing this. For instance, ThayerMahan is developing hydrophone arrays that use sound to track boats:

Technologies like hydrophones and satellites, if used well, will increasingly make public waters more “excludable” and reduce “tragedy of the commons” overfishing.
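The post doesn’t describe how ThayerMahan’s system works internally. As a generic illustration, here is a sketch of how an array of hydrophones can locate a noisy boat using time-difference-of-arrival (TDOA) localization. The boat’s emission time is unknown, so the method uses only the differences between arrival times at the hydrophones. A brute-force grid search then finds the position whose predicted differences best match. Several things here are simplifying assumptions: the hydrophone layout, the 1,500 m/s sound speed (a typical round figure for seawater), and the flat 2-D geometry. Real systems must also handle noise, refraction, and depth:

```python
import math
from itertools import product

SOUND_SPEED = 1500.0  # m/s; ASSUMPTION: rough typical value for seawater


def arrival_times(source, sensors, c=SOUND_SPEED):
    """Seconds for a sound emitted at `source` to reach each sensor (straight-line paths)."""
    return [math.dist(source, s) / c for s in sensors]


def locate_tdoa(sensors, heard, extent=2000.0, step=10.0, c=SOUND_SPEED):
    """Grid-search the 2-D point whose predicted arrival-time differences
    (relative to sensor 0) best match the observed ones in a least-squares sense.
    Only differences are used, so the unknown emission time cancels out."""
    observed = [t - heard[0] for t in heard]
    n = int(extent / step) + 1
    best, best_err = None, float("inf")
    for i, j in product(range(n), repeat=2):
        candidate = (i * step, j * step)
        predicted = arrival_times(candidate, sensors, c)
        err = sum((obs - (p - predicted[0])) ** 2
                  for obs, p in zip(observed, predicted))
        if err < best_err:
            best, best_err = candidate, err
    return best


# Hypothetical layout: four hydrophones at the corners of a 2 km square.
hydrophones = [(0.0, 0.0), (2000.0, 0.0), (0.0, 2000.0), (2000.0, 2000.0)]
boat = (730.0, 1240.0)
# The receivers never learn when the sound left the boat, so add an arbitrary offset.
heard = [t + 42.0 for t in arrival_times(boat, hydrophones)]
estimate = locate_tdoa(hydrophones, heard)
print(estimate)
```

A grid search is the simplest correct approach to show here. Practical systems typically replace it with an iterative least-squares solver and track the boat over time.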

LIFE Survey Comes Alive

Last year I posted that the Philly Fed had started a new quarterly survey on Labor, Income, Finances, and Expectations (LIFE). I thought it looked promising but had yet to achieve its potential:

It will be interesting to see if this ends up taking a place in the set of Fed surveys that are always driving economic discussions, like the Survey of Consumer Finances and the Survey of Professional Forecasters. If they keep it up and start putting out some graphics to summarize it, I think it will. My quick impression (not yet having spoken to Fed people about it) is that it will be the “quick hit” version of the Survey of Consumer Finances. It asks a smaller set of questions on somewhat similar topics, but is released quickly after each quarter instead of slowly after each year. If they stick with the survey it will get more useful over time, as there is more of a baseline to compare to.

But a year later the survey now has what I hoped for: a solid baseline for comparisons, and pre-made graphics to summarize the results. It continues to show complex and mixed economic performance in the US. People think the economy is getting worse:

They are cutting discretionary (but not necessity) spending at record levels:

They are worried about losing their jobs at record levels:

But key areas like housing, childcare, and transportation are stabilizing:

Overall I think we can synthesize these seemingly contradictory pictures by saying that Americans’ finances are fine now, but they are quite worried that things are about to get worse, perhaps due to the tariffs taking effect. You can find the rest of the LIFE survey results (including all the non-record-setting ones) here.

The End of Easy Student Loans

The Senate Health, Education, Labor and Pensions Committee is proposing to cut off student loans for programs whose graduates earn less than the median high school graduate. The House proposed a risk-sharing model where colleges would partly pay back the federal government when their students fail to pay back loans themselves. Both the House and Senate propose to cap how much students can borrow for graduate loans. Both would reduce federal spending on higher ed by about $30-$35 billion per year, cutting the size of the $700 billion higher ed sector by 4-5%. I expected that something like this would happen eventually, especially after the student loan forgiveness proposals of 2022:

While we aren’t getting real reform now, I do think forgiveness makes it more likely that we’ll see reform in the next few years. What could that look like?

The Department of Education should raise its standards and stop offering loans to programs with high default rates or bad student outcomes. This should include not just fly-by-night colleges, but sketchy masters degree programs at prestigious schools.

Colleges should also share responsibility when they consistently saddle students with debt but don’t actually improve students’ prospects enough to be able to pay it back. Economists have put a lot of thought into how to do this in a manner that doesn’t penalize colleges simply for trying to teach less-prepared students.

I’d bet that some reform along these lines happens in the 2020s, just like the bank bailouts of 2008 led to the Dodd-Frank reform of 2010 to try to prevent future bailouts. The big question is, will this be a pragmatic bipartisan reform to curb the worst offenders, or a Republican effort to substantially reduce the amount of money flowing to a higher ed sector they increasingly dislike?

Of course, there is a lot riding on the details. How exactly do you calculate the income of graduates of a program compared to high school grads? The Senate proposal explains their approach starting on page 58. They want to compare the median income of working students 4 years after leaving their program (whether they graduated or dropped out, but exempting those in grad school) to the median income of those with only a high school diploma who are age 25-34, working, and not in school.

Nationally I calculate that this would make for a floor of $31,000. That is, the median working student who is 4 years out from your program should be earning at least $31k. In practice the bill would implement a different number for each state. This seems like a low bar in general, though you could certainly quibble with it. For instance, those 4 years out from a program may be closer to age 25 than age 34. Income typically rises with age over that range, so a benchmark based on ages 25-34 is tougher than one matched to their age. If you compare them to 26-year-old high school grads, the national bar would be just $28k.
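The comparison described above can be sketched in a few lines. This is a toy version under stated assumptions, not the bill’s legal text. The record fields and every income in the example are hypothetical, “working” is a simple flag, and the bill’s state-specific benchmarks are not modeled. The age band is a parameter, so the 26-year-old comparison runs the same way:

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class HighSchoolGrad:
    age: int
    income: float
    working: bool
    in_school: bool


@dataclass
class FormerStudent:
    years_since_leaving: int   # graduates and dropouts alike
    income: float
    working: bool
    in_grad_school: bool


def hs_benchmark(grads, min_age=25, max_age=34):
    """Median income of high-school-only adults in the age band who work and are not in school."""
    return median(g.income for g in grads
                  if min_age <= g.age <= max_age and g.working and not g.in_school)


def program_median(students, years_out=4):
    """Median income of working former students `years_out` years after leaving, excluding grad students."""
    return median(s.income for s in students
                  if s.years_since_leaving == years_out and s.working and not s.in_grad_school)


def passes_earnings_test(students, grads, min_age=25, max_age=34):
    """A program fails if its median former student earns less than the benchmark."""
    return program_median(students) >= hs_benchmark(grads, min_age, max_age)


# Hypothetical records, for illustration only.
grads = [
    HighSchoolGrad(26, 27_000, True, False),
    HighSchoolGrad(29, 31_000, True, False),
    HighSchoolGrad(33, 45_000, True, False),
    HighSchoolGrad(40, 60_000, True, False),   # outside the 25-34 band
    HighSchoolGrad(28, 0, False, False),       # not working
    HighSchoolGrad(27, 15_000, True, True),    # still in school
]
program = [
    FormerStudent(4, 29_000, True, False),
    FormerStudent(4, 30_000, True, False),
    FormerStudent(4, 34_000, True, False),
    FormerStudent(4, 95_000, True, True),      # in grad school, so exempt
    FormerStudent(2, 12_000, True, False),     # only 2 years out
]
print(hs_benchmark(grads), program_median(program))
print(passes_earnings_test(program, grads))            # vs. ages 25-34
print(passes_earnings_test(program, grads, 26, 26))    # vs. 26-year-olds only
```

With these made-up numbers, the same program fails against the 25-34 benchmark but passes against 26-year-olds. The choice of comparison group can decide the outcome for programs near the line.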

What sorts of programs have graduates making less than $31k per year?


The Average Teaching Load of US Professors

“One of the closest guarded secrets in American higher education is the average teaching loads of faculty.” -Richard Vedder

I saw this quote in a recent piece arguing that US professors should teach more. I thought it sounded extreme, but as I look into it, it is surprisingly difficult to find data on this compared to other things like salaries:

Since 1996, for instance, the University of Delaware has administered the annual National Study of Instructional Costs and Productivity, surveying faculty and teaching assistants about course loads and enrollment. The data, though, are “only available to four-year, non-profit institutions of higher education.”[7] This secrecy, needless to say, is not the norm for surveys collected by publicly supported institutions. Tellingly, this study is being discontinued because the number of participating institutions “has slowly declined to unsustainable levels.”[8] *

There are some decent older studies that are public, like this 2005 survey of top liberal arts colleges showing that almost all have teaching loads between 4 and 6 courses per year. But in terms of recent data that is publicly available, the best I’ve found is the Faculty Survey of Student Engagement. It still isn’t great, since their 2024 survey only covers 54 of the 2000+ bachelor’s degree granting colleges in the US, and their tables show that these 54 aren’t especially representative. They make nice graphics though:

The graphics show exact percentages if you hover over them on the original Tableau site. Doing this shows that the median professor teaches 4 undergraduate courses per year. Knowing the full distribution would require the underlying data they don’t share, but from these graphics we can at least compute a rough average (rounding 4+ graduate courses to 4 and 9+ undergraduate courses to 9).

This shows that the average professor teaches 4.43 undergraduate courses and 0.75 graduate courses, for a total of 5.18 courses per academic year. If I restrict the data to full-time tenured or tenure-track professors, they teach an average of 4.72 undergraduate courses and 0.91 graduate courses, for a total of 5.63 courses per academic year.
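The rough-average method described above treats each bin as its lower number, so “4+” counts as 4 and “9+” as 9. It can be sketched as follows. The shares below are made up for illustration; they are not the FSSE figures. Because the open-ended top bins are rounded down, this approach slightly understates the true mean:

```python
def bin_value(label: str) -> int:
    """Numeric value of a bin label; open-ended bins like '9+' are rounded down to 9."""
    return int(label.rstrip("+"))


def binned_mean(shares: dict) -> float:
    """Share-weighted average of bin values."""
    total = sum(shares.values())
    return sum(bin_value(label) * share for label, share in shares.items()) / total


def binned_median(shares: dict) -> int:
    """First bin at which the cumulative share reaches half of the total."""
    half = sum(shares.values()) / 2
    cumulative = 0
    for label in sorted(shares, key=bin_value):
        cumulative += shares[label]
        if cumulative >= half:
            return bin_value(label)
    raise ValueError("shares must be non-empty")


# HYPOTHETICAL percentages of professors by number of courses taught per year.
undergrad = {"0": 5, "1": 5, "2": 10, "3": 15, "4": 20, "5": 15,
             "6": 15, "7": 5, "8": 5, "9+": 5}
graduate = {"0": 50, "1": 30, "2": 10, "3": 5, "4+": 5}

ug, gr = binned_mean(undergrad), binned_mean(graduate)
print(f"undergrad mean {ug:.2f}, median {binned_median(undergrad)}")
print(f"graduate mean {gr:.2f}; total {ug + gr:.2f} courses per year")
```

Adding the undergraduate and graduate means gives total courses per year, as in the 4.43 + 0.75 = 5.18 figure above. This works because the mean of a sum is the sum of the means when both come from the same respondents.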

Overall these loads are higher than I expected, especially since the survey sample is skewed towards research schools. But they’re still lower than the standard 3-3 load (6 courses per year) at my own institution. They are also low enough to make for a great job, especially compared to teaching K-12.

Still, I don’t know why we need to rely on one-off surveys to get data on teaching loads. This seems like data the US Department of Education should collect from all accredited schools and share publicly.

*The Delaware Cost Study is not just discontinuing new surveys; it plans to pull down existing data by December 15th, 2025. Only schools that participate in the survey get access, so I can’t get the data, but perhaps some of you can.

Queens 2060: Where Upzoning Matters Most

Most US cities make it hard for housing supply to meet demand because of rules that prevent large apartment buildings. Usually cities do this with zoning rules that limit the number of homes per parcel, often to as low as 1. New York City relies more on rules about Floor Area Ratio (the ratio of the floor area to the area of the parcel). But how binding are these rules? If we relaxed or repealed them, how much new construction would we see, and where would we see it?

MIT PhD student Vincent Rollet has calculated this for New York City:

I build a dynamic general equilibrium model of the supply and demand of floorspace in a city, which I estimate using a novel parcel-level panel dataset of land use and zoning in New York City. I validate the model using quasi-experimental variation from recent zoning reforms and use it to simulate the effects of zoning changes on construction and prices.

He finds that eliminating these rules in NYC would lead to a construction boom, with a 79% increase in the amount of floor space available by 2060. This would allow many more people to live in New York, with a 52% increase in population; but many of the benefits would go to existing NYC residents, with more floor space per person and modestly lower rents leading to higher wellbeing:

Where exactly would we see the building boom? Not Manhattan, but Brooklyn and Queens. The intuition is that zoning is most binding in places where housing prices are currently high but where the buildings are currently small; this is where there is the biggest incentive to tear down existing buildings and build taller if you are allowed to.
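The intuition in the last paragraph can be illustrated with a deliberately crude redevelopment calculation. This is not Rollet’s dynamic general equilibrium model. It is a hypothetical single-parcel sketch with three assumptions. New floor space sells for a fixed price per square foot. Construction costs a fixed amount per square foot. Demolishing the old building forfeits its floor space at that same price. Allowed floor area is the Floor Area Ratio cap times the lot area, and all numbers are made up:

```python
def redevelopment_gain(lot_area, existing_far, allowed_far, price, build_cost):
    """Profit from tearing down and rebuilding to the FAR cap (simplified).

    New floor area = allowed FAR * lot area; the old building's floor area
    (existing FAR * lot area) is assumed to be worth `price` per sq ft and is lost.
    """
    new_floor = allowed_far * lot_area
    old_floor = existing_far * lot_area
    return (price - build_cost) * new_floor - price * old_floor


LOT, COST = 10_000, 400  # hypothetical lot size (sq ft) and construction cost ($/sq ft)

# An already-tall building on an expensive lot vs. a small building on a cheaper lot.
tall_expensive = dict(lot_area=LOT, existing_far=10, price=1_200, build_cost=COST)
small_cheaper = dict(lot_area=LOT, existing_far=1, price=800, build_cost=COST)

for name, site in [("tall, expensive", tall_expensive), ("small, cheaper", small_cheaper)]:
    for cap in (1.5, 6, 12):
        gain = redevelopment_gain(allowed_far=cap, **site)
        print(f"{name:16} FAR cap {cap:>4}: {gain / 1e6:+.1f} $M")
```

With these made-up numbers, raising the cap never makes it worth tearing down the already-tall building. It does turn the small-building lot from a money-loser into a profitable teardown, which matches the Brooklyn-and-Queens intuition. Because costs here are linear, the sketch has no natural height limit. Real models add rising costs of building taller.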