Predicting College Closures: Now with Machine Learning

Small, rural, private schools stand out to me as the most likely to show up on lists of closed colleges. This summer I discussed a 2020 paper by Robert Kelchen that identified additional predictors using traditional regression:

sharp declines in enrollment and total revenue that were reasonably strong predictors of closure, as were poor performances on federal accountability measures, such as the cohort default rate, financial responsibility metric, and being placed on the most stringent level of Heightened Cash Monitoring

Kelchen just released a Philly Fed working paper (joint with Dubravka Ritter and Doug Webber) that uses machine learning and new data sources to identify more predictors of college closures:

The current monitoring solution to predicting the financial distress and closure of institutions — at least at the federal level — is to provide straightforward and intuitive financial performance metrics that are correlated with closure. These federal performance metrics represent helpful but suboptimal measures for purposes of predicting closures for two reasons: data availability and predictive accuracy. We document a high degree of missing data among colleges that eventually close, show that this is a key impediment to identifying institutions at risk of closure, and also show how modern machine learning algorithms can provide a concrete solution to this problem.

The paper also provides a great overview of the state of higher ed. The sector is currently quite large:

The American postsecondary education system today consists of approximately 6,000 colleges and universities that receive federal financial aid under Title IV of the federal Higher Education Act…. American higher education directly produces approximately $700 billion in expenditures, enrolls nearly 25 million students, and has approximately 3 million employees

Falling demand from the demographic cliff is causing real prices to fall, in addition to driving closures:

Between the early 1970s and mid-2010s, listed real tuition and fee rates more than tripled at public and private nonprofit colleges, as strong demand for higher education allowed colleges to continue increasing their prices. But since 2018, tuition increases have consistently been below the rate of inflation

Most college revenue comes from tuition or from state support of public schools; gifts and grants are highly concentrated:

Research funding is distributed across a larger group of institutions, although the vast majority of dollars flows to the 146 institutions that are designated as Research I universities in the Carnegie classifications…. Just 136 colleges or university systems in the United States had endowments of more than $1 billion in fiscal year 2023, but they account for more than 80 percent of all endowment assets in American higher education. Going further, five institutions held 25 percent of all endowment assets, and 25 institutions held half of all assets

Now let's get to closures. As I thought, size matters:

most institutions that close are somewhat smaller than average, with the median closed school enrolling a student body of about 1,389 full-time equivalent students several years prior to closure

As does being private, especially private for-profit (states won’t bail you out when you lose money):

As do trends:

variables measuring ratios of financial metrics and those measuring changes in covariates are generally more important than those measuring the level of those covariates

When they throw hundreds of variables into a machine learning model, it can predict most closures with relatively few false positives, though no one variable stands out much (FRC is Financial Responsibility Composite):

My impression is that for regular people who don't want to dig into the financials, the easiest red flag to check is: is total enrollment under 2,000 and falling at a private school?
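That rule of thumb is simple enough to write down directly. A sketch of the screen, with thresholds taken from the heuristic above rather than from the paper's model:

```python
def quick_red_flag(enrollment_now, enrollment_prior, control):
    """Rough screen: a private college with total enrollment under 2,000
    and falling. This encodes the post's rule of thumb, not the paper's
    machine learning model."""
    return (
        control in ("private nonprofit", "private for-profit")
        and enrollment_now < 2000
        and enrollment_now < enrollment_prior
    )

print(quick_red_flag(1400, 1800, "private nonprofit"))  # small, shrinking, private
print(quick_red_flag(25000, 24000, "public"))           # large public school
```

Enrollment and control-of-institution fields are both available in IPEDS, so this check takes minutes, which is the whole appeal of the heuristic.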

They predict that the coming Demographic Cliff (the falling number of new 18-year-olds each year) will lead to many more closures, though nothing like the “half of all colleges” you sometimes hear:

The full paper is available ungated here. I’ll close by reiterating my advice from the last post: would-be students, staff, and faculty should do some basic research to protect themselves as they consider enrolling or accepting a job at a college. College employees would also do well to save money and keep their resumes ready; some of these closures are so sudden that employees find out they are out of a job effective immediately and no paycheck is coming next month.

Regulatory Costs and Market Power

That’s the title of a blockbuster new paper by Shikhar Singla. The headline finding is that increased regulatory costs are responsible for over 30% of the increase in market power in the US since the 1990s. That’s a big deal, but not what I found most interesting.

One big advance is simply the data on regulation. If you want to measure the effect of regulation on different industries, you need a way to measure how regulated each one is. The crude old approach is to count how many pages of regulation apply to a broad industry. The big advance of Mercatus’ RegData was to use machine learning to identify which specific industry is being discussed near “restrictive words” in the Code of Federal Regulations that indicate a regulatory restriction is being imposed. But not all regulatory words (even restrictive ones) are created equal; some impose very costly restrictions, most impose less costly restrictions, and some are even deregulatory. Singla’s solution is to take the government’s estimates of regulatory costs and apply machine learning there:
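For context on the baseline Singla improves on, the RegData restriction count is easy to sketch: tally occurrences of the five restrictive words (shall, must, may not, prohibited, required) in regulatory text. The passage of text below is invented for illustration, and the hard part RegData adds, attributing counts to specific industries with machine learning, is omitted.

```python
import re

# The five "restrictive words" used in RegData-style restriction counts.
RESTRICTIVE = re.compile(r"\b(shall|must|may not|prohibited|required)\b",
                         re.IGNORECASE)

def restriction_count(text: str) -> int:
    """Count restrictive-word occurrences in a block of regulatory text."""
    return len(RESTRICTIVE.findall(text))

# Hypothetical sample text: "shall", "must", and "required" each match once;
# "may be granted" does not match "may not".
sample = ("Each covered institution shall file an annual report. "
          "Operators must maintain records and are required to permit "
          "inspection. Waivers may be granted at the agency's discretion.")
print(restriction_count(sample))  # 3
```

The count treats every restrictive word identically, which is exactly the limitation the paragraph above describes: a word that triggers a billion-dollar compliance burden scores the same as one that imposes a trivial paperwork requirement.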

This paper uses machine learning on regulatory documents to construct a novel dataset on compliance costs to examine the effect of regulations on market power. The dataset is comprehensive and consists of all significant regulations at the 6-digit NAICS level from 1970-2018. We find that regulatory costs have increased by $1 trillion during this period.

The government’s estimates of the costs are of course imperfect, but almost certainly add information over a word-count based approach. Both approaches agree that regulation has increased dramatically over time. How does this affect businesses? Here’s what’s highlighted in the abstract:

We document that an increase in regulatory costs results in lower (higher) sales, employment, markups, and profitability for small (large) firms. Regulation-driven increase in concentration is associated with lower elasticity of entry with respect to Tobin’s Q, lower productivity and investment after the late 1990s. We estimate that increased regulations can explain 31-37% of the rise in market power. Finally, we uncover the political economy of rulemaking. While large firms are opposed to regulations in general, they push for the passage of regulations that have an adverse impact on small firms

More from the paper:

an average small firm faces an average of $9,093 per employee in our sample period compared to $5,246 for a large firm

a 100% increase in regulatory costs leads to a 1.2%, 1.4% and 1.9% increase in the number of establishments, employees and wages, respectively, for large firms, whereas it leads to 1.4%, 1.5% and 1.6% decrease in the number of establishments, employees and wages, respectively for small firms when compared within the state-industry-time groups. Results on employees and wages provide evidence that an increase in regulatory costs creates a competitive advantage for large firms. Large firms get larger and small firms get smaller.

The fact that large firms benefit while small firms are harmed is what drives the increase in concentration and market power.

What I like and dislike most about this paper is the same thing: it’s a much better version of what Diana Thomas and I tried to do in our 2017 Journal of Regulatory Economics paper. We used RegData restriction counts to measure how regulation affected the number of establishments and employees by industry, and how this differed by firm size. I wish I had thought of using published regulatory cost measures like Singla does, but realistically, even if I had the idea, I wouldn’t have had the machine learning chops to execute it. The push to quantify what “micro” estimates mean for economy-wide measures is also excellent. I hope and expect to see this published soon in a top-5 economics journal.

HT: Adam Ozimek