Overfitting Celebrity Pitches

The Washington Post created a fun infographic of celebrity baseball pitches.

I use this graphic in my Data Analytics class. Students are tempted to draw inferences about individuals from this data set. John Wall and Michael Jordan are great athletes, but in this case they are underperforming Avril Lavigne and George W. Bush. Do we conclude that Sonia Sotomayor missed her calling as an MLB player?

The first lesson here is that we should not assume we can predict where Harrison Ford’s next pitch will go based on observing just one pitch. A single pitch should be considered a random draw from a distribution centered around Ford’s average ability. Any single pitch could be an outlier.

Snoop Dogg features twice on this graph. In 2012 he got the ball in the strike zone. Had we only seen that, we would want to conclude that he is a great pitcher. However, in 2016 he was way off to the right. In either case, overconfidence that his next pitch will land near a single observed pitch would have been a mistake.
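To make the "one pitch is one random draw" point concrete, here is a minimal simulation sketch. It assumes a hypothetical pitcher whose horizontal pitch location is normally distributed around a true average of 0 with a spread of 1 (arbitrary units); none of these numbers come from the infographic.

```python
import random
from statistics import mean

def average_miss(pitches_observed, rng, trials=5000, true_mean=0.0, spread=1.0):
    """Average distance between the estimated and the true mean pitch location.

    Each trial observes `pitches_observed` pitches from a hypothetical pitcher
    and uses their average as the guess for the pitcher's typical location.
    """
    misses = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, spread) for _ in range(pitches_observed)]
        misses.append(abs(mean(sample) - true_mean))
    return mean(misses)

rng = random.Random(42)  # fixed seed so the run is repeatable
miss_from_one_pitch = average_miss(1, rng)
miss_from_25_pitches = average_miss(25, rng)

print(f"Typical miss judging from 1 pitch:    {miss_from_one_pitch:.2f}")
print(f"Typical miss judging from 25 pitches: {miss_from_25_pitches:.2f}")
```

Under these assumptions, a guess based on a single pitch is typically several times further from the pitcher's true average than a guess based on 25 pitches.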

Lastly, I use this graph to illustrate the concept of overfitting (investopedia definition). I suggest a model that is obviously inappropriate. What if we conclude from these data that anyone with the last name of Bieber will not be able to throw the ball in the strike zone? That model surely will not generalize. The problem is that if we test that prediction on the same data we used to train the model, the misclassification rate will be zero. If possible, start with a large data set and set aside some portion of the data for validation, before training a model. Having validation data for assessment is a good way to check that you haven’t modeled the noise in your training set.
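The training-versus-validation point can be sketched in a few lines. The example below is illustrative only: it assumes 200 hypothetical celebrities who each have the same 30% chance of hitting the strike zone, so a person's name carries no real information. The "model" simply memorizes each name's one observed outcome, just like the Bieber rule.

```python
import random

def simulate_pitches(names, rng, strike_probability=0.3):
    """One pitch per person; True means the ball landed in the strike zone."""
    return {name: rng.random() < strike_probability for name in names}

def fit_name_model(training_data):
    """The 'model' memorizes the single outcome observed for each name."""
    return dict(training_data)

def misclassification_rate(model, data):
    """Share of pitches where the memorized outcome differs from the actual one."""
    wrong = sum(model[name] != outcome for name, outcome in data.items())
    return wrong / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
names = [f"celebrity_{i}" for i in range(200)]  # hypothetical names

training_pitches = simulate_pitches(names, rng)
validation_pitches = simulate_pitches(names, rng)  # a fresh pitch from each person

model = fit_name_model(training_pitches)
train_error = misclassification_rate(model, training_pitches)
validation_error = misclassification_rate(model, validation_pitches)

print(f"Training misclassification rate:   {train_error:.2f}")
print(f"Validation misclassification rate: {validation_error:.2f}")
```

The training error is zero by construction, because the model is graded on the very pitches it memorized. The validation error is far from zero, which is the signal that the model fit noise rather than anything real.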

Does Cohabitation Predict Divorce?

My article, coauthored with Sarah Kerrigan and published last week, tries to answer the question. In short, the answer seems to be yes: cohabitation before marriage is associated with a 4.6 percentage point increase in the rate of marital dissolution. This is in line with much of the previous literature, which notes one big exception, choosing right (or getting lucky) the first time: “cohabitation had a significant negative association with marital stability, except when the cohabitation was with the eventual marriage partner”.

But we found some even more interesting facts while digging through the National Survey of Family Growth.

Continue reading

Laboratories of Democracy in Pandemic

You’ve probably heard the phrase that US states are often “laboratories of democracy.” The phrase comes from a Supreme Court case. It’s well known enough that it has a short Wikipedia page. The basic idea is simple: states can try out different policies. If a policy works, other states can copy it. If it doesn’t work, it only hurts that state.

The 2020-21 pandemic has provided a number of possibilities for the “states as laboratories” concept. Here are three big ones I can think of (please add more in the comments!):

  1. Do states that impose stricter pandemic policies (“lockdowns”) have better or worse outcomes? This could be about health, the economy, both, or some other outcome.
  2. Do states that end unemployment benefits sooner have quicker labor market recoveries? Or are these not the main drag on the labor market?
  3. Do states that offer incentives for vaccination have higher vaccination rates? And what sort of incentives work best?

These are all good questions, but let me throw some cold water on this whole concept: we might not be able to learn anything from these “experiments”! The primary reason: the treatments aren’t randomly assigned. States choose to implement them.
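A small simulation can show why non-random assignment matters. The sketch below is hypothetical throughout: it assumes a strict policy that truly improves an outcome by 1 unit (lower is better), an unobserved "severity" that worsens outcomes, and a large number of simulated jurisdictions (far more than 50, to keep sampling noise small). It compares a naive treated-minus-untreated gap under random assignment with the gap when places with high severity choose the policy themselves.

```python
import random
from statistics import mean

def naive_gap(assign, rng, n=20000, true_effect=-1.0):
    """Difference in mean outcomes: places with the policy minus places without.

    `assign(severity, rng)` decides whether a simulated place adopts the policy.
    All parameters are illustrative assumptions, not estimates.
    """
    treated, control = [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)  # unobserved baseline conditions
        strict = assign(severity, rng)
        outcome = 5 + 2 * severity + (true_effect if strict else 0) + rng.gauss(0, 1)
        (treated if strict else control).append(outcome)
    return mean(treated) - mean(control)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Policy assigned by coin flip, as in a real experiment
random_gap = naive_gap(lambda severity, r: r.random() < 0.5, rng)

# Policy chosen by the places that are already worse off
chosen_gap = naive_gap(lambda severity, r: severity > 0, rng)

print(f"Naive gap with random assignment: {random_gap:.2f}")
print(f"Naive gap with self-selection:    {chosen_gap:.2f}")
```

With random assignment the naive gap lands close to the true effect of -1. With self-selection the gap is positive, so a policy that helps looks harmful, purely because the places that adopted it started out worse.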

Let’s think through the potential problems with each of these three areas:

Continue reading

Teaching through my R mistakes

I blogged earlier about a new textbook that I am adopting for an analytics course. The first few chapters are primarily an introduction to using the R coding language within RStudio. One of the resources I’m posting for students this week is screen capture videos of me manipulating data in RStudio.

Sometimes I make mistakes, shockingly. I’m a professional, and yet sometimes I still make careless typos in R. I found out that my version of R was outdated, right when I was in the middle of recording a lecture.

I could have deleted the footage of my mistakes. I could have re-recorded a clean smooth video in which I run command after command without saying “ok… I got an error”.

Continue reading

Population Predicts Regulation

Texas is one of the most regulated states in the country.

This is one of the surprises that emerged from the State RegData project, which quantifies the number of regulatory restrictions in force in each state. It turns out that a state’s population size, rather than political ideology or anything else, is the best predictor of its regulations.

This is what I found, with my coauthors James Broughel and Patrick McLaughlin, when we set out to test whether a previous paper (Mulligan and Shleifer 2005) that showed a regulation-population link held up when we used the better data that is now available. We found that across states, a doubling of population size is associated with a 22 to 33 percent increase in regulation.
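One way to read the "doubling" result is as an elasticity. The sketch below assumes a constant-elasticity (log-log) relationship between regulation and population; that functional form is an assumption made for illustration, not something stated in this excerpt. Under it, a 22 to 33 percent increase per doubling can be converted into an elasticity and then applied to other population ratios.

```python
import math

def implied_elasticity(increase_per_doubling):
    """Elasticity b such that doubling population multiplies regulation by 2**b."""
    return math.log2(1 + increase_per_doubling)

def predicted_ratio(population_ratio, elasticity):
    """Predicted regulation ratio between two states, given their population ratio."""
    return population_ratio ** elasticity

low = implied_elasticity(0.22)   # lower end of the reported range
high = implied_elasticity(0.33)  # upper end of the reported range

# Hypothetical comparison: a state with 10 times the population of another
ratio_low = predicted_ratio(10, low)
ratio_high = predicted_ratio(10, high)

print(f"Implied elasticity: {low:.2f} to {high:.2f}")
print(f"Predicted regulation ratio at 10x population: {ratio_low:.2f} to {ratio_high:.2f}")
```

Under this assumed functional form, a state with ten times the population would be predicted to have roughly 1.9 to 2.6 times as many regulatory restrictions, not ten times as many.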

Continue reading

Flying the Friendly Skies (Today and in the Past)

It’s almost summer. About half the US population has at least one dose of a COVID vaccine. Many Americans who haven’t had their employment impacted by the pandemic have bank accounts flush with cash, and they are ready to do one thing with that cash: travel. See family and friends. See something other than the inside of your own home.

And for many Americans traveling this summer, they will fly. The airlines, no doubt, will appreciate your business. At this time last year, the world had so radically shifted that Zoom’s market cap was bigger than the 7 largest airlines in the world. In May 2020, air passenger traffic in the US was less than 10% of traffic in 2019. Today, we’ve recovered a lot, but we are still only back to about two-thirds of normal levels. And since airplanes are just a marginal cost with wings, flying all their planes at close to full capacity is crucial for airlines to return to profitability. They really need you to fly the friendly skies this summer.

One of the reasons that so many Americans are able to fly today is that flying is, compared to historical prices, very cheap.

How cheap is flying today compared to the past? Let’s look at some historical price data for flights.

Continue reading

When will housing prices fall?

US housing prices shot up during the pandemic. People spending all day at home wanted bigger houses, and the Fed fueled their demand with low interest rates. But home owners didn’t want to sell: the total number of homes on the market is less than half what it was a year ago. This combination of rising demand & falling supply has sent prices way up & cut the time homes spend on the market.

Contrary to popular belief, it’s actually rare for economists to make market forecasts and most of us aren’t especially well-equipped to do so, but I’m going to try anyway! I think home prices will almost certainly stop growing so quickly, and may actually fall, within two years.

Why? The end of the pandemic, the rise of new construction, and the end of low interest rates.

Continue reading

Let’s Talk About Inflation

You’ve probably seen the headlines. Corn prices are double what they were a year ago. Lumber prices are triple. You can find all kinds of other scary examples. Is runaway inflation just around the corner? Is it already here?

And yet, measures of prices that consumers pay are much more stable. The most widely tracked measure, the CPI-U, is up 4.2% over the past year. (That’s through April, and keep in mind that it’s starting from a low base, since March-May 2020 saw falling prices.) The Personal Consumption Expenditures index, often preferred by economists, is up just 2.3% (though that’s only through March).

So what gives? Do these consumer measures understate inflation in some way? Or is the increase in commodity prices telling us that consumer prices will increase soon?

Let’s take that second question first. Do higher commodity prices necessarily lead to higher consumer prices? The answer is a clear no. First, we can see that in the data. The producer price index for all commodities (such as corn and lumber) is up 12% over the year (through March, with April data coming out tomorrow). That’s a big increase. But as the chart below suggests, that probably will not lead to 12% increases in consumer prices. It probably won’t even lead to a 5% increase in consumer prices.

Notice two things about this chart. First, commodity prices (the red line) are much more volatile than consumer prices, both on the upside and downside. Second, there really isn’t much of a lag, if any. The direction of change is similar in both indexes, almost to the month. When producer commodity prices go up, consumer prices also go up, that very same month, but not by the same amount. So all of that 12% increase in producer prices is probably already reflected in consumer prices.

Why might this be? Simple supply and demand analysis (hello Econ 101 critics!) can tell us why.

Continue reading

Are Poor Americans Really as Rich as Average Canadians?

Have you seen this chart? I certainly have. It floats around on social media a lot. The chart seems to indicate that poor Americans are better off than the average person in most other rich countries. Roughly equal to Canada and France, and better off than Denmark or New Zealand.

When I’ve asked for sources in the past, people usually aren’t sure. They remember downloading it from somewhere, but they can’t recall where.

But I think I found the source: it’s this article from JustFacts. After seeing how they calculated it, I’m skeptical that it provides a good comparison of poor Americans to other countries.

Here’s what the chart does. For most countries, it uses a World Bank measure of consumption per capita, converted to US dollars using PPP adjustments. For the poor in the US, it uses a consumption estimate for the bottom 20% of households (Table 6), and then divides by the average number of people per household. For the poor in the US, the average household consumption for 2010 was an amazing $57,049, more than double the poverty line! That’s about $21,000 per poor person.
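The per-person step is simple division, and it can be run backwards as a sanity check. The sketch below uses only the two figures quoted above; since the $21,000 figure is rounded, the household size it implies is approximate, and the variable names are just labels for this illustration.

```python
def per_person_consumption(household_consumption, people_per_household):
    """Household consumption spread evenly over everyone in the household."""
    return household_consumption / people_per_household

household_consumption = 57_049  # bottom-20% household consumption, 2010 (from the text)
approx_per_person = 21_000      # rounded per-person figure (from the text)

# Working backwards: the household size implied by the two quoted figures
implied_household_size = household_consumption / approx_per_person

print(f"Implied people per household: {implied_household_size:.1f}")
print(f"Check: ${per_person_consumption(household_consumption, implied_household_size):,.0f} per person")
```

The two quoted figures imply an average of roughly 2.7 people per household in the bottom 20%.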

How is this possible?

Continue reading

Old Lives Matter

Bryan Caplan has kindly responded to my latest blog post, which was in turn a response to his blog post on the relative value of human lives by age. Caplan has always been kind in his responses, even when responding to pesky graduate students — kind in both his approach and the time he dedicates to responding thoughtfully. So I appreciate his taking the time to respond to me, and I will offer a few more thoughts on the matter.

To briefly summarize: Caplan believes that young lives (10 year olds) are worth 100-1,000 times as much as old lives (80 year olds). I contend that they are closer to roughly equally valued. My disagreement with Caplan can be broken down into two categories:

  • A. Caplan’s three reasons why young lives are worth more (a lot more!) than old lives. I didn’t respond to that directly, but I will do so here. I think Caplan is narrowing the goalposts.
  • B. A disagreement over the shape of the VSL curve over the lifetime, specifically whether an inverted-U-shaped curve makes sense. I’ll say more about this too, but Caplan doesn’t just have a beef with me, but with almost everyone in the VSL literature!

Let’s start with Caplan’s three reasons, which he calls “iron-clad”: young people have more years to live, those years are generally healthier, and young people will be missed more when they are gone. The first is undeniably true on average, the second is probably true almost all the time, and I’m not sure on the third, but I’m willing to admit it’s not a slam dunk either way.

So how can I disagree? These are only three things. There are many other considerations, and we can imagine other reasons that old lives are valued as much or more than younger lives! I’ll call mine 4-6 to go with Caplan’s 1-3:

  4. Old age spending is the largest component of public budgets in developed countries (and this is unlikely to be mostly due to rent seeking or the self interest of younger generations).
  5. The elderly possess wisdom which is highly valuable and that the young benefit from.
  6. The last years of your life are, on average, worth a lot more: you are usually very wealthy, have no employment obligations, have grandchildren you love (without the responsibilities of parenting), and are (until the very end) generally healthy too.

Taken as a whole, I think these three reasons present a strong counterargument to Caplan’s three reasons. And I think we could certainly come up with more! My point is that Caplan has picked three areas where young lives clearly have the advantage, but ignored all the good reasons why old lives are more valuable. This is what I mean when I say we shouldn’t rely on our intuitions. Neither of our lists is exhaustive, but let me elaborate on a few of these.

Continue reading