Clemens and Strain on Large and Small Minimum Wage Changes

In my Labor Economics class, I do a lecture on empirical work and the minimum wage, starting with Card & Krueger (1993). I’m going to quickly tack on the new working paper by Clemens & Strain, “The Heterogeneous Effects of Large and Small Minimum Wage Changes: Evidence over the Short and Medium Run Using a Pre-Analysis Plan”.

The results, as summarized in the second half of their abstract, are:

relatively large minimum wage increases reduced employment rates among low-skilled individuals by just over 2.5 percentage points. Our estimates of the effects of relatively small minimum wage increases vary across data sets and specifications but are, on average, both economically and statistically indistinguishable from zero. We estimate that medium-run effects exceed short-run effects and that the elasticity of employment with respect to the minimum wage is substantially more negative for large minimum wage increases than for small increases.

The variation in the data comes from choices by states to raise the minimum wage.

A number of states legislated and began to enact minimum wage changes that varied substantially in their magnitude. … The past decade thus provided a suitable opportunity to study the medium-run effects of both moderate minimum wage changes and historically large minimum wage changes.

We divide states into four groups designed to track several plausibly relevant differences in their minimum wage regimes. The first group consists of states that enacted no minimum wage changes between January 2013 and the later years of our sample. The second group consists of states that enacted minimum wage changes due to prior legislation that calls for indexing the minimum wage for inflation. The third and fourth groups consist of states that have enacted minimum wage changes through relatively recent legislation. We divide the latter set of states into two groups based on the size of their minimum wage changes and based on how early in our sample they passed the underlying legislation.

The “large” increase group includes states that enacted considerable change. New York and California “have legislated pathways to a $15 minimum wage, the full increase to which firms are responding exceed 60 log points in total.” Data comes from the American Community Survey (ACS) and the Current Population Survey (CPS).


Human Capital and Filepaths

Someone wrote a story about my life. It’s a report from The Verge called “File Not Found: A generation that grew up with Google is forcing professors to rethink their lesson plans”.

When I started teaching an advanced data analytics class to undergraduates in 2017, I noticed that some of them did not know how to locate files on a PC. Something that is unavoidable in data analytics is getting software to access data from a storage device. It’s not “programming” nor is it “predictive analytics”, but you can’t get far without it. You need to know what directory to point the software to, meaning that you need to know what directory contains the data file.

As the article says

the concept of file folders and directories, essential to previous generations’ understanding of computers, is gibberish to many modern students. It’s the idea that a modern computer doesn’t just save a file in an infinite expanse; it saves it in the “Downloads” folder, the “Desktop” folder, or the “Documents” folder, all of which live within “This PC,” and each of which might have folders nested within them, too. It’s an idea that’s likely intuitive to any computer user who remembers the floppy disk.

I am a long-time PC user. Navigating File Explorer is about as instinctive as drinking a glass of water for me. The so-called digital natives of Gen Z have been glued to mobile device screens that shield them from learning anything about computers.

Not everyone needs to know how computers work. I myself only know the layer that I was forced to learn.

My Dad, to whom I owe so much, kept a Commodore 64 in a closet in our house. About once a year, he would try to entice me into learning how to use it. I remember screwing up my 9-year-old eyes and trying to care. Care, I could not. It’s hard to force yourself to do extra work without a clear goal. The Verge article explains

But it may also be that in an age where every conceivable user interface includes a search function, young people have never needed folders or directories for the tasks they do. The first internet search engines were used around 1990, but features like Windows Search and Spotlight on macOS are both products of the early 2000s. Most of 2017’s college freshmen were born in the very late ‘90s. They were in elementary school when the iPhone debuted; they’re around the same age as Google. While many of today’s professors grew up without search functions on their phones and computers, today’s students increasingly don’t remember a world without them.

One area in which I do minimal archiving is my email. I rely heavily on the search function. I could spend time creating email folders, but I’m not going to put in the time unless I’m forced to.

Here’s where the “problem” lies:

The primary issue is that the code researchers write, run at the command line, needs to be told exactly how to access the files it’s working with — it can’t search for those files on its own. Some programming languages have search functions, but they’re difficult to implement and not commonly used. It’s in the programming lessons where STEM professors, across fields, are encountering problems.

Regardless of source, the consequence is clear. STEM educators are increasingly taking on dual roles: those of instructors not only in their field of expertise but in computer fundamentals as well.

Personally, I don’t mind taking on that dual role. I didn’t learn to program until I really wanted to. The only reason I wanted to was that I had discovered economics. I wanted to be able to participate in social science research. Let these STEM or business courses be the motivation for students to learn to use computers as tools instead of just for entertainment.

Allen Downey wrote a great blog post on this topic back in 2018 that is more practical for teachers than the Verge report. He argues that learning to program will be harder for the 20-year-olds of today than it was for “us” (old people as defined by entering college before 2016). He recommends a few practical strategies, while acknowledging that there is “pain” somewhere along the process. He thinks it is sometimes appropriate to delay that pain by using browser-based programming interfaces in the beginning.

I gave my students a break from pain this week with a little in-browser game that you can play at https://www.brainpop.com/games/blocklymaze/. They got 10 minutes to forget about file paths, and then it was back to the hard work.

I have found that a lot of students need individual attention for this step: finding a file on their hard drive. I only have to do that once per student. Students pick the system up quickly. File Explorer is a pretty user-friendly mechanism. Everyone just has to have a first time. Sometimes, Zoomers just need a real person who cares about them to come along and say, “The file you downloaded exists on this machine.”
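For readers who want to see what "pointing the software to the right directory" can look like in code, here is a minimal Python sketch. It is only an illustration: the function names, the "Downloads" location, and the file name survey.csv are hypothetical, and this is not how any particular course or textbook does it.

```python
from pathlib import Path


def find_files(filename, search_root):
    """Return every file named `filename` found anywhere under `search_root`."""
    root = Path(search_root)
    if not root.is_dir():
        # Nothing to search: the folder itself does not exist on this machine.
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


def describe_location(path):
    """Spell out the chain of folders leading to a file, outermost first."""
    return " > ".join(Path(path).resolve().parts)


if __name__ == "__main__":
    # Assumed location and hypothetical file name, for illustration only.
    downloads = Path.home() / "Downloads"
    for match in find_files("survey.csv", downloads):
        print(describe_location(match))
```

The printed chain of folders is the same path a student would click through in File Explorer, which is the connection I want them to make.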

One way around this problem is to reference data that lives on the internet instead of on a local machine. If you are working through the examples in Scott Cunningham’s new book Causal Inference, here’s a piece of the code he provides to import data from his public repository into R.

library(haven)  # provides read_dta()
# df is the name of the data file, given as a string
full_path <- paste("https://raw.github.com/scunning1975/mixtape/master/", df, sep = "")

df <- read_dta(full_path)

The nice thing about referencing data that is freely available online is that the same line of code will work on every machine as long as the student is connected to the internet.

As more and more of life moves into the cloud, technologists might increasingly be pointing programs to a web address instead of the /Downloads folder on their local machine. Nevertheless, the kids need to have a better sense of where files are stored. He or she who can understand file architecture is going to get paid a lot more than their peers who only know how to poke and scroll on a smartphone.

There is a future scenario in which AI does most of the programming for us. When AI can fetch files for us, then File Explorer may seem obsolete. But I worry about a world in which fewer and fewer humans know where their information is stored.

Penny-Pinchers Gonna Pinch

Textbooks say that there are two major problems with the Consumer Price Index (CPI). First, accounting for changes in quality is difficult. Second, the CPI is calculated by assuming a fixed basket of goods is consumed over time. For both of these reasons, the rate of inflation implied by the CPI is typically considered to be overstated by about 1 percentage point.

Imperfectly accounting for quality improvements causes higher measured inflation because the stream of services that a product creates for the consumer has increased, even though the product is nominally the same. For example, the camera on my smartphone is now good enough to record a high-quality YouTube video, whereas the camera on my previous phone was mediocre. I am better off with the better camera. But the increase in my quality of life isn’t measured by the CPI. The CPI does, however, make note that I paid a higher price for a phone.

Further, people don’t consume a fixed basket of goods over time. Even if we stopped the introduction of all new products and maintained the quality of all current products, people would still change the composition of their consumption due to price changes among related goods.

When people get hot and bothered by inflation, they often appeal to people who are of less means and who would find higher prices more burdensome. For that reason, below is a graph of some calorically dense and roughly comparable food staple prices (from the PPI).  You can put a protein on top of any one of these and call it a meal: pasta, flour, potatoes, & rice.

Let’s say that a consumer consumed equal parts of these in January of 2020. The CPI assumes that the consumption basket remains constant and plots a weighted average. In such a case, the price rose 2.3% through July 2021. But in real life, penny-pinchers gonna pinch. If our consumer is particularly Spartan, then he will always consume the cheapest option – he treats the different foods as perfect substitutes. The Spartan price of consuming *fell* 22.3%. To be clear, the CPI assumes that the consumption composition remains unchanged, while the consumer’s actual basket is responsive to price changes. Even if a consumer considers these goods to be imperfect substitutes and is only willing to cut any particular type of consumption in half in favor of the cheapest alternative, the price still fell by 10%. In fact, a consumer who is at all responsive to prices will always have a cheaper basket than the headline CPI, all else constant.
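The three calculations above can be sketched in a few lines of Python. To be loud about the assumptions: the price index values below are made up for illustration (they are not the PPI series behind the numbers in this post), every series is normalized to 100 in the base month, and "cutting consumption in half" is modeled as moving half of each good's budget share to whichever good is cheapest later on.

```python
def basket_cost(prices, weights):
    """Cost of a basket: each good's price index times its weight, summed."""
    return sum(prices[good] * weights[good] for good in prices)


def percent_change(new, old):
    return (new - old) * 100 / old


# Hypothetical price indexes, normalized to 100 in the base month.
base = {"pasta": 100.0, "flour": 100.0, "potatoes": 100.0, "rice": 100.0}
later = {"pasta": 105.0, "flour": 112.0, "potatoes": 80.0, "rice": 111.0}

equal_weights = {good: 0.25 for good in base}
base_cost = basket_cost(base, equal_weights)

# 1. Fixed basket: the weights never change, as in a CPI-style index.
fixed_change = percent_change(basket_cost(later, equal_weights), base_cost)

# 2. Spartan: perfect substitutes, so only the cheapest good is bought.
spartan_change = percent_change(min(later.values()), min(base.values()))

# 3. Partial substitution: halve every share, give the rest to the cheapest good.
cheapest = min(later, key=later.get)
shifted = {good: weight / 2 for good, weight in equal_weights.items()}
shifted[cheapest] = 1 - sum(w for good, w in shifted.items() if good != cheapest)
partial_change = percent_change(basket_cost(later, shifted), base_cost)

print(f"fixed basket: {fixed_change:+.1f}%")
print(f"spartan:      {spartan_change:+.1f}%")
print(f"half-shift:   {partial_change:+.1f}%")
```

With these invented prices the fixed basket rises 2%, the Spartan basket falls 20%, and the half-shift basket falls 9%. The ordering is the point, not the magnitudes.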

In conclusion, be careful with your money. Spend it well and seek out alternatives. Your flexibility determines how much money you’ll have at the end of the month. The headline CPI number impacts only the most passive consumer – and even then, budget constraints gonna constrain.

Learning is FUNdamental

Two items came across my radar this week that were absolutely not boring and also got me thinking. Up front, the links are Alexander the Grate on CWT and a guest post on Slow Boring about Chad.

Something that stood out to me about the two sources above is that the entertainment aspect made more people push through to the end and learn as a result. Right now, after my kids are asleep, I’m splitting my time between reading The Property Species and watching The Good Place on Netflix. The Property Species is really good, but it’s not catnip for my brain like The Good Place.

My son was home for most of the past week, so one of the things I forced him to do was read out loud. He needs to learn to read, and I know reading simple books out loud is good for him. It was clear that he would have chosen a painful burn over learning in this way.

Alexander the Grate is homeless, but I learned that he prefers the term No Fixed Address (NFA). He and Tyler discuss what it is like to live in DC as a homeless person. Policy is mixed in with interesting stories.

Matt Y’s guest on Slow Boring, Jeff Maurer, delivers information on Chad. As he points out, 16 million people live in Chad, so we should educate ourselves about the political situation and how our own policies would affect the fate of the citizens. He, the self-proclaimed Lady Gaga of Chad, is irreverent for a cause.

It’s a Trap!

When I was 22 I applied to the MFA programs in creative writing at the Iowa Writers’ Workshop and Columbia. They summarily rejected me with a minimum of fuss. They were right to do so, but it is also without question one of the greatest pieces of good fortune to ever befall me.

Let’s talk about “trap” degrees – expensive, often multi-year endeavors that rarely lead to salaries commensurate with the investment and arguably carry negative signal value in the labor market. We could all dunk on the aspiring filmmakers and puppeteers who look as though they were sent from central casting to play exactly the sort of dude who forks over >$100K for the shortest path to becoming the next Spielberg without doing all the messy fundraising, friend-haranguing, lighting improvising, actor recruiting, writing, and film festival peddling that looks an awful lot like high-risk hard work. We could dunk on them, but…but I can’t think of a way to finish that sentence that isn’t arrogant and condescending.

Anyway, we really should put aside the “they did this to themselves” schadenfreude, at least for a second, because regardless of blame, a lot of high opportunity cost human life years are being scammed with the siren song of “look at this great investment in yourself that will feel just like consumption while you are doing it!” There’s nothing new here, mind you. “Eat yourself thin” diets cycle through the zeitgeist with regularity, conveniently next to the book/video/3-week courses that will help you get rich in real estate with no money down. But we should be concerned when an entire sub-industry appears to be selling a human capital investment with negative real value. They may not be the modal or flagship product of higher education, but neither was the Pinto.

There’s similarly no shortage of people eager to point out that a lot of undergraduate education looks like a 4-year cruise, a pretirement if you’ll excuse a shameless attempt at coining unnecessarily cute terminology. We shouldn’t be shocked that purveyors are bundling consumption within an investment where, by design, the check-writers face high monitoring costs; part of the point of college is leaving the nest, right? Think about it from the other side of the equation: higher education is a scammer’s dream. The money folks are out of sight and desperately credulous to believe their child is on the path to status and financial independence. The customer is naïve and unworldly, eager to follow any external entity (other than their parents) that will do their decision-making for them. But the best part is the con’s mark won’t know for sure they’ve been scammed until well after the check has cleared (but not before they’ll receive their first solicitation for alumni donations).

But, you might be saying, graduate and professional schools are meant to be different. This is focused preparation for a narrow field of endeavor. These programs are decidedly not pretirement cruises. This is training. Why would anyone pay for training in something that has no payoff? I’ll offer a couple possibilities:

  1. This isn’t training, it’s consumption, and the buyers are fully aware of it.

I’m sure this accounts for a fair amount of fine arts training, particularly for retirees and hobbyists attending local community colleges, as well as for the children of wealthy parents who have no intention of ever pursuing a vocation. More on them in a second.

2. This is training for aspiring men and women of leisure.

Remember gentlemen and ladies of leisure? They used to have their own Census occupation code! This might seem redundant with the previous point, but if your intention is to hob-nob with the rich and more-rich, there is something very much to be said for being able to discuss certain artistic fields at more esoteric levels. There’s also a modern middle-class version of this, what in an earlier, more coldly misogynistic, male-dominated time would have been referred to as an “MRS” degree. I imagine there are plenty of men and women who view school as a way of biding their time until a partner emerges who will be the primary earner. Match.com profiles and fix-ups are likely to be more economically fruitful for students mid-pursuit of a graduate degree than those working unimpressive jobs.

We also shouldn’t dismiss those opting for a graceful slide down the economic ladder. Generous families, perhaps a universal basic income, a rich artistic education, and comfortably living in a bohemian southern university town are for many the formula for a quiet, comfortable life unencumbered by the toils of a career. I’ve always enjoyed the company of such folks, at least until they try to tell me how the economy really works. Never follow these people to a second location.

3. This is a scam, and one with potentially far reaching costs.

Like so many scams, you could write a pithy story about well-dressed con-artists who open a “college” in an abandoned strip mall, throw on a coat of paint, and scam the spoiled children of upper-middle class social climbers by offering fake degrees that promise a shortcut to white collar riches and bohemian prestige. It’d be a two-act romp followed by a third where everyone ends up ok and kids learn the value of hard work.

In reality, though, no small number of the victims will be kids from higher education information deserts, who emerge from their undergraduate years with a relatively weak career they were guided towards after they struggled their first semester. Facing grim job prospects, they’re hoping two more years will thin the competition in the rarefied air of the applicants with “graduate education”. It is for these students that I fear the most.

It gives me pause when I see overly narrow master’s programs that target a specific job rather than training in a set of tools. In service to my own cowardice, I won’t name specific programs, but suggest caution when considering a degree where the only job you’ll be qualified for is in the name of the degree.

I similarly worry about third- and fourth-tier MBA programs (especially if your employer isn’t paying for it). So much of the value of an MBA is the social network it will wire you into. If your parents haven’t heard of the school, it’s probably not much of a network.

Aspiring master’s degree students, my advice is this: look up the individual courses you’ll be taking and then explain to the mirror what you’ll learn in each one and the market in which those skills are in demand. If you can’t do that, I advise reconsideration.


That’s all great, but what should we do?

I have no policy solutions, but I do have a piece of pedagogical advice. We need to update the standard operating procedure of guidance counselors in schools everywhere. We’ve been working so hard to convince kids they should go to college, we forgot to teach them how to be discerning customers of higher education. I’m all about caveat emptor as life advice, but if we want to hit people with it as an ex post I-told-you-so, we have to teach it to them ex ante, especially when we’re talking about 17-year-old and (ahem, perhaps mildly infantilized) 21-year-old kids. Just because you’ll walk away with a degree doesn’t mean that degree will be worth the time and tuition.

My guess is that we should up the status of community college, technical certificates, and not going to college at all. At the same time, we should probably lower the status of arts degrees for artistic fields that are better suited to learning by doing and autodidacts.

Or maybe we just need guidance counselors to bring high school seniors on field trips to carnivals across the country. Nothing will teach you the cold truth of scams faster than losing your last 20 bucks pursuing a fluffy bit of googly-eyed asbestos shooting on a bent basketball hoop in front of someone you planned on asking to prom but could never see value in you again after missing 10 shots in a row.

Trust me, that’ll stick with them.

Scale and Online Learning

A simplistic view that I have heard about online learning is that it is of worse quality but cheaper than traditional classroom learning.

We should take the cheaper part seriously. Cheaper can mean new opportunities for many people. Delivering a lecture online can mean that, once the fixed cost of creating the video is incurred, the marginal cost of adding a student is nearly zero. The average cost of delivering instruction goes down with every student who joins the course. Economy of scale is a wonderful thing.

Now, let’s assume a family that has a quiet home and reliable internet service. Assume that a mom, m, signed up for a rock/geology class, r, for her school-aged son who cannot read. It’s me. I signed my son up for an online “rock camp”. I thought it would give me 45 minutes of time to get work done while my son was distracted in a Zoom room.

This week I got an email from the online school company about how to get ready for rock camp. I’m instructed to assemble a supply kit of about 30 items so that my kid can do a hands-on science experiment every day of the camp. This is not what I thought I was signing up for, and I no longer think rock camp is going to save me any time.  It gets me thinking about scale and online education for kids.

All the parents of rock campers will have to separately assemble a kit of supplies. The economies of scale would come from having the children in a physical school. Buy the supplies in bulk and hand out a pack to each kid all at the same time. It would be great to have a *classroom* where the students could *go*. Even though many classes do not involve vinegar and magnets, the point can generalize.
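To make the scale argument concrete, here is a rough Python sketch of average cost per student: fixed cost spread over enrollment, plus a per-student cost. Every number is hypothetical and chosen only to illustrate the logic. I assume the online course has a lower fixed cost but that each family pays retail for its own supply kit, while the classroom has a higher fixed cost but buys kits in bulk.

```python
def average_cost(fixed_cost, per_student_cost, students):
    """Average cost per student: fixed cost spread over students, plus marginal cost."""
    if students <= 0:
        raise ValueError("students must be positive")
    return fixed_cost / students + per_student_cost


# Hypothetical cost structures, not real figures for any camp or school.
ONLINE = {"fixed_cost": 2000.0, "per_student_cost": 30.0}     # kits bought at retail
CLASSROOM = {"fixed_cost": 3000.0, "per_student_cost": 12.0}  # kits bought in bulk

for students in (25, 100):
    online = average_cost(students=students, **ONLINE)
    classroom = average_cost(students=students, **CLASSROOM)
    print(f"{students} students: online ${online:.2f}, classroom ${classroom:.2f}")
```

Under these made-up numbers the online course is cheaper per student at 25 students, but the classroom is cheaper at 100, because the bulk discount on supplies eventually outweighs the extra fixed cost.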

We should take scale seriously. I support experimenting with different kinds of education and giving students choices. Personally, I benefitted from getting to pilot an experimental program at my high school that allowed me to take microeconomics for college credit online. I also participate in online education sometimes as an educator.

However, it’s overly simplistic to say that the scale idea always points us in the direction of online education. Even at the university level, some products/services can be cheaper to deliver in a traditional class setting.

Steve Horwitz on “The Graduate Student Disease”

On Sunday the world lost a great teacher, economist, and all-around fantastic person in Steve Horwitz. If you don’t know about Steve, I recommend reading the tributes from Pete Boettke and Art Carden.

Pete and Art speak to Steve’s overall legacy and greatness. But I will tell you about a very specific piece of advice that Steve gave me about teaching undergrads.

Steve called it “the graduate student disease.” By this he meant the tendency of newly minted PhD economists to teach undergraduate courses as if they were mini versions of graduate courses. Steve insisted this was the wrong approach.


Thread on Programming Ability

Ethan Mollick brought this Nature article to my attention. One of the authors, Chantel Prat, is also on the thread.

The sample size for this study is only 36, so we should think of it as preliminary work toward understanding how people learn to program.

Their abstract, with emphasis added by me:

This experiment employed an individual differences approach to test the hypothesis that learning modern programming languages resembles second “natural” language learning in adulthood. Behavioral and neural (resting-state EEG) indices of language aptitude were used along with numeracy and fluid cognitive measures (e.g., fluid reasoning, working memory, inhibitory control) as predictors. Rate of learning, programming accuracy, and post-test declarative knowledge were used as outcome measures in 36 individuals who participated in ten 45-minute Python training sessions. The resulting models explained 50–72% of the variance in learning outcomes, with language aptitude measures explaining significant variance in each outcome even when the other factors competed for variance. Across outcome variables, fluid reasoning and working-memory capacity explained 34% of the variance, followed by language aptitude (17%), resting-state EEG power in beta and low-gamma bands (10%), and numeracy (2%). These results provide a novel framework for understanding programming aptitude, suggesting that the importance of numeracy may be overestimated in modern programming education environments.

Learning Python, at least at first, is more like learning a foreign natural language than it is like doing arithmetic problems.

There are still many open questions in this area, so I see this paper as an important small step in the right direction. I have also done a study on this topic.

Should student debt be dischargeable in bankruptcy?

I’m not an economist who studies education or bankruptcy, and I’m not 100% confident I spelled dischargeable correctly. I am, however, above average at highlighting the difficulty of a question when dissuading a grad student from attempting an impossible thesis question, so let’s dig into this one, which sounds pretty hard to me.

First of all, it is very difficult to discharge student debt during Chapter 7 or 13 bankruptcy, but I think you still can do it if you convince a judge that continued attempts at repayment would create undue hardship, i.e., put you in a state of poverty in the wake of previous good faith efforts.

That said, maybe you shouldn’t have to face literal starvation to discharge student loans. That’s a reasonable idea, but what would the broader consequences be? This is a tricky question to untangle because there are both welfare consequences and knock-on effects where we are put down different forking paths of politics and policy.

If debt is dischargeable, then lenders will expect lower rates of repayment. This increase in lender risk and decrease in return on capital would likely have immediate consequences in the form of:

  1. Higher interest rates
  2. Lower rates of loan approval
  3. Greater dependence on loan collateral
  4. Greater lender interest in what the loaned funds will be applied towards.
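A back-of-the-envelope Python sketch shows why the first and third items on that list follow from lower expected repayment. This is my own toy setup, not a model from any paper: a one-period loan, a risk-neutral lender who must earn the risk-free rate in expectation, and a borrower who either repays in full or defaults, in which case the lender recovers some share of what is owed (collateral or a co-signer). The function name and all parameter values are hypothetical.

```python
def break_even_rate(risk_free_rate, default_prob, recovery_share=0.0):
    """Interest rate at which a risk-neutral lender expects to earn the risk-free rate.

    Solves (1 + r) * [(1 - p) + p * recovery_share] = 1 + risk_free_rate for r.
    """
    if not 0 <= default_prob < 1:
        raise ValueError("default_prob must be in [0, 1)")
    if not 0 <= recovery_share <= 1:
        raise ValueError("recovery_share must be in [0, 1]")
    expected_repaid_share = (1 - default_prob) + default_prob * recovery_share
    return (1 + risk_free_rate) / expected_repaid_share - 1


# Hypothetical inputs: a 3% risk-free rate and rising discharge risk.
for p in (0.0, 0.05, 0.10):
    unsecured = break_even_rate(0.03, p)
    secured = break_even_rate(0.03, p, recovery_share=0.5)
    print(f"default prob {p:.0%}: unsecured {unsecured:.2%}, with collateral {secured:.2%}")
```

In this toy setup the required rate climbs as the chance of discharge rises, and collateral pulls it back down, which is one way to see why lenders would lean harder on co-signers.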

Before we tackle those, we also have to consider the different policy environment paths lenders may have to anticipate:

  1. The government stops subsidizing loans. This would lower tuition, but also lower access for low income students.
  2. A loan forgiveness program. Great for people with outstanding debt, but changes how expectations are formed forever going forward.
  3. The government launches a massive “free college” program that covers tuition at state colleges and universities. This would have all kinds of consequences potentially.

But where this really leaves us is with a billion dollar question: will dischargeable student loans lead to lower costs of higher education? I am confident that the answer is a definitive, unassailable maybe.

Higher interest rates is a pretty straightforward prediction, but the consequences are less clear. Higher interest rates could lead to less college matriculation, greater barriers for lower income individuals, and higher expected rates of bankruptcy, in part because decisions are being made by young people who don’t know the future, their future, or, really, anything. Related to this, lenders will become more discerning regarding who they lend to, giving more money on more favorable terms to matriculants from wealthier backgrounds, in no small part because wealthy parents are filled to the brim with collateral, making for excellent co-signers and providers of high school graduation gifts nicer than any car I ever hope to drive.

That is all boring and moderately obvious. It’s 4) that I’m most curious about. If you get into medical school, there is no shortage of institutions eager to dump several hundred thousand dollars in the foyer of your home. Part of the reason for this is the expected future income of physicians and their high graduation rates from medical school thanks to rigorous admission screening. But what is underappreciated is the 100% rate at which medical school students study medicine.

Not so with undergraduate education. You might study electrical engineering with a minor in computer science. You also might study something a senior tells you is the easiest major at your school. You might major in something that sounds fun or interesting. You might study Miscellaneous Studies, where Miscellaneous is a subject that is likely interesting and possibly extremely important, but within which you can choose classes that facilitate your avoiding learning anything useful or applicable in the labor market.

Herein lies the problem. Lenders treat loans for consumption very differently than loans for investment. Nursing and statistics degrees are investments. Art History classes (for most people) are consumption. What’s going to happen to higher education when the lender tells you you can have $200K at 3% to study any STEM field or $75K at 6% to study anything in the humanities? Will the demand for humanities degrees drop? Will the supply of humanities education recede? Are humanities and STEM education complements or substitutes?1

Let me phrase it a different way. Are wealthy fine arts majors cross-subsidizing STEM majors pursuing the first college degrees in their family? Or are they driving up the price of tuition because heavily subsidized credit is facilitating pre-career retirement lifestyles for 4 years?

All of this leaves me with the suspicion that dischargeable student loans will lower tuition for some while raising it for others. This heterogeneity would likely shift the electoral popularity of free tuition programs while also shifting the nature of those programs. Maybe “free college” turns into a means-tested program. Maybe “free college” becomes “free STEM college”. Maybe both.

We could speculate what this means for loan forgiveness or subsidies, but this post is too long already and, as should be already clear, we’re not going to solve anything today. My elegant and succinct point is this:

When you massively subsidize a [knowledge, signal] bundled good for so long that it transforms into a [knowledge, signal, 4-year luxury cruise with your peers] bundle, and to accommodate that subsidy you protect your poorly constructed macro-investment in human capital by exempting it from bankruptcy proceedings, and as a result of this weird landscape a bizarre higher education industry emerges that is both one of the greatest achievements in US history but also a trap that 19-year-olds fall into because, really, is there any trap we don’t fall into when we’re 19, and from which thousands of people never financially recover, but if you just fix one part of it no one knows what will happen, and if you try to fix all of it at once in the back of your mind you’re afraid it could turn into the US healthcare industry part deux, well then what you have is a real and important problem that I don’t know how we will solve but I remain confident that other people will be very confident that they know how to solve it and they will get extremely cross with me for not sharing their confidence.2

So maybe don’t try to solve that in your dissertation.3 Might be safer to just definitively estimate the natural rate of interest that underlies all monetary transactions. That’ll be easier.

1 The answer is “Yes”.

2 This is, to be extremely clear, not me picking on Ms. Reisenwitz’s tweet which was good and interesting and left me thinking about student loans for two days when I should have been working on the research topics I have actual expertise in.

3 Of course, if you do find a natural experiment where huge chunks of student debt were accidentally made dischargeable in a state for 2 years because of a legislative SNAFU, you should write that dissertation and put me in the acknowledgements.

Overfitting Celebrity Pitches

The Washington Post created a fun infographic of celebrity baseball pitches.

I use this graphic in my Data Analytics class. Students are tempted to draw inferences about individuals from this data set. John Wall and Michael Jordan are great athletes, but in this case they are underperforming Avril Lavigne and George W. Bush. Do we conclude that Sonia Sotomayor missed her calling as an MLB player?

The first lesson here is that we should not assume we can predict where Harrison Ford’s next pitch will go based on observing just one pitch. A single pitch should be considered a random draw from a distribution centered around Ford’s average ability. Any single pitch could be an outlier.

Snoop Dogg features twice on this graph. In 2012 he got the ball in the strike zone. Had we only seen that, we would want to conclude that he is a great pitcher. However, in 2016 he was way off to the right. In either case, overconfidence that he is predictably near a single pitch would have been a mistake.

Lastly, I use this graph to illustrate the concept of overfitting (Investopedia definition). I suggest a model that is obviously inappropriate. What if we conclude from these data that anyone with the last name of Bieber will not be able to throw the ball in the strike zone? That model surely will not generalize. The problem is that if we test that prediction on the same data we used to train the model, the misclassification rate will be zero. If possible, start with a large data set and set aside some portion of the data for validation before training a model. Having validation data for assessment is a good way to check that you haven’t modeled the noise in your training set.
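Here is a small Python sketch of that "last name" model, to show how the training error can be zero while the validation error is not. The names and outcomes are invented for illustration and are not the Washington Post data. The fallback rule for names the model has never seen (predict the most common training outcome) is my own assumption.

```python
from collections import Counter


def fit_name_lookup(train):
    """'Train' by memorizing each last name's outcome: a deliberately overfit model."""
    lookup = {name: hit for name, hit in train}
    # For names never seen in training, fall back to the most common outcome.
    default = Counter(hit for _, hit in train).most_common(1)[0][0]
    return lookup, default


def misclassification_rate(model, rows):
    lookup, default = model
    errors = sum(lookup.get(name, default) != hit for name, hit in rows)
    return errors / len(rows)


# Hypothetical pitches: (last name, landed in the strike zone?).
pitches = [
    ("Abbott", True), ("Baker", False), ("Chen", False),
    ("Diaz", True), ("Evans", False), ("Ford", False),
    ("Gray", True), ("Hill", False), ("Ito", True), ("Jones", True),
]

# Set aside validation data BEFORE fitting anything.
train, validation = pitches[:6], pitches[6:]

model = fit_name_lookup(train)
train_error = misclassification_rate(model, train)
validation_error = misclassification_rate(model, validation)

print(f"training error:   {train_error:.2f}")
print(f"validation error: {validation_error:.2f}")
```

The memorized names score perfectly on the data they were memorized from and do badly on pitchers the model has never seen, which is exactly what the validation set is there to reveal.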