Overfitting Celebrity Pitches

The Washington Post created a fun infographic of celebrity baseball pitches.

I use this graphic in my Data Analytics class. Students are tempted to draw inferences about individuals from this data set. John Wall and Michael Jordan are great athletes, but in this case they are underperforming Avril Lavigne and George W. Bush. Do we conclude that Sonia Sotomayor missed her calling as an MLB player?

The first lesson here is that we should not assume we can predict where Harrison Ford’s next pitch will go based on observing just one pitch. A single pitch should be considered a random draw from a distribution centered around Ford’s average ability. Any single pitch could be an outlier.

Snoop Dogg features twice on this graph. In 2012 he got the ball in the strike zone. Had we only seen that, we would want to conclude that he is a great pitcher. However, in 2016 he was way off to the right. In either case, overconfidence that he is predictably near a single pitch would have been a mistake.

Lastly, I use this graph to illustrate the concept of overfitting (Investopedia definition). I suggest a model that is obviously inappropriate. What if we conclude from these data that anyone with the last name of Bieber will not be able to throw the ball in the strike zone? That model surely will not generalize. The problem is that if we test that prediction on the same data we used to train the model, the misclassification rate will be zero. If possible, start with a large data set and set aside some portion of the data for validation before training a model. Having validation data for assessment is a good way to check that you haven’t modeled the noise in your training set.
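Here is a minimal Python sketch of that idea. The pitch records are invented for illustration (they are not the Washington Post's data), and the "model" is the deliberately bad one described above: it memorizes one outcome per last name. The function names and the default guess for unseen names are assumptions of the sketch.

```python
# Hypothetical pitch records: (last_name, landed_in_strike_zone).
# These rows are invented for illustration only.
train = [
    ("Bieber", False),
    ("Ford", True),
    ("Lavigne", True),
    ("Jordan", False),
]

# Held-out pitches: one repeat thrower and two people the model never saw.
validation = [
    ("Bieber", True),
    ("Sotomayor", True),
    ("Wall", False),
]


def fit_name_lookup(rows):
    """Memorize the observed outcome for each last name (an intentionally bad model)."""
    return dict(rows)


def predict(model, last_name, default=True):
    """Return the memorized outcome, or a default guess for an unseen name."""
    return model.get(last_name, default)


def error_rate(model, rows):
    """Share of rows where the prediction disagrees with the actual outcome."""
    wrong = sum(predict(model, name) != outcome for name, outcome in rows)
    return wrong / len(rows)


model = fit_name_lookup(train)
train_error = error_rate(model, train)
validation_error = error_rate(model, validation)

print(f"training error:   {train_error:.2f}")
print(f"validation error: {validation_error:.2f}")
```

Scored on its own training rows, the lookup makes no mistakes. The held-out rows are where the memorized noise shows up.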

Publications as Positional Goods, and the Division of Labor in Academia

My co-blogger Mike Makowsky has a thoughtful post this week about the academic publishing process. I wanted to offer a slightly different perspective on the same topic. But my perspective comes from someone who is not at a research university, and someone who has recently survived the tenure process.

A little background for those not completely familiar with the academic world: schools are usually considered either teaching or research schools. At first this seems confusing: both Clemson (where Makowsky is) and the University of Central Arkansas (where I am) require that faculty engage in both research and teaching. The difference is subtle, but the big hint is that Clemson is considered an “R1” school (the highest research designation) and has a PhD program with many graduate students. At a school like Clemson, research is valued more than teaching. At UCA, teaching is valued more than research. (Much more could be said about the differences, perhaps in a future post.)

We both engage in teaching and research (as well as service!), but the emphasis is different. For me at UCA, the expectations of which journals I will publish in and how frequently I will publish are lower than at a school like Clemson. At Clemson, some of your publications should be in the Top 5 (or at least Top 10) journals from time to time. At UCA, if you published in one of the top journals, the assumption would be that you are probably leaving soon to go to an R1 school.

I’m glad both types of schools exist, and my point here is not to disparage either type of school. But the difference is important for thinking about the academic publishing process.

For someone at an R1 school, publications in top journals are positional goods. Makowsky doesn’t say this exactly, but that’s my takeaway from his post. There are only so many spots available in these journals, and they have value because there is only a fixed number available. And since there have been, over the years, a lot more economists doing a lot more research, not all of the great papers will end up being published in one of the top journals.

Upshot: there are a lot of great papers being published in Top 50 or even Top 100 journals! Let me pick on myself. As I said, I recently survived the tenure process. My publication record was good enough. You can inspect my publications over at Google Scholar. I’m proud of these publications. I think some of them are really great. But I’m fairly confident that I would never earn tenure at Clemson with these publications. Instead, you need a publication record like Makowsky’s.

What’s interesting here is that Mike and I occasionally publish in some of the same journals. Public Choice and Constitutional Political Economy jump out to me. These are, in my view, very fine journals. Lots of interesting research is published in these journals. I’m especially proud of this paper in Public Choice. But if someone published only in these two journals and journals like them, they wouldn’t get tenure at an R1 university.

So what do we do with this information?


Teaching through my R mistakes

I blogged earlier about a new textbook that I am adopting for an analytics course. The first few chapters are primarily an introduction to using the R coding language within RStudio. One of the resources I’m posting for students this week is screen capture videos of me manipulating data in RStudio.

Sometimes I make mistakes, shockingly. I’m a professional, and yet sometimes I still make careless typos in R. I found out that my version of R was outdated, right when I was in the middle of recording a lecture.

I could have deleted the footage of my mistakes. I could have re-recorded a clean smooth video in which I run command after command without saying “ok… I got an error”.


The Pappy Pricing Puzzle

If you drink bourbon whiskey (or even if you don’t) you’ve probably heard of Pappy Van Winkle. Bourbon has experienced something of a revival in the past two decades, after being in decline for much of the 20th century. As part of this revival, some bourbons have become very highly sought after by the nouveau bourbon enthusiasts. And the various offerings of Pappy Van Winkle are arguably the most highly sought after. Finding Pappy is almost impossible these days, though this was also true a decade ago so it’s not really a “new” phenomenon.

So here’s the “puzzle” for economists: why aren’t Pappy and other rare whiskies sold at market prices? No one in the “legal” market seems willing to do so. I put “legal” in quotation marks because there is a robust secondary market for these bottles, and the legal status of these sales is entirely unclear to me as an economist (alcohol markets are, to say the least, highly regulated).

In these secondary markets, it is not unusual for a 20-year bottle of Pappy Van Winkle to sell for $2,000. The “manufacturer’s suggested retail price” is $199.99. But you will never find this bottle on the shelf for that price. The bottles are held by retailers, either to sell to friends, auction off for charity, or conduct a lottery for the right to purchase the bottle at well below market prices.

So why doesn’t the distillery raise the MSRP? Clearly, they do this from time to time. Ten years ago, if you were lucky enough to find this bottle it was around $100 (I was lucky enough, on occasion). Clearly, they recognize that prices can increase. And that’s not just “keeping up with inflation”: $100 in 2011 is about $120 in current dollars. By 2016, they had raised the MSRP to $169.99. But why doesn’t the distillery raise the price more, perhaps all the way up to the market clearing price? By doing so, they would, perhaps, be able to ramp up production so that in 2041 there might be a lot more Pappy on the shelf. At the very least, they could dramatically increase their profit.
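To make the arithmetic concrete, here is a rough Python sketch using only the figures quoted in this post. The inflation factor of 1.20 is just my "about $120" estimate written as a multiplier, and the variable names are made up for the example.

```python
# Figures quoted in the post; the inflation factor is a rough estimate,
# not an official price-index number.
price_2011 = 100.00          # approximate shelf price ten years ago
msrp_now = 199.99            # current suggested retail price
secondary_price = 2000.00    # typical secondary-market price
inflation_factor = 1.20      # "$100 in 2011 is about $120 in current dollars"

# Express the 2011 price in current dollars, then compare with today's MSRP.
price_2011_in_current_dollars = price_2011 * inflation_factor
real_increase = msrp_now / price_2011_in_current_dollars - 1

# Gap between what the secondary market pays and the suggested retail price.
gap_to_market = secondary_price - msrp_now

print(f"2011 price in current dollars: ${price_2011_in_current_dollars:.2f}")
print(f"real MSRP increase since 2011: {real_increase:.0%}")
print(f"gap between MSRP and market:   ${gap_to_market:,.2f}")
```

Under these assumptions the MSRP has risen by roughly two-thirds in real terms, yet it still sits about $1,800 below the secondary-market price.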

Receipt for 1 bottle of 20-year Pappy and 2 bottles of 12-year Van Winkle “Special Reserve” from 2011.

Also, why don’t retailers just put bottles on the shelf at $2,000? Stores occasionally do this, but mostly because they are fed up with all of the customers calling about rare bottles. Sometimes they will price it even higher than secondary markets. But usually, they allocate the bottles by something other than the price mechanism. Why? Businesses don’t usually leave dollar bills, especially $1,000 bills, on the table.


R.I.P. Borders

An analytics textbook is usually full of success stories (i.e. XYZ Corp. invested in a data warehouse and everything got better). I decided that my students needed to hear a downer for balance. What better example than Borders?

Borders was a fixture of suburban New Jersey in the 90’s. You could browse books or media and get coffee there. When I asked undergraduates in 2018 if they remember Borders, I learned how far south Borders had expanded (to Nashville, but not to Birmingham).

Never fear. All of my students knew the Kanye West song “All of the Lights”. The lyrics are:


Teaching Economics with COVID

In many of my blog posts I address either issues related to COVID or teaching economics. In this post, I want to combine the two. One thing economists of a certain age struggle to do is find examples to illustrate economic concepts which will actually connect with 18-22 year olds. The silver lining of the pandemic is that we now have an example that everyone is familiar with, and can be used to illustrate a host of economic concepts.

A great new book by Ryan Bourne, Economics in One Virus, really pushes this idea to the limit. He uses examples related to COVID to explain almost every single concept you would cover in a typical introductory economics course: cost-benefit analysis, thinking on the margin, the role of prices, market incentives, political incentives, externalities, moral hazard, public choice issues, and more.


Thoughts on end-of-semester lectures (Part 1)

At the end of the semester, I like to make a splash with students. For example, in my intermediate microeconomics course I put together a fun lecture. We have some laughs talking about models. We talk about Rolling Stones songs like “You Can’t Always Get What You Want” (budget constraints) as well as Queen songs like “I Want It All” (monotonicity).

We wax philosophical with Robert Frost’s “The Road Not Taken” about opportunity cost. We reiterate that the arguments in utility functions can be a richer set of desires than food and shelter. As Adam Smith says, “Man naturally desires not only to be loved, but to be lovely.” We emphasize that our models are simplified because good models try to get to the heart of the matter.

Sometimes models are dangerous. Like the “monkey illusion” we become so distracted we miss the heart of the matter. One prime example is how Samuelson continued to update the projection about when the USSR would surpass the US economy (check this out for more info) or Easterly’s depiction of the World Bank notion that if you build it, growth will come.

We discuss the importance of models, how they organize our thinking, the dangers of being too wed to a model but also the importance of empirical testing. We use MobLab in class to test our models as I’ve written about here. But MobLab can’t give us an empirical test of all the important questions. We have to look elsewhere, out in the world, to find evidence. Some of my favorite examples of this are cross-border comparisons like East and West Germany, North and South Korea, Haiti and the Dominican Republic, etc.

I remind students that incentives matter. Economic institutions influence the costs and benefits of human action. When costs and benefits change, we expect behavior to change. Throughout the semester we learned to formalize these ideas and they are not without consequence. As this New York Times piece on the work of Amartya Sen puts it,

“Nature causes floods and droughts, but most societies have found ways to get food to those afflicted most of the time. Human folly causes famine, which occurs when those ways are blocked. Amartya Sen, a Harvard economist, argued that there has never been a serious famine in a country — even an impoverished one — with a democratic government and a free press. The press acts as a warning system and the pressures of democracy dissuade rulers from famine-producing policies.”

While economics is fun, interesting, and can be light-hearted, economics can also be deadly serious. The stakes of economic illiteracy are enormous.

Next week we go on to Part 2 where I pivot from this section of the end-of-semester talk to the applications of economic ideas to the everyday life of students.

Rationality and economics

Lately I have been thinking a lot about rationality and economics. In my Economics of the Family and Religion course I exhort students to take the approach, “crazy is lazy”. Like archeologists that brush away the dust from artifacts we should brush away the dust of human decision-making and find the rationality. This is especially useful when it comes to understanding observed patterns in religious practice across time and space. You don’t get very far with, “people are nuts”.

Humans make decisions on purpose. They weigh the costs and benefits of an action and make the choice that seems best to them given their available opportunities. Some students have struggled to integrate this message with their other classes. At FSU we have a deep bench of experimental and behavioral economists so there are ample opportunities for students to see courses with a more psychological approach.

In one famous study, Kahneman and Tversky manipulated whether there was a positive or negative frame on a treatment for a deadly disease. In the positive frame, there was a 33 percent chance that the treatment could save the 600 people with the deadly disease. In the negative frame, there was a 66 percent chance the treatment could kill the 600 people with the deadly disease. Notice that both of those probabilities result in an expected 200 people being saved. However, people were far more favorable to the positive frame (72%) compared to the negative frame (22%).
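The equivalence of the two frames can be checked with a few lines of Python. This is only a sketch of the arithmetic as described above. It treats the rounded "33 percent" and "66 percent" as exactly 1/3 and 2/3, which is an assumption on my part.

```python
from fractions import Fraction

people = 600

# Positive frame: a 1/3 chance that all 600 people are saved.
expected_saved_positive = Fraction(1, 3) * people

# Negative frame: a 2/3 chance that all 600 people die.
expected_deaths_negative = Fraction(2, 3) * people
expected_saved_negative = people - expected_deaths_negative

print(expected_saved_positive, expected_saved_negative)
```

Both frames describe the same expected outcome, so the gap between 72% and 22% comes from the wording rather than the numbers.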

Then there are numerous other behavioral economics findings about seemingly small things that impact the decisions people make. For example, in research about bidding behavior psychologists Dan Ariely and Drazen Prelec and economist George Loewenstein passed out sheets of paper and had students write the last two digits of their social security number (SSN) at the top before they placed a bid for each item on a sheet. Students with SSN in the top 20 percent of the distribution bid 216 to 346 percent higher for the items compared to those with SSN in the bottom 20 percent.

We could go on. In the face of those kinds of results, it is no wonder that students pause to reflect about how these findings fit into the larger corpus of economics. Are they useful observations to the extent they help us improve the predictions of our models? Are they damning demonstrations that cut out the heart of the economic approach?


An Economist Learns Piano: Part 1

My life didn’t change all that much due to the Covid-19 pandemic. I live in a small university town. I mostly continued to go to work and my kids mostly continued to play with their neighbor friends. After a brief hiatus, I ended up growing much closer to my neighbors. One nearby couple are even the godparents of my most recently born child.

The university at which I teach is a liberal arts school…. And I teach economics. I knew that these music-type students and professors were out there, but I didn’t have much exposure. I recently obtained a zero-priced piano and had a good 2-hour conversation with a music major. This post illustrates part of what I’ve learned so far. First, a graph.

Whether we want to or not, many of us know the musical scale thanks to The Sound of Music. What I didn’t know was that there is not a uniform distance between all of those notes. Along the x-axis are the note labels (do re mi fa so la ti do). The pitch is characterized by an increment called a step. Given some arbitrary pitch for the first note, do, each subsequent note is a specific number of steps away. The pattern is that each increment between notes is 1 step, except the step from mi to fa and from ti to do. Those are half steps. The result is a segmented function.

Now, this pattern can be applied to a piano.

There are a total of 88 keys on a piano. Some are black, others are white. But all of them are a half-step increment from the prior and subsequent key. IDK why there are small black keys and big white ones. But pianos would be a lot bigger without the narrower black keys. Every single white key on a piano is labeled with a letter. The letter *does not move*. A ‘C’ is always a ‘C’.

What can move is the scale label, do, which can be any key. The pattern identified in the graph above must be maintained. To play ‘in the key of C’ means that ‘C’ is identified as do. The remaining keys can be labeled.

The key of ‘C’ is easy because the entire scale can be played on all white keys.

Those two half steps that we mentioned earlier? Those might have been on a black key – except that there is no black key between ‘E’ and ‘F’ or between ‘B’ and ‘C’. The E-F keys are adjacent. That means that their pitch is a half-step apart – exactly what is necessary for the pitch difference between mi and fa. The same is true for the B-C step and the pitch difference between ti and do.

What about the black keys? We can see their role by placing do on a different lettered key. We can start on ‘D’.

do to re is a full step, from ‘D’ to ‘E’ – skipping the black half-step that’s between them. For re to mi we need to skip a key, since all keys are a half step apart. So? To the black key! We skip ‘F’ and land on the subsequent black key. Then, fa falls on ‘G’, a half step and a single key higher in pitch. ‘A’ is a full step away from ‘G’, so that’s so. la is another full step away on ‘B’. Recall that all of the keys are separated by a half-step – the key colors are 100% unimportant. ti is a full step higher – but there is no key separating ‘B’ and ‘C’. So, we skip up to the black key again just as we did with mi. Finally, do is a single key and a half step more.
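For the programmers in the audience, the walk up the keyboard can be written as a short Python sketch. It assumes the pattern described above (full steps everywhere except mi to fa and ti to do) and labels the black keys with sharps, which is a notation this post has not introduced yet. The names KEYS, HALF_STEPS, and scale are made up for the example.

```python
# The 12 key names within one octave; each is a half step above the previous one.
# Black keys are written with sharps here (an assumption of this sketch).
KEYS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

NOTE_LABELS = ["do", "re", "mi", "fa", "so", "la", "ti", "do"]

# Increments between consecutive note labels, counted in half steps:
# full, full, half, full, full, full, half.
HALF_STEPS = [2, 2, 1, 2, 2, 2, 1]


def scale(start_key):
    """Return (note label, piano key) pairs for the scale with do on start_key."""
    position = KEYS.index(start_key)
    keys = [start_key]
    for step in HALF_STEPS:
        # Wrap around after 'B' so the pattern continues into the next octave.
        position = (position + step) % len(KEYS)
        keys.append(KEYS[position])
    return list(zip(NOTE_LABELS, keys))


c_scale = scale("C")
d_scale = scale("D")

print(c_scale)
print(d_scale)
```

Placing do on ‘C’ touches only white keys, while placing it on ‘D’ lands mi and ti on black keys, matching the walk-through above.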

There you have it! One of the things that a pianist can do is play the entire scale, from do to do, starting from any lettered key on the piano. I can’t do that yet, but golly I certainly feel like I have a better handle on what I’m even looking at.

PS – My conversation took a long time and I had to nail down the difference between 1) The note label, 2) the pitch step increments, & 3) the piano key letter labels. Key letter labels and the note labels are ordinal variables while the steps are cardinal. So, the graph at the top of this post isn’t the only important relationship. The graph below includes the relationship between the step and key letter labels. A graph of the note label and the key letter labels requires a rudimentary knowledge of flats and sharps (with two different do’s).

Gen Z on Deep Work

I asked students to read an excerpt of the first chapter of Cal Newport’s book Deep Work and comment in a discussion board. The prompt asked whether deep work goes on in college and what are the barriers to deep work. I think it’s important for society that some people engage in deep work on our problems. I’m interested in how 20-year-olds perceive Newport’s ideas on focus and what barriers they identify to deep work.

Replies ranged from “I do believe that deep work is happening at college, but I think that it is hard to find students using this strategy regularly.” to “I know multiple people who do not practice deep work….” They each have a different subjective view of “deep work” and their replies are anecdotal. It’s possible that some students are too hard on themselves, considering that I biased them to be negative with the discussion prompt. Some of them might have thought that “deep work” requires many consecutive hours of focus, which is not actually what I expect of undergraduates. Still, the discussion could be helpful to others who aspire to deep work.

The following barriers to deep work were identified:

“The barriers that we experience include social media, roommates, friends, significant others, going to classes, having to work, and any number of other things that cause our day to become disjointed …  We are the first generation that has spent the majority of our life utilizing social media… and in general, are used to taking in information from a large number of sources over a short period of time.

“Most students cannot spend a large amount of hours just focused on the one task at hand and that is required for deep work. For most college students it will be nearly impossible to practice deep work because of a job, outside social life, or a heavy class workload …

“I believe that deep work happens in college a lot.  Students often times must prepare/study for tests for a long time and that is when it happens the most.  When someone has to study for hours they are intensely focused if they put themselves in a good studying environment…

“This can be achieved when you are able to clear your mind of external things and place yourself in a non-distracting environment. As a college student, this can be difficult especially because we are constantly thinking about our to-do list, when will we hang out with friends, or what’s for dinner.
