Teaching with ACS regional data

If you are teaching a quantitative college course, then you have probably thought about where to get data that students can practice with.

Public Use Microdata Areas (PUMAs) are non-overlapping, statistical geographic areas that partition each state or equivalent entity into geographic areas containing no fewer than 100,000 people each. The image here shows PUMAs around Birmingham, AL. I created a dataset for my students that includes demographic data from the American Community Survey (ACS) for the region around our university.

For just about any topic you would teach in stats, I can create a mini assignment using data on the people around us. Any American metro area has clusters of high-income households and clusters of low-income households. One example of an exercise is to create summary statistics on income by PUMA. Students will be surprised to learn the facts about their own city.

Zachary has blogged about how great IPUMS is. I obtained the data by making a free account with IPUMS. If you ask for data on every American, you’ll end up with an unwieldy file. The trick is to filter out all but a handful of PUMAs. I also recommend restricting it to just one year unless you are teaching time series techniques.
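Here is a minimal sketch of the filter-then-summarize exercise, using only the Python standard library. Everything specific in it is an assumption for illustration: the column names (YEAR, PUMA, HHINCOME) should be checked against your own extract's codebook, the PUMA codes are placeholders rather than real Birmingham codes, and the rows are made up. It also ignores survey weights, which a real ACS analysis would need.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up stand-in for a downloaded extract; a real file is far larger.
# Column names are assumed, so check them against your own codebook.
SAMPLE_CSV = """YEAR,PUMA,HHINCOME
2019,1301,42000
2019,1301,58000
2019,1302,125000
2019,1302,97000
2019,1302,110000
2019,2500,61000
2018,1301,40000
"""

KEEP_PUMAS = {"1301", "1302"}  # placeholder codes, not real ones
KEEP_YEAR = "2019"


def income_by_puma(csv_file, pumas, year):
    """Return {puma: {"n", "mean", "median"}} for rows matching the filters."""
    incomes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Keep only the handful of PUMAs and the single year we care about.
        if row["PUMA"] in pumas and row["YEAR"] == year:
            incomes[row["PUMA"]].append(float(row["HHINCOME"]))
    return {
        puma: {
            "n": len(values),
            "mean": statistics.mean(values),      # unweighted mean
            "median": statistics.median(values),  # unweighted median
        }
        for puma, values in incomes.items()
    }


summary = income_by_puma(io.StringIO(SAMPLE_CSV), KEEP_PUMAS, KEEP_YEAR)
for puma, stats in sorted(summary.items()):
    print(puma, stats)
```

To run it on a real file, students would swap the `io.StringIO(SAMPLE_CSV)` argument for an opened CSV file.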

I originally got the idea from Matt Holian. Matt wrote a fantastic book called Data and the American Dream. The book has data and R code that allow you to reproduce the findings from several interesting econ papers that all use ACS data. I’m not teaching material that overlaps perfectly with Matt’s book, so I couldn’t assign it to my students, but I did borrow some elements of his idea and even (with his permission) some of his code.

Book Review: Big Data Demystified

Last year, our economics department launched a data analytics minor program. The first class is a simple 2-credit course called Foundations of Data Analytics. Originally, the idea was that liberal arts majors would take it and that this class would be a soft, non-technical intro to terminology and history.

However, it turned out that liberal arts majors didn’t take the class and that the most common feedback was that the class lacked technical challenge. I’m prepping to teach the class, and it will have two components. The first is a Python training component where students simply learn Python. We won’t do super complicated things, but they will use Python extensively in future classes. The second component is still in the vein of the old version of the course.

I’ll have the students read and discuss “Big Data Demystified” by David Stephenson. He spends 12 brief chapters introducing the reader to the importance of modern big data management, analytics, and how it fits into an organization’s key performance indicators. It reads like it’s for business majors, but any type of medium-to-large organization would find it useful.

Stephenson starts with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target Corporation used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know that they were pregnant yet. He whets the appetite of the reader by noting that the supercomputers that could play Chess or Go relied on fundamentally different technologies.

The first several chapters of the book excite the reader with thoughts of unexploited potentialities. This is what I want to impress upon the students. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is better for the challenges that they might face and to see clear applications (and limitations).

AI uses brute force, iterating through possible next steps. There are multiple online tic-tac-toe AIs that keep track records. If a student can play the optimal set of strategies 8 games in a row, then they can get the general idea behind testing a large variety of statistical models and explanatory variables, then choosing the best.
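To make the brute-force idea concrete for students, here is a small sketch that searches every possible continuation of a tic-tac-toe game. The technique is plain minimax, which is my choice of illustration and not something taken from the book; the function names are likewise just for this example.

```python
from functools import lru_cache

# The eight ways to get three in a row on a 3x3 board stored as a 9-char string.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def best_outcome(board, player):
    """Score a position for X (+1 win, 0 draw, -1 loss) with `player` to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0  # board full, nobody won
    other = "O" if player == "X" else "X"
    # Brute force: try every empty square and score the resulting position.
    scores = [
        best_outcome(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == " "
    ]
    return max(scores) if player == "X" else min(scores)


def best_move(board, player):
    """Return the index of a move with the best brute-force score for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def score(i):
        return best_outcome(board[:i] + player + board[i + 1:], other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)


print(best_outcome(" " * 9, "X"))  # value of the empty board under perfect play
print(best_move("XX OO    ", "X"))  # X can complete the top row
```

The empty board scores 0, which is the familiar result that perfect play from both sides ends in a draw.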

But ML is responsive to new data, according to what worked best on previous training data. There are multiple YouTubers out there who have used ML to beat Super Mario Brothers. Programmers identify an objective function and the ML program is off to the races. It tries a few things on a level, and then uses the training rounds to perform quite well on new levels that it has never encountered before.

There are a couple of chapters in the middle of the book that didn’t appeal to me. They discuss the question of how big data should inform a firm’s strategy and how data projects should be implemented. These chapters read like they are written for MBAs or for management. They were boring for me. But that’s ok, given that Stephenson is trying to appeal to a broad audience.

The final chapters are great. They describe the limitations of big data endeavors. Big data is not a panacea and projects can fail for a variety of what are very human reasons.

Stephenson emphasizes the importance of transaction costs (though he doesn’t say it that way). Medium-sized companies should outsource to experts who can succeed (or fail) quickly so that big capital investments or labor costs can be avoided. Or, if in-house staff will be hired instead, he discusses the trade-offs between using open source software, getting locked in, and reinventing the wheel. These are a great few chapters that remind the reader that data scientists and analysts are not magicians. They are people who specialize and can waste their time just as well as anyone else.

Overall, I strongly recommend this book. I kinda sorta knew what machine learning and artificial intelligence were prior to reading, but this book provides a very accessible introduction to big data environments, their possible uses, and organizational features that matter for success. Mid- and upper-level managers should read this book so that they can interact with these ideas prudently. Those with a passing interest in programming should read it for greater clarity and to get a better handle on the various sub-fields. Hopefully, my students will read it and feel inspired to be on one side or the other of the manager-data analyst divide with greater confidence, understanding, and a little less hubris.

Notes on Austin and Health Economics

I was in Austin, Texas for the first time this week for the first in-person meeting of the American Society of Health Economists since 2019. Some quick impressions on Austin:

  • Austin reminds me of many Southern cities, but Nashville most of all. Both historic state capitals that are booming, lots of people moving in and new infrastructure actually being built, forests of cranes putting up new glass towers. Both filled with bars, restaurants, and especially live music. But even with so much happening and so much being built, they don’t *feel* dense, you can always see lots of sky even downtown.
  • Austin seems to be a bizarre “pharmacy desert”, I think I walked 14 miles all through town before I saw one. Contrast to NYC with a Duane Reade on every block. In fact downtown seemed to have almost no chains of any kind, restaurants included; I wonder if this is just about consumer preferences or there’s some sort of anti-chain law.
  • Good brisket and tacos, as expected
  • Most US cities have redeveloped their waterfronts the last few decades to make them pleasant places to be, but Austin has done particularly well here, many miles of riverfront trails right downtown.

Thoughts from the conference:


The Latest GDP Data: First Quarter 2022 in the OECD

Today there were two data releases for Gross Domestic Product. The first was for the United States, giving us the third and “final” estimate for first quarter 2022. Real GDP fell at an annualized rate of 1.6% from the prior quarter (though we knew this two months ago; not much has changed since the “advance” estimate). That’s not good (but see this great Joseph Politano newsletter for some more detail).

The second release was the annual 2021 GDP data for the European Union. The release showed strong growth in 2021 (+5.4%), but that’s relative to the bad year of 2020. So compared to the pre-pandemic level of 2019, the EU was still about 0.8% below this more accurate baseline. Comparatively, the US was already 2% above 2019 with the annual 2021 release (everything in these two paragraphs is adjusted for inflation). Of course, within the EU, there is a lot of variation, but overall the US looks comparatively good.
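As a back-of-the-envelope check on why the 2019 baseline matters, the two EU figures above pin down roughly how large the 2020 drop must have been. This sketch uses only the rounded numbers quoted in the text, so the result is an implied approximation and not an official statistic.

```python
growth_2021 = 0.054   # EU real GDP growth in 2021, as quoted above
gap_vs_2019 = -0.008  # EU 2021 level relative to 2019, as quoted above

# Index 2019 to 1.0 and work backward from the 2021 level.
level_2021 = 1.0 + gap_vs_2019
level_2020 = level_2021 / (1.0 + growth_2021)
implied_change_2020 = level_2020 - 1.0

print(f"Implied 2020 change: {implied_change_2020:.1%}")
```

With these inputs the implied 2020 change comes out to about -5.9%, which is why a +5.4% year still leaves the EU just short of its 2019 level.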

Let’s break down that variation in the EU and include the first quarter of 2022 data to make the best comparison with the US. To bring in some more relevant comparison countries, I’ll use data from the OECD for a complete comparison. Note: I’ve excluded Ireland, because their GDP is weird. I’ve also excluded Turkey, because even though all the data here is adjusted for inflation, Turkey is in a highly inflationary environment, making the data a little difficult to interpret.

Here is the chart, which shows the change in real GDP from the 4th quarter of 2019 up through the 1st quarter of 2022 (I use the volume index, which is similar to adjusting for price inflation). I have highlighted in orange the largest economies in the OECD (anything with about $2 trillion of GDP or larger, with Spain and Canada at about that level).


Shifts in Labor Participation: The Great Resignation Becomes the Great Reshuffling

More than 47 million workers quit their jobs in 2021, in what has become known as The Great Resignation. However, many of these workers are getting re-hired elsewhere. Hiring rates have outpaced quit rates since November 2020.

The U.S. Chamber of Commerce has published some statistics on this reshuffling of the labor force, which I will reproduce here. As shown in the chart below, quit rates in leisure and hospitality (which require in-person attendance and pay lower salaries) were enormous. However, the recent hiring rates have been even higher in this area, so the shortage of labor there is only moderate.

When taking a look at the labor shortage across different industries, the transportation, health care and social assistance, and the accommodation and food sectors have had the highest numbers of job openings.

Yet despite the high number of job openings, transportation and the health care and social assistance sectors have maintained relatively low quit rates. The food sector, on the other hand, struggles to retain workers and has experienced consistently high quit rates.

I am not sure I understand exactly what the following chart represents, but it was deemed important:

I think the % of yellow is the ratio of unemployed persons with experience in the field (i.e., who could readily participate) to the total job openings in that field. E.g., “…if every unemployed person with experience in the durable goods manufacturing industry were employed, the industry would only fill 65% of the vacant jobs.” These are interesting data, although I’d be even more interested in seeing numbers on unfilled job openings as a fraction of total (filled and unfilled) positions to give a better idea of how much each industry is hurting for labor. Anyway, here is some of the commentary from the article:

It is interesting to look at labor force participation across different industries. Some have a shortage of labor, while others have a surplus of workers. For example, durable goods manufacturing, wholesale and retail trade, and education and health services have a labor shortage—these industries have more unfilled job openings than unemployed workers with experience in their respective industry. Even if every unemployed person with experience in the durable goods manufacturing industry were employed, the industry would only fill 65% of the vacant jobs.

Conversely, in the transportation, construction, and mining industries, there is a labor surplus. There are more unemployed workers with experience in their respective industry than there are open jobs.

The manufacturing industry faced a major setback after losing roughly 1.4 million jobs at the onset of the pandemic. Since then, the industry has struggled to hire entry level and skilled workers alike.

And finally:

Some industries have been less impacted by labor shortages but are grappling with how to deal with the rise of remote work. For example, the rise of remote work might explain why there has been less “reshuffling” in business and professional services.
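For readers who want to play with the two measures discussed in this post, here is a small sketch. The first function is my reading of the chart (unemployed workers with industry experience divided by job openings); the second is the alternative I said I would rather see (openings as a share of all positions). The function names and all the numbers are hypothetical and chosen only to reproduce the 65% example.

```python
def fill_ratio(unemployed_with_experience, job_openings):
    """Share of openings that could be filled by experienced unemployed workers."""
    if job_openings <= 0:
        raise ValueError("job_openings must be positive")
    return unemployed_with_experience / job_openings


def classify(ratio):
    """Below 1 means more openings than experienced unemployed workers."""
    return "shortage" if ratio < 1 else "surplus"


def vacancy_share(job_openings, filled_jobs):
    """Unfilled openings as a fraction of all positions, filled and unfilled."""
    total = job_openings + filled_jobs
    if total <= 0:
        raise ValueError("total positions must be positive")
    return job_openings / total


# Hypothetical industry: 65,000 experienced unemployed, 100,000 openings,
# 900,000 filled jobs. These are illustration values, not Chamber data.
ratio = fill_ratio(65_000, 100_000)
print(f"fill ratio: {ratio:.0%} ({classify(ratio)})")
print(f"vacancy share: {vacancy_share(100_000, 900_000):.0%}")
```

The two measures answer different questions: the first says how far the experienced unemployed could go toward filling openings, while the second says how large the openings are relative to the industry's size.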

Labor market tournaments and the cost of almost making it

Two weeks ago Tyler Cowen observed the increasing presence of family lineages in the NBA. The post is without much commentary, so I won’t impute any theory on Tyler’s behalf, but I suspect most people would see this as a product of genetics combined with the increased ability of NBA teams to precisely identify the attributes and aptitudes they want. There could also be a component of nepotism, i.e., 2nd-generation players are given greater leeway and time to develop, but given the revenues on the line in professional sports and the dependence on labor to compete, those effects are likely to be weak.

I’d like to offer an alternative theory to genetics that I refer to as “Better last than second”. There are certain lines of work, such as athletics, acting, music, or Twitch streaming, that are best thought of as winner-take-all labor tournaments. Any occupation where the concept of “making it” is well understood by its participants as an elusive but desirable goal can be considered a labor tournament.

There are lots of labor tournaments (academia for instance), but in most of them 2nd, 3rd, or Nth place are reasonably tolerable outcomes because the rewards correlate fairly linearly with any success level beyond abject failure. Nobody worries about being the 27,342nd ranked accountant in the world – that person likely makes a good living. Even if they can’t get a job as an accountant, they have skills that readily translate to a variety of other well-paid occupations. Winning is merely a (highly remunerative) cherry on top of an already pretty good outcome.

Basketball players worry a great deal about being the 474th best in the world.


The NBA at any given moment has 450 employees on its rosters. A couple dozen more float in and out on short-term contracts to fill in for injuries and other player absences. The league minimum salary is $925k per season. The NBA developmental league (the G League) pays about $37k per season. That already makes it sound like the earnings dropoff from being the 449th best player to the 474th player is enormous, but it’s actually much, much worse.

It’s worse because basketball skills translate to the tiniest sliver of other jobs. Television acting, DJing house music, colorfully live streaming a Castlevania speed run: these are all skills that can pay large sums of money if you cross some imaginary threshold and “make it.” The catch that distinguishes “better last than second” markets from other winner-take-all labor tournaments is that participation requires the dedication of tens of thousands of hours building human capital whose rewards are skewed almost entirely towards a selected few. Those thousands of hours play out over the course of a survival game where, month by month, year by year a new round of “losers” is selected out.

The irony is that losing first is better than getting the silver medal. Losing first means rebooting your life early and building up your human capital in something else (hopefully in something more forgiving of merely being very, very good). The silver medalist is, in fact, the biggest loser. They bear the opportunity cost of time and energy they will never get back and never be rewarded for. I don’t worry about players that don’t get NCAA scholarships or drafted for the NBA. I worry about the guys hanging around in the G League until they’re 34 only to get released from their contract over a text message. I worry about the actors who’ve spoken 15 lines across 24 television guest spots and 3 commercials in 11 years based mostly on aesthetics, only to wake up at 34 and find themselves in the uncastable valley of normalcy. I worry about the members of all the bands I like but none of my friends have ever heard of.

Which brings us back to NBA lineages and why they seem to be becoming more common. First, if your father was in the NBA in the 80s or 90s, you probably come from upper-middle class or better means and, in turn, have the backing to tolerate the financial risk of not making it. Second, almost making it is likely to be less costly for you because you are part of a basketball family. Your name will grant you far greater access to the small number of basketball-adjacent jobs that will value your skills (e.g., coaching, scouting, recruiting, commentary). Being part of a lineage makes that silver medal a lot more valuable. Maybe just as importantly, your family is likely to be a lot more supportive and tolerant of the risk you are taking. If one of your parents had a six year run on Dynasty or made a living on the LPGA tour, they’re that much more likely to see a path to success for you.

As athletics become more lucrative, they become better understood. As they become better understood, the body of highly specific tacit knowledge grows as well. Lineage players will have access to this tacit knowledge through their parents. Dell Curry knew his son wasn’t going to be particularly tall (Steph Curry is listed as 6′2", and official NBA heights are notoriously generous). This led him to entirely reinvent his son’s shooting form in a manner that rendered him unable to shoot from any distance at all for months, entirely based on his understanding as a former NBA player that his son’s lack of genetic predisposition to play in the NBA required a motion that would catapult shots over much taller players. Even if lineage players do have genetic advantages in the high school and college stages of the tournament, the value of these advantages pales in comparison to the advantages of tacit knowledge precisely because of the stage of the game at which they are leveraged.

One could even argue that any genetic advantages that correlate to success at the early stages of a “better last than second” tournament (e.g. being 6’8″) are akin to a resource curse, giving the false impression of a non-trivial probability of “making it.” Conversely, a lack of genetic gifts (e.g. being 6’2″) while having access to the tacit knowledge valued at the last stage of the tournament truly is a blessing. If you survive the tournament until the last round without the obvious endowments other players have, you probably have a rich portfolio of other skills which, combined with the previously mentioned late-stage tacit knowledge, means you’ve been playing the game with less risk and greater expected value than others.


“Better last than second” labor tournaments are common in high prestige entertainment fields, but they aren’t limited to them. Any academic field that produces PhDs with little to no demand in the private market shuttles thousands of students through exactly such a tournament. The only difference is that the gold medal is an $87k a year job with the job security of tenure and “almost making it” often includes crippling student loans. It shouldn’t be much of a surprise that academia is full of lineages, too. And with those academic parents will come the knowledge of how decisions made in high school, college, grad school, and beyond will determine whether they win their respective labor tournaments. Or lose and have to settle for saving the world.

The World is Watching Top Gun Maverick

I see that you’re hurtin’, why’d you take so long

To tell me you need me? I see that you’re bleeding
You don’t need to show me again
But if you decide to, I’ll ride in this life with you
I won’t let go ’til the end

So cry tonight
But don’t you let go of my hand
You can cry every last tear

These are the lyrics to the song sung by Lady Gaga in the closing credits of Top Gun: Maverick. This song has been on the Billboard Top 100 chart for 6 weeks. The film TG:M is on its way to breaking a billion dollars worldwide at the box office this year. People (millions of people in every demographic all over the world) want to see Tom Cruise, playing himself (j/k), save the day. At the end of the movie they expect you to want to cry, and then Lady Gaga tells you to just let it out.

The only bit of acting that was hard to believe in the movie was the guy who was trying to play the arrogant jerk. The writers were trying to inject some drama with his lines, but the whole cast was so good-hearted and earnest. These folks seemed about 2 meters from heaven, and I don’t just mean because of flying at high altitudes.

After seeing TG:M in theaters this weekend, I watched the original 1986 movie (free on Amazon Prime right now) for comparison. The locker room banter in TG1 seemed more genuinely mean-spirited. That was back when bullies were bullies and no one was afraid of getting canceled?


Everyone’s an Expert: Easy Data Maps in Excel

I love data, I love maps, and I love data visualizations.

While we tend not to remember entire data sets, we often remember some patterns related to rank. Speaking for myself anyway, I usually remember a handful of values that are pertinent to me. If I have a list of data by state, then I might take special note of the relative ranking of Florida (where I live), the populous states, Kentucky (where my parents’ families live), and Virginia (where my wife’s family lives). I might also take special note of the top rank and the bottom rank. See the below table of liquor taxes by State. You can easily find any state that you care about because the states are listed alphabetically.

A ranking is useful. It helps the reader to organize the data in their mind. But rankings are ordinal. It’s cool that Florida has a lower liquor tax than Virginia and Kentucky, but I really care about the actual tax rates. Is the difference big or small? Like, should I be buying my liquor in one of the other states in the southeast instead of Florida? Without knowing the tax rates, I can’t make the economic calculation of whether the extra stop in Georgia is worth the time and hassle. So, the most useful small data sets will have both the ranking and the raw data. Maybe we’re more interested in the rankings, such as in the below table.

But tables take time to consume. A reader might immediately take note of the bottom and top values. And given that the data is not in alphabetical order, they might be able to quickly pick out the state that they’re accustomed to seeing in print. But otherwise, it will be difficult to scan the list for particular values of interest.
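The point about keeping both the ranking and the raw values can be sketched in a few lines. The state names and tax rates below are made-up placeholders, not the figures from the tables in this post, and the function name is just for illustration.

```python
# Made-up placeholder rates (dollars per gallon); not real state data.
rates = {"State A": 6.50, "State B": 19.89, "State C": 3.79, "State D": 8.62}


def ranked(values):
    """Return (rank, name, value) rows, highest value first."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]


for rank, name, value in ranked(rates):
    # Showing the raw rate next to the rank keeps the ordinal and cardinal
    # information together, so a reader can judge how big the gaps are.
    print(f"{rank:>2}  {name:<8} ${value:>6.2f}")
```

Sorting by name instead of by value gives the alphabetical version of the same table, which is easier to scan when you are looking for one particular state.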


Irish Superman: 4 Weeks of Potatoes

Back in May I mentioned that a study was recruiting participants to try a 4-week all-potato diet. What I didn’t say was that I was joining the study, and I finished this week.

I’m glad I did it; I lost 8 pounds and 2 inches of waistline, going from slightly overweight (BMI 26) to just barely not-overweight (BMI 24.9). Here are some of my notes:

Day 5: Energy boost kicked in today. Feel half my age

Day 6: Potato energy going strong. Feel like Irish Superman

Day 15: Almost too much energy, hard to sit down at a computer and work, took a break to play basketball

So like many people who previously tried this, I can add more anecdotal evidence of weight loss (despite eating all the potatoes you want) and energy. I’ll also echo people who said that “hunger feels different” and not as demanding, and that it “resets your tastebuds” so that previously bland foods taste good (I just had a turnip with zero seasoning and it was almost too intense). Now to answer your likely questions:

Q: Did you actually eat nothing but potatoes for 4 weeks?

A: No, but I got reasonably close. I cooked potatoes in avocado oil and added seasonings, I drank coffee and beer, I ate other vegetables, I had some snacks. Overall I estimate I got 75-80% of my calories from potatoes.

Q: Was it hard to stick to? Didn’t you get bored?

A: Being hungry or even bored weren’t really issues. All 5 times I slipped up and ate a meal that wasn’t potatoes, I’d say it was for social reasons (I was at a party with great food, at a restaurant with someone, etc.).

Q: What kinds of meals did you cook?

A: Lots of home fries and roast potatoes using lots of varieties of potato (russet, gold, red, purple, sweet). Mashed potatoes a few times. McCain’s craft beer fries for my birthday.

Q: Aren’t potatoes bad for you? Why didn’t this make you fat?

A: Anything can be bad for you if you deep-fry it, or otherwise smother it with fats or process it to death. This is probably how most potatoes get consumed in America, but they start as nutritious root vegetables.

Q: What about protein? Doesn’t this kill your gains?

A: This was my biggest concern going into the study. Potatoes do have more protein than I thought, enough to live on but probably not enough to make you strong. My lifts did come down a bit, though it’s unclear if that was due to the lack of protein or just the lower calories and weight loss taking some muscle along with the fat. I was eating high-protein yogurt many days to try to mitigate this.

Q: If this is so great, are you going to keep doing it?

A: No, it was great for the first 14-16 days then just ok. Most of the weight loss and energy boost happened in the first half. If I ever do this again I’m going to plan on two weeks, which I think is also what Penn Jillette suggests. I do think I’ll do potatoes for lunch a lot more often than I used to, and pivot this to a “whole foods / not-ultra-processed” diet.

Q: Is there something special about potatoes? Would any single-food diet work as well?

A: I’m not sure. Some of the benefit likely comes from cutting out variety, so not eating a lot just because “I need to try everything”. Some likely comes from cutting out specific categories of food, like high fat / high sugar / hyper-palatable. I don’t think that just any food would work; probably most whole foods would, but potatoes are cheap and nutritious. The potato diet leading to weight loss is consistent with many, though not all, theories of obesity.

Q: Can I still sign up for the study?

A: No:

Signups are now closed, but we plan to do more potato diet studies in the future. If you’re interested in participating in a future potato diet study, you can give us your email at this link and we’ll let you know when we run the next study.

But you can always just do it yourself.

US Households Have a Lot More Income Than 1967, and It’s Probably Not Just Because of the Rise of Dual-Income Households

We are going through some tough economic times right now: high rates of inflation (generally exceeding wage growth) with the strong possibility of a recession in the near future. In times like this, I think it is useful to also consider the historical perspective. The US economy has gone through challenging times in the past, but the long-run track record is impressive.

Here is one way to show the data. It comes from the Census Bureau, and shows the total money income of households in the US. The data is, of course, adjusted for inflation, and not just with the regular CPI-U: they use the superior CPI-U-RS, which attempts to maintain a consistent methodology for how prices are measured (BLS is constantly improving the CPI, but that sometimes makes historical comparisons challenging). I present the data both as a percent of the total number of households, and the absolute numbers.

I’ve shaded the chart to suggest that over $100,000 of annual income is high income, and under $35,000 is low income, with everything else considered “middle class.” By these definitions, the number of high-income households in the US increased dramatically from 6.6 million (10.9% of the total) in 1967 to 43.7 million (33.6% of the total) in 2020. The number of low-income households also rose, unfortunately, from 21.4 million in 1967 to 34 million in 2020, but the portion of the total fell (from 35.2% to 26.2%) since it increased more slowly than the overall number of households. Today, there are more high-income households (43.7 million) than low-income households (34 million) in the US.
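To see why the low-income share can fall while the low-income count rises, it helps to back out the total number of households implied by the figures above. This is a rough sketch using only the rounded numbers quoted in this post, so the implied totals are approximate.

```python
# (households in millions, share of all households), as quoted above
HIGH = {1967: (6.6, 0.109), 2020: (43.7, 0.336)}
LOW = {1967: (21.4, 0.352), 2020: (34.0, 0.262)}


def implied_total(count_millions, share):
    """Total households (millions) implied by a count and its share."""
    return count_millions / share


def growth_factor(series):
    """How many times larger the 2020 count is than the 1967 count."""
    return series[2020][0] / series[1967][0]


total_1967 = implied_total(*HIGH[1967])
total_2020 = implied_total(*HIGH[2020])
total_growth = total_2020 / total_1967

print(f"Implied total households: {total_1967:.1f}M -> {total_2020:.1f}M")
print(f"All households grew {total_growth:.2f}x")
print(f"High-income households grew {growth_factor(HIGH):.2f}x")
print(f"Low-income households grew {growth_factor(LOW):.2f}x")
```

Low-income households grew by less than the total did, so their share shrank; high-income households grew by far more, so their share rose.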

But even if you don’t like those definitions, I’ve provided as much detail in the chart as Census makes available publicly. For example, let’s say you think $200,000 is what makes you high income. There were fewer than 1 million of these households in 1967 (1.3% of the total). Today, there are over 13 million of them (10.3% of the total). However we slice the data, there are a lot more high-income households in the US than in the past. (Remember, this is all adjusted for inflation.)

Many people found this data interesting when I posted it to Twitter, including the world’s richest person. But among the many objections raised is that this is driven by the rise of female employment and dual-income households. And indeed, that is a factor. But how much of a factor?

Let’s dig into the data.
