Data Heroes

How often do we hear about “data heroes”? As a data analytics teacher, this just thrills me. Bloomberg reported on the Data Heroes of Covid this week.

One of the terrible things about Covid-19 from the perspective of March 2020 was how little we knew. The disease could kill people. We knew the 34-year-old whistleblower doctor in China had died of it. We knew the disease had caused significant disruption in China and Italy. There were so many horror scenarios that seemed possible and so little data with which to make rational decisions.

The United States has government agencies tasked with collecting and sharing data on diseases. The CDC did not make a strong showing here (would they argue they need more funding?). I don’t know if “fortunately” is the right word here, but fortunately private citizens rose to the task.

The Covid Tracking Project gathers and releases data on what is actually happening with Covid and health outcomes. They clearly present the known facts and update everything as fast as possible. The scientific community and even the government rely on this data source.

Healthcare workers have correctly been saluted as heroes throughout the pandemic. The data heroes volunteering their time deserve credit, too. Lastly, I’d like to give credit to Tyler Cowen for working so hard to sift through research and deliver relevant data to the public.

New Research on Stress

This weekend I am participating (virtually, remotely) in the Southern Economic Association annual meeting, where economists talk about research in progress. I saw Laura Razzolini present a new project yesterday.

She and coauthors surveyed people in the city of Birmingham, AL before and after a major disruption to commuter traffic. One thing they find is that people who have a longer commute due to a road closure are more stressed.

AS IT HAPPENED, Covid came along and started stressing people soon after. So they did another round of surveys and have great baseline data to compare Covid-stressed people with. I will not discuss her results on how stress affects decision making here. She has got some really neat results. The paper will be called something like “Uncovering the Effects of Covid-19 on Stress, Well-Being, and Economic Decision-Making”.

The magnitude of the increase in stress from a longer commute was something like 2.5 on a scale of 1-10. (Do not quote me – I do not have her paper to reference – this is from memory.)

A comment from the audience was that it looked like the magnitude of the increase of stress from a longer commute and from Covid were similar. How could that be? Isn’t a deadly disease worse than traffic?

To explain this, I return to my favorite xkcd comic. When you hover your mouse over the comic, it says “Our brains have just one scale, and we resize our experiences to fit.” (Apropos of nothing, the fact that the comic artist picked Joe Biden as an example of someone who isn’t very important in 2011 seems pretty strange now.)

So, when traffic got worse people could only express “my life got worse”. And when Covid-19 caused shutdowns in the Spring of 2020, people again said “my life got worse”.

We only have one scale, and we resize our experience to fit. Thanksgiving is coming up. I would hope that we could take a day off from the 2020 year-of-doom talk and find something to be grateful for, because things actually can get worse. I also send out sincere condolences to all those who will be spending The Holidays apart from loved ones because of Covid-19.

Third Quarter Check-In: COVID and GDP

How have countries around the world fared so far in the COVID-19 pandemic? There are many ways to measure this, but two important measures are the number of deaths from the disease and economic growth.

Over the past few weeks, major economies have started releasing data for GDP in the third quarter of 2020, which gives us the opportunity to “check in” on how everyone is doing.

Here is one chart I created to try to visualize these two measures. For GDP, I calculated how much GDP was “lost” in 2020, compared with maintaining the level from the fourth quarter of 2019 (what we might call the pre-COVID times). For COVID deaths, I use officially coded deaths by each country through Nov. 15 (I know that’s not the end of Q3, but I think it’s better than using Sept. 30, as deaths have a fairly long lag from infections).
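For readers who want to reproduce the GDP measure, here is a minimal sketch of one way to compute it. The function name and the index values are hypothetical illustrations, not the actual data or code behind the chart, and the exact formula used for the chart may differ.

```python
def gdp_lost_share(q4_2019_level, quarters_2020):
    """Share of GDP 'lost' relative to holding the Q4 2019 level in every quarter.

    q4_2019_level: quarterly GDP in Q4 2019 (any consistent unit or index).
    quarters_2020: quarterly GDP for the 2020 quarters observed so far.
    """
    # Counterfactual: GDP stays at the Q4 2019 level in each 2020 quarter.
    baseline_total = q4_2019_level * len(quarters_2020)
    actual_total = sum(quarters_2020)
    return (baseline_total - actual_total) / baseline_total


# Hypothetical index values (Q4 2019 = 100), not real data for any country.
share = gdp_lost_share(100.0, [98.0, 90.0, 96.0])
print(f"GDP lost through Q3: {share:.1%}")
```

A country that never dipped below its Q4 2019 level would show a loss of zero or less under this definition.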

One major caution: don’t interpret this chart as one variable causing the other. It’s just a way to visualize the data (notice I didn’t try to fit a line). Also, neither measure is perfect. GDP is always imperfect, and may be especially so during these strange times. Officially coded COVID deaths aren’t perfect either: in most countries, measures such as excess deaths indicate that the official counts probably understate the real death toll.

You can draw your own conclusions from this data, but bear in mind that right now the US and many countries in Europe are seeing a major surge in deaths. We don’t know how bad it will be.

Here’s what I observe from the data. The countries that have performed the worst are the major European countries, with the very notable exception of Germany. I won’t attribute this all to policy; let’s call it a mix of policy and bad luck. Germany sits in a rough grouping with many Asian developed countries and Scandinavia (with the notable exception of Sweden, more on this later) among the countries that have weathered the crisis the best (relatively low death rates, though GDP performance varies a lot).

And then we have the United States. Oddly, the country we seem to fit closest with is… Sweden. Death rates similar to most of Western Europe, but GDP losses similar to Germany, Japan, Denmark, and even close to South Korea. (My groupings are a bit imperfect. For example, Japan and South Korea have had much lower death tolls than Germany or Denmark, but I think it is still useful.)

To many observers, this may seem strange. Sweden followed a mostly laissez-faire approach, while most US states imposed restrictions on movement and business that mirrored Western Europe. Some in the US have advocated that the US copy the approach of Sweden, even though Sweden seems to be moving away from that approach in their second wave.

Counterfactuals are hard in the social sciences. They are even harder during a public health crisis. It’s really hard to say what would have happened if the US followed the approach of Sweden, or if Sweden followed the approach of Taiwan. So I’m trying hard not to reach any firm conclusions. To me, it seems safe to say that in the US, public policy has been largely bad and ineffective (fairly harsh restrictions that didn’t do much good in the end), yet the US has (so far) fared better than much of Europe.

All of this could change. But let’s be cautious about declaring victory or defeat at this point.

Coda on Sweden Deaths

Are the officially coded COVID deaths in Sweden an accurate count? One thing we can look to is excess deaths, such as those reported by the Human Mortality Database. What we see is that Swedish COVID deaths do almost perfectly match the excess deaths (the excess over historical averages): around 6,000 deaths more than expected.
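As a rough illustration of what "excess over historical averages" means, here is a minimal sketch. The function name and all numbers are made-up placeholders, not figures from the Human Mortality Database, and real excess-death estimates often adjust for trends and population changes that this sketch ignores.

```python
from statistics import mean


def excess_deaths(observed, prior_years):
    """Observed deaths minus the average of the same period in prior years."""
    baseline = mean(prior_years)
    return observed - baseline


# Placeholder numbers for one period, purely illustrative.
prior = [1000, 990, 1010]   # deaths in the same period of earlier years
observed_2020 = 1050        # deaths in the same period of 2020
print(excess_deaths(observed_2020, prior))
```

If the officially coded COVID deaths for the period are close to this difference, the official count is probably a reasonable measure of the toll.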

Some have suggested that the high COVID deaths for Sweden are overstated because Sweden had lower than normal deaths in recent years, particularly 2019. This has become known as the “dry tinder” theory, for example as stated in a working paper by Klein, Book, and Bjornskov (disclosure: Dan Klein was one of my professors in grad school, and is also the editor of the excellent Econ Journal Watch, where I have been published twice).

But even the Klein et al. paper only claims that the “dry tinder” factor can account for 25-50% of the deaths (I have casually looked at the data, and this seems about right to me). Thus, perhaps in the chart above, we can move Sweden down a bit, bringing them closer to the Germany-Asia-Scandinavia group. Still, even with this correction, Sweden has 2.5x the death rate of Denmark (rather than 5x) and 5x the death rate of Finland (rather than 10x, as with officially coded deaths).
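The 2.5x and 5x figures correspond to the upper end of the 25-50% range. Here is a minimal sketch of that arithmetic, assuming the adjustment is applied only to Sweden's deaths while Denmark's and Finland's are left unchanged (the function name is a hypothetical helper for illustration):

```python
def adjusted_ratio(official_ratio, dry_tinder_share):
    """Scale Sweden's death-rate ratio down by the share attributed to dry tinder."""
    return official_ratio * (1.0 - dry_tinder_share)


# Official ratios from the text: roughly 5x Denmark and 10x Finland.
for share in (0.25, 0.50):
    vs_denmark = adjusted_ratio(5.0, share)
    vs_finland = adjusted_ratio(10.0, share)
    print(f"share={share:.0%}: {vs_denmark}x Denmark, {vs_finland}x Finland")
```

At the lower end of the range (25%), the ratios fall only to 3.75x and 7.5x, so the correction narrows the gap without closing it.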

As with all things right now, we should reserve judgement until the pandemic is over (Sweden’s second wave looks like it could be pretty bad). The “dry tinder” factor (a term I personally dislike) is worth considering, as we all try to better understand the data on how countries have performed in this crisis.

Political Polarization and Social Distancing

Political polarization has been rising in the United States in recent years. Two key factors contribute to the polarization. First, we naturally hold different beliefs over objective matters. Second, we trust different news sources. According to the 2020 Pew Research Survey, around 75% of conservative Republicans say they trust the information from Fox News, while 77% of liberal Democrats say they distrust it.

Media outlets and politicians on the right and left sent divergent messages about the severity of the crisis during the coronavirus pandemic. A joint study by economists from NYU, Stanford, and Harvard University finds evidence for partisan differences in social distancing (Allcott et al. 2020). They combined a survey with GPS location data that record daily and weekly visits to points of interest (POIs). The GPS data show strong partisan differences in social distancing behavior that emerged with the rise of COVID. The analysis carefully controlled for local policy, health, weather, and economic variables, and the result remains statistically and economically significant. They also used a nationally representative survey to measure individual differences in behavior and beliefs about social distancing. The survey covered demographics, beliefs regarding the efficacy of social distancing, self-reported distancing, and predictions about future COVID cases. They find that, compared to Republicans, Democrats believe the pandemic is more severe and report a greater reduction in contact with others.


Allcott, Hunt, Levi Boxell, Jacob Conway, Matthew Gentzkow, Michael Thaler, and David Y. Yang. “Polarization and public health: Partisan differences in social distancing during the Coronavirus pandemic.” NBER Working Paper (2020).

A Covid Conversation… But with Humility.

We know WAY more about Covid-19 than we used to. But there is plenty of appropriate and inappropriate incredulity concerning the data’s meaning, validity, and implications. I want to take a minute and give it the good ol’ Stat 201 college try. Here’s the level-headed and appropriately humble Covid statistics conversation.

A: “The US has more cases of Covid than Portugal.”

B: “Yes, but that’s not important. They are very different countries. After all, 65% of people in Portugal live in urban centers. For the US, that number is 80%. Obviously, people being close together, such as in urban places, will contribute to more Covid cases.”

A: “OK. Fine. They may be incomparable. But the US has more cases than the UK, which has a similarly urban population of 83%.”

B: “Yes, but the US is larger. The UK has a smaller population – Of course the US has more cases.”

A: “Ah! And the US also has a Covid positivity rate well in excess of the UK.”

B: “Hmm… That is something. The problem is that the testing is not administered in the same fashion in both places (or across time). That is, neither set of tests is a simple random sample of people, and the two sets are not biased in the same relevant ways.”

A: “But how do you know that the samples aren’t collected in the same sort of ways? Someone feels poorly, then they go and get tested. Isn’t that how it works everywhere?”

B: “Not necessarily at all. Some countries and municipalities offer free testing. Other places have more or less scarcity of tests and surely that affects whom they decide to test. Not only that, different people are differently willing to get tested (maybe they’d have to involuntarily stop working, for example). My point is that the testing samples are not both biased in favor of or against positives in the same way, and we have little way of telling either the direction or the magnitude. The fact that both countries test a similar proportion of the population doesn’t address the sampling method.”

A: “OK. Well, I suppose that we ought not try at all then, according to you? Isn’t some problematic data better than none?”

B: “Problematic data is not better than none at all if we have good reason to think that there isn’t enough in common between sample collection methods to make valid comparisons.”

A: “Right, so you’re saying that we have to be agnostic.”

B: “In some sense, yes. But rather than Covid cases, we can track relevant variables whose sampling is more comparable. Hospitalizations are better, but we still have the issue of selection bias among those being admitted and a bias due to different hospital capacities between localities. The best measure is the number of deaths due to Covid. People can’t elect out of that sample.”

A: “Hm… Ok. But while total deaths is a more dependable statistic, it is less relevant. Of course deaths matter a great deal, but Covid makes people feel terrible and may even have long term effects.”

B: “You’re right. Covid deaths vs. cases has the trade-off of relevance vs. dependability. Arguably, death is the most important possible outcome, although I take your point that it’s not the only relevant one. Ultimately, however, the death numbers are more dependable and we should use them if we want a high degree of certainty.”

A: “Fine. The US has more Covid deaths than does the UK, both in level and in deaths per thousand of population.”

B: “Yep. You are right. But the US has more Covid cases, so of course it has more Covid deaths than the UK. The correct statistic is, given a Covid diagnosis, how likely are you to die of Covid? In the UK, a much higher proportion of people with a Covid diagnosis die. In other words, Covid is more dangerous in the UK than it is in the US.”

A: “Time out. Two things: 1) Didn’t you say just a moment ago that the testing data wasn’t reliable enough? Now you’re using it as if it’s reliable. 2) If we are making a cross country comparison, then can’t we just say that a person, randomly drawn from the population, is more likely to die from Covid in the US than in the UK?”

B: “Mea culpa. You’re right on both points. At the end of the day, a US person is more likely to die of Covid. But, in the UK a person with Covid may be more likely to die. So what do we do about that?”

A: “Good question…”
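The two statistics that A and B end up arguing over can be written out explicitly. Below is a minimal sketch with two hypothetical countries, X and Y; the populations, case counts, and death counts are invented for illustration and are not US or UK data. It shows how one country can have more deaths per person while the other has more deaths per diagnosed case.

```python
def deaths_per_capita(deaths, population):
    """Chance that a randomly drawn person in the population died of the disease."""
    return deaths / population


def case_fatality_rate(deaths, confirmed_cases):
    """Share of diagnosed cases that ended in death (depends on who gets tested)."""
    return deaths / confirmed_cases


# Hypothetical countries with equal populations, not real data.
x = {"population": 1_000_000, "cases": 50_000, "deaths": 500}
y = {"population": 1_000_000, "cases": 10_000, "deaths": 300}

for name, c in (("X", x), ("Y", y)):
    per_capita = deaths_per_capita(c["deaths"], c["population"])
    cfr = case_fatality_rate(c["deaths"], c["cases"])
    print(f"Country {name}: deaths per capita = {per_capita:.4%}, CFR = {cfr:.1%}")
```

In this toy example, Country X has the higher deaths per capita while Country Y has the higher case fatality rate. Because the case fatality rate has confirmed cases in its denominator, it inherits every testing bias that B raised earlier, whereas deaths per capita does not.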