Fitbit got 2 billion and all I got was an email

I made a Fitbit account years ago, even though I don’t wear one. As a user, I got an email on Jan 14, 2021 alerting me that they just sold Fitbit to Google. The email assured me that Google will not try to muscle Fitbit users away from iPhones or iOS. Google has said that it will keep Fitbit data “separate from other Google ad data.” TechCrunch had some more details for me, including how many billions of dollars Fitbit was getting out of this deal.

Is it so bad to see adsbased on your sleep habits? What if you had a bad night and then saw more coffee ads the next day? Seems fine. Is it more “creepy” than seeing an ad for something you just bought?

I don’t actually know much about Google’s data structure. But I can imagine ways that a large tech company could use Fitbit data in a way that users would not like. What if Google knows that you didn’t sleep well this week. Say someone else is using Google search to find a person to recruit for a desirable job in Public Relations. What if predictive models indicate that people who don’t get at least 6.5 hours of sleep per night are low performers? What if you ended up not getting linked up with your dream job, because you weren’t sleeping well one week? This is all speculative. What if Google starts measure how your heart rate responds to viewing various website that you access through Chrome? Have they agreed to not do that as part of the acquisition deal?

In 2018, Tyler sat down with Eric Schmidt, a senior executive of Google. Tyler asked him why Google doesn’t use their massive stores of data to inform investments for a hedge fund. Here was the reply:

SCHMIDT: Well, I’ll give you a more generic answer, which is, from the moment I joined the company, there were many people who said, “Why don’t you take this information and do something that will use it for marketing purposes?”

And the answer is always the same, which is that you need people’s permission to do that, and you can be sure you won’t get that permission, if you follow that reasoning. So we decided that was a pretty bright line. For example, if a tech company that were a consumer company were bundled with a hedge fund, you would have to disclose that it was being used in that context. The people would go crazy.

But the other thing that’s true — and Google was good about this — is we took the position that it was important for us to disclose everything we were doing as well as we could.

I’ll give you a governance argument. In a large company, the employees are independent citizens of humanity, and if they see corruption in your leadership — in other words, if they see you doing things which are inconsistent with the values, you will be criticized.

Schmidt doesn’t deny that Google could take advantage of data in order to become a successful hedge fun. He says that it would look bad, and Google doesn’t want to look bad even to its own employees. Hmmm, right? I don’t bring this up to accuse Google of wrongdoing. It just makes you wonder how things will unfold in the future. One can, at least, see why the acquisition of Fitbit was scrutinized.

I use Google products heavily on my laptop. I don’t have many “smart” devices aside from my smartphone. I wore the Fitbit step tracker for a few days, but I didn’t find the information to be helpful. It’s not like the Fitbit does the dishes for me or drives me to the gym. Get me that smart device and I’ll look at any ads you want.

QE, Stock Prices, and TINA

The U.S. economy as quantified by GDP has been sputtering along in slow growth mode for a number of years. It took a huge hit in 2020 due to covid shutdowns and has not nearly recovered. But stock prices have been rocketing upwards, and this past year is no exception. Markets took a cliff-dive in March, but have since way overshot to the upside.

Here is a plot of the past five decades of U.S. GDP and of the Wilshire 5000 index, which approximates the total stock market capitalization in the U.S.:

Chart Source: St. Louis Fed, as plotted by Lyn Alden Schwartzer

These two curves have crisscrossed each other over the past five decades, but in recent years the stock market has roared to the upside. One of Warren Buffet’s favorite metrics as to whether stock are overvalued is to consider the ratio of these two quantities, i.e. the market-capitalization-to-GDP (Cap/GDP) ratio:

Source: Lyn Alden Schwartzer

The ratio is much higher than it has even been. The last time it got this high was in 2000, and that did not end well.

Continue reading

Excess Mortality in 2020

My last post of 2020 tried to end the year on an optimistic note: the rapid innovation of a new vaccine was truly a marvel. But I also warned you that I would have a post in the new year talking about the deaths of 2020 during the pandemic. And here it is.

Throughout 2020, I have tried to keep up with the most recent data, not only on officially coded COVID-19 deaths, but also on other measures. An important one is known as excess mortality, which is an attempt to measure the number of deaths in a year that are above the normal level. Defining “normal” is sometimes challenging, but looking at deaths for recent years, especially if nothing unusual was happening, is one way to define normal. The team at Our World in Data has a nice essay explaining the concept of excess mortality.

One thing to remember about death data is that it is often reported with a lag. The CDC does a good job of regularly posting death data as it is reported, but these numbers can be unfortunately deceptive. For example, while the CDC has some death data reported through 51 weeks of 2020, but they note that death data can be delayed for 1-8 weeks, and some states report slower than others (for reasons that are not totally clear to me, North Carolina seems to be way behind in reporting, with very little data reporting after August).

So there’s the caution. What can we do with this data? Since 2019 was a pretty “normal” year for deaths, we can compare the deaths in 2020 to the same weeks of data in 2019. In the chart at the right, I use the first 48 weeks of the year (through November), as this seems to be fairly complete data (but not 100% complete!). The red line in the chart shows excess deaths, the difference between 2019 and 2020 deaths. From this, we can see that there were over 357,000 excess deaths in 2020 in the first 11 months of the year, or about a 13.6% increase over the prior year.

Is 13.6% a large increase? In short, yes. It is very large. I’ll explain more below, but essentially this is the largest increase since the 1918 flu pandemic.

Continue reading

2020 Holiday Viewing

Forget “The Christmas Prince” or “The Prince Christmas” or whatever is on Netflix. Why not spend your holiday refreshing this new vaccine dashboard?

Here’s the announcement:

I personally know a few health care workers who got their shots (do not say “jab” to me) this past week. It’s all very exciting! Here at University of Alabama at Birmingham (UAB), the medical community has freezers, fortunately.

Here’s VP Mike Pence getting his vaccine:

Jeremy and Doug have both talked about allocation this week. Economists get really jazzed about allocating scarce resources. It’s been frustrating to watch first tests and now vaccines not be available on a market. Excellent points are also made every week over at Marginal Revolution on how we are missing an opportunity to get the incentives right. Supply. Curves. Slope. Up. (Thousands. Dying. Every. Week.)

Number of Nice Days

I’m relatively new to Birmingham, Alabama. I was nervous about moving to a place with famously long hot humid summers. My intuition since moving here is that there are many days throughout the year when, at some point in the day, the weather is nice for doing something outside with my kids.

Yesterday, Sunday, was very nice. To have such a nice warm sunny day in mid-December is strange to me. I grew up further north where Decembers are chilly. Here is a picture of a neighbor’s son enjoying the summer-like feel of this technically-winter day. This picture was taken at noon.

Although I am grateful for this particular day, I also think about the hot summer days when noon is a time to hide indoors with air conditioning. Is it nice here? How can that question be answered scientifically?

There is actually a great map of answers, available on several websites, credited to Brian Brettschneider, thanks to data from Iowa State.

This map confirmed my intuition. My old life in New Jersey was in the dark green zone, and my new life in Alabama is one level better, in terms of how many “nice” days you can expect in a year.

If you don’t have climate control, then you might be more worried about weather extremes. If you are lucky enough to have a regulated indoor environment, then a nice place to live is largely a question of how many days you get when it’s nice to “go out”.

This map accounts for “nice days”. I wonder if New Jersey would seem closer to Alabama if the measure changed to “nice daylight hours”. Yesterday was beautiful, but it was dark by 5pm. When I get time, I’m going to make a map of where in the lower 48 you can enjoy dinner outside after work many times per year (and why is it Southern California?).

Visual Decision Trees in SAS Viya

As I said earlier, I used SAS Viya for Learners this semester. I assigned a final project for students. They had to use the data pre-loaded into the free version of SAS Viya, but otherwise had freedom to select their own variables and construct their own research question.

SAS Viya for Learners just recently opened for any users to make an account. This will allow you to learn SAS Viya functions (but not do your own actual work, because you cannot import new data). I’m using SAS Viya 3.4.

I like the way SAS Viya allows users to create a beautiful intuitive interactive decision tree model. This blog is to show you what that looks like. In traditional EconLit, regressions are more popular than decisions trees. Decision trees are a simple and useful machine learning technique. If you are trying to teach a first-timer about decision trees, then the visualization in SAS Viya for Learners can be helpful.

I’ll demonstrate using a decision tree for classification using built-in SAS data. One of the larger datasets available is USCENSUS1990. I’ll use it to demonstrate (and I do love the 90’s!). Consider the variable about the number of children a person has. This could be reasonably predicted by age and education level. [Footnote 1]

Here’s a chart showing the frequency of family sizes for adult women. (I used a Filter to only include people who are not coded zero in iFertil. See Footnote 1.)

For adult women in 1990, the most frequent category is to have more than 2 children. This would include the parents of Boomers. Think about those big families you know from the Boomer generation.

For input variables to my model, I’ll use age categories and also education levels. I set the new categorical variable I created called NumberChildren as the Response variable for a decision tree model. [Footnote 2] Here’s a zoomed out picture of the visual model output.

It’s immediately obvious that age is more informative than schooling. Women under the age of 30 are much more likely to have no children. The width of the grey tree branches makes it easy to see where the majority of the observations are.

I’ll zoom in on the left side of the tree where most of the people are.

The “>= 4.25” means that women on the far left side are over the age of 40. Among older women, the norm is to have 2 or more children. If you are looking for the older women with exactly two children, you are more likely to find them among those who have an education score of larger than “13”, meaning that they have a Bachelor’s degree or higher.

My point is not to posit causal relationships among education and fertility. My point is how awesome these graphs are. You do have to learn some point-and-click functions within SAS Viya to make them. But I don’t know of any other software that can produce this.

SAS Viya also provides tables and statistics on each node, which is more like what I could get from free open source software a few years ago when I looked into decision tree packages.

[Footnote 1] If you want to replicate what I did, know that the USCENSUS1990 dataset in SAS Viya comes with no explanation. Google brought me to UCI, where I found what I needed in terms of technical documentation.

dAge, iFertil, iSex, iYearsch are the names of the variables you will find in SAS. To create my graphs and models, I converted some of them to categorical variables using the “+New Data Item -> Custom Category” functions. No programming is required.

iSex: 0 indicates Male, 1 indicates Female

dAge is coded as follows: 0 is babies; 1 is under 13, 2 is under 20 (but over 13), 3 is under 30, 4 is under 40, 5 is under 50, 6 is under 65,  7 is for 65 and over

iFertil is coded: 0 is either less than 15 years old or male, 1 is no child, 2 means they have one child (confusing…), 3 means they have two children, all the way up to a 13 which is the code for 12 or more children

iYearsch:  3-10 refers to primary school up to a 10 indicating graduating from high school, 11-13 refer to some college and associate degrees, 14 is a Bachelors degree, 15-17 refers to higher degrees

[Footnote 2]  I decided to set Maximum levels to 5 in Options. This keeps the tree smaller which looks better in the blog.

Alabama’s Covid Data Hero

Housekeeping: There was no post yesterday on Economist Writing Every Day. It was my day to write and family responsibilities just took up every minute. This might happen occasionally.

Last week I wrote about American Data Heroes. There are many that I don’t know of , but I wanted to share the work of tireless Frank McPhillips. For months he wrote a succinct post every single day within a private Facebook group for concerned citizens of Alabama. Recently he switched to Substack, meaning I can share it here.

McPhillips summarizes and explains the Covid data for the state of Alabama, where I live. This is data that is publicly available, but most people like myself don’t want to do as much work as he does to understand it. He also understands when the reporting might be wrong or late.

I’m going to quote from his most recent posts.

Dec 5:

The Chairman of the Madison County Commission was more blunt. “We’re now talking about alternative space for a morgue”, he said, adding that he has never faced such a decision in 25 years of public service.

According to HHS, 87.7% of Alabama’s ICU beds are occupied… Our State added 3,390 more COVID cases today (incl. 655 probables), raising the 7-day moving average to 3,228 cases per day, which is twice the daily average 3 weeks ago.

Dec 6:

With 473 more cases, Birmingham’s home county, Jefferson, has a 14-day positivity rate that tops 30% for the first time. 

Dec 7:

Now, brace yourself for the updated hospitalization data:  2,079 patients (105 reporting hospitals), a jump of 163 patients in a single day. The Huntsville Hospital system reported 378 COVID patients, an increase of 76 patients in one week. DCH Health system reported 138 patients, double the number just 9 days ago. And finally, Regional Medical Center (Anniston) announced new visitation restrictions due to the pandemic: “For end-of-life care, two visitors will be permitted to remain in the patient’s room, without leaving or re-entering the building and without substitution”.

I appreciate all his work. It’s obvious when he’s getting depressed or exhausted, but he’s decided to keep going (almost) every day. He keeps writing new prose on how this is the most deadly “war” of our time. He wants people to keep fighting back and not get complacent. See his Dec 7 post for more war comparisons.

McPhillips has helped a lot of people in his locality. He inspires me for Writing Every Day.

Teaching with SAS Viya for Learners: Last Report

I used SAS Viya for Learners to teach data analytics to undergraduate business students for one semester.

I’ll start with the benefits of SAS Viya: It’s free; It’s visually appealing and requires no coding; There are some SAS tutorial materials that teachers can use; The way decision tree results are displayed makes intuition easy for students who are new to data mining.

I made a post earlier in which I reported that it actually works. I still think that, but at the end of the semester I did have individual students experience errors and mysterious interruptions to service. It made me wonder if the server gets busy at the end of an academic semester.

SAS is known for making excellent products and charging high fees for them. Since SAS Viya is free, they aren’t going to be giving all the functionality with it. The free version does not let students import any data. There is a sandbox of data to learn with, which is more than enough to fill a semester. I didn’t even open most of the available data sets.

My students did their final projects by choosing one of the pre-loaded datasets and using that for analysis. As far as applying the principles I taught, this was fine. In one sense, it was easier than telling them to fend for themselves and find data on the world wide web.

The downside is that the software-specific skills students learn from free student SAS Viya can’t be used on a project for work or for a different class. Eventually, any useful work involves importing new data.

The decision to use SAS Viya for Learners instead of R should depend on what your students want to do next. Both products will allow them to learn concepts and common functions.

If you are going to use SAS Viya, I highly recommend using the tutorials made by SAS with screenshot-by-screenshot instructions. You can give the instructions to the students, so students aren’t coming to you with questions about every click they need to make.

I paired SAS Viya with a Business Intelligence Textbook. Also note that students had already taken a traditional Business Statistics course previously.

Data Heroes

How often do we hear about “data heroes”? As a data analytics teacher, this just thrills me. Bloomberg reported on the Data Heroes of Covid this week.

One of the terrible things about Covid-19 from the perspective of March 2020 was how little we knew. The disease could kill people. We knew the 34-year-old whistleblower doctor in China had died of it. We knew the disease had caused significant disruption in China and Italy. There were so many horror scenarios that seemed possible and so little data with which to make rational decisions.

The United States has government agencies tasked with collecting and sharing data on diseases. The CDC did not make a strong showing here (would they argue they need more funding?). I don’t know if “fortunately” is the right word here, but fortunately private citizens rose to the task.

The Covid Tracking Project gathers and releases data on what is actually happening with Covid and health outcomes. They clearly present the known facts and update everything as fast as possible. The scientific community and even the government relies on this data source.

Healthcare workers have correctly been saluted as heroes throughout the pandemic. The data heroes volunteering their time deserve credit, too. Lastly, I’d like to give credit to Tyler Cowen for working so hard to sift through research and deliver relevant data to the public.

Third Quarter Check-In: COVID and GDP

How have countries around the world fared so far in the COVID-19 pandemic? There are many ways to measure this, but two important measures are the number of deaths from the disease and economic growth.

Over the past few weeks, major economies have started releasing data for GDP in the third quarter of 2020, which gives us the opportunity to “check in” on how everyone is doing.

Here is one chart I created to try to visualize these two measures. For GDP, I calculated how much GDP was “lost” in 2020, compared with maintaining the level from the fourth quarter of 2019 (what we might call the pre-COVID times). For COVID deaths, I use officially coded deaths by each country through Nov. 15 (I know that’s not the end of Q3, but I think it’s better than using Sept. 30, as deaths have a fairly long lag from infections).

One major caution: don’t interpret this chart as one variable causing the other. It’s just a way to visualize the data (notice I didn’t try to fit a line). Also, neither measure is perfect. GDP is always imperfect, and may be especially so during these strange times. Officially coded COVID deaths aren’t perfect, though in most countries measures such as excess deaths indicate these probably understate the real death toll.

You can draw your own conclusions from this data, and also bear in mind that right now many countries in Europe and the US are seeing a major surge in deaths. We don’t know how bad it will be.

Here’s what I observe from the data. The countries that have performed the worst are the major European countries, with the very notable exception of Germany. I won’t attribute this all to policy; let’s call it a mix of policy and bad luck. Germany sits in a rough grouping with many Asian developed countries and Scandinavia (with the notable exception of Sweden, more on this later) among the countries that have weathered the crisis the best (relatively low death rates, though GDP performance varies a lot).

And then we have the United States. Oddly, the country we seem to fit closest with is… Sweden. Death rates similar to most of Western Europe, but GDP losses similar to Germany, Japan, Denmark, and even close to South Korea. (My groupings are a bit imperfect. For example, Japan and South Korea have had much lower death tolls than Germany or Denmark, but I think it is still useful.)

To many observers, this may seem strange. Sweden followed a mostly laissez-faire approach, while most US states imposed restrictions on movement and business that mirrored Western Europe. Some in the US have advocated that the US copy the approach of Sweden, even though Sweden seems to be moving away from that approach in their second wave.

Counterfactuals are hard in the social sciences. They are even harder during a public health crisis. It’s really hard to say what would have happened if the US followed the approach of Sweden, or if Sweden followed the approach of Taiwan. So I’m trying hard not to reach any firm conclusions. To me, it seems safe to say that in the US, public policy has been largely bad and ineffective (fairly harsh restrictions that didn’t do much good in the end), yet the US has (so far) fared better than much of Europe.

All of this could change. But let’s be cautious about declaring victory or defeat at this point.

Coda on Sweden Deaths

Are the officially coded COVID deaths in Sweden an accurate count? One thing we can look to is excess deaths, such as those reported by the Human Mortality Database. What we see is that Swedish COVID deaths do almost perfectly match the excess deaths (the excess over historical averages): around 6,000 deaths more than expected.

Some have suggested that the high COVID deaths for Sweden are overstated because Sweden had lower than normal deaths in recent years, particularly 2019. This has become known as the “dry tinder” theory, for example as stated in a working paper by Klein, Book, and Bjornskov (disclosure: Dan Klein was one of my professors in grad school, and is also the editor of the excellent Econ Journal Watch, where I have been published twice).

But even the Klein et al. paper only claims that “dry tinder” factor can account for 25-50% of the deaths (I have casually looked at the data, and these seems about right to me). Thus, perhaps in the chart above, we can move Sweden down a bit, bringing them closer to the Germany-Asia-Scandinavia group. Still, even with this correction, Sweden has 2.5x the death rate of Denmark (rather than 5x) and 5x the death rate of Finland (rather than 10x, as with officially coded deaths).

As with all things right now, we should reserve judgement until the pandemic is over (Sweden’s second wave looks like it could be pretty bad). The “dry tinder” factor (a term I personally dislike) is worth considering, as we all try to better understand the data on how countries have performed in this crisis.