My third post on Covid data heroes features Dr. Emily Oster. Emily is a mom. Lots of economists are moms, but few have incorporated it quite as much into their careers. Emily has written a book on pregnancy and a new one on what to do with the kids after they are born. She does a great job explaining scientific research in a way that is easy to understand.
Emily made a big push to collect data on schools and covid back when there was crippling uncertainty about how dangerous it is to let children go to school in person.
She has a great email newsletter and substack. Her latest post is called “Vaccines & Transmission Redux Redux”. In this post, she distills the latest research to give practical advice on when kids can see grandparents once the vaccines are out.
For a long time now, some families have been avoiding close contact with elderly relatives. When can we go back to normal?
With all the uproar around the election in December, the news of the SolarWinds data breach did not get the attention it deserved. Some well-resourced foreign organization, almost certainly in Russia, succeeded in infiltrating the data systems of an astounding 18,000 or more U.S. organizations. These included major federal agencies such as the Pentagon, the Department of Homeland Security, the State Department, the Department of Energy, the National Nuclear Security Administration, and the Treasury, and other big targets like Microsoft, Cisco, Intel, and Deloitte, and organizations like the California Department of State Hospitals, and Kent State University. Security watchdogs ran out of adjectives (“11 out of 10”) in characterizing the magnitude of this hack.
At the same time, security experts cannot help admiring the sheer artistry of this exploit. Hackers themselves often view their code as a work of art. According to one cybersecurity expert, “Programmers and hackers like to sign their work like artists…So they sign that code in various ways. Often, they’ll leave their initials or they’ll try to be cute and put some sort of cryptic message.” So how was this hack accomplished?
I am grateful to Yang Zhou for inviting me to talk about a working paper (with Gavin Roberts) on Friday. Yang told me that this audience is not familiar with lab experiments, so I’m going to take a few minutes out of my time to set the stage for my research.
There is a new book out, Causal Inference by Scott Cunningham, that is the talk of #EconTwitter (Cunningham, 2021). The book is 500 pages of dense prose and code. Here is a review saying that Cunningham left out many key things that a practitioner would need to know. Causal inference from naturally occurring data is hard!
Lab experiments bring something important to the research community. Lab experiments give the researcher a lot of control, which is why they are particularly useful for causal inference (Samek, 2019).
Who doesn’t want to be stronger? You can get on the floor and do 5 pushups right now. Did you do it? Probably not. (If you did, great work.) For most people, nothing is stopping you from getting strong, except yourself.
I just keep sitting around. Going to a gym and meeting with an instructor in person used to be a way around this problem. This takes our human foibles and makes them work to our advantage. The sunk cost fallacy can work for us.
If you bought a stock and it’s a loser, you should sell! Too many people keep holding and go down with the ship.
However, knowing themselves, many people also go to the gym and sign up for a class. Not wanting to walk away from their investment, they actually do the classes.
The WSJ reports that many gyms are closing after Covid-19 forced the customers out. The article describes the machines people have brought into their homes to replace gyms. The Peloton is a signature of the year 2020. The new trend brings a live human trainer into the process of exercising alone at home.
The new machines can collect data on the user. This data is transmitted to instructors and maybe even friends. Now, from the comfort of your own home, you can “sign up for a class” again.
Had Covid struck in 1980, people might have bought fitness machines for their basements and they might even have bought a VHS to pop in and exercise with. But they would have been missing the link to a human who knows where they are supposed to be, which apparently provides more motivation.
The market has loved Peloton and smart money seems to think it will continue to do well, even with a vaccine already rolling out.
I made a Fitbit account years ago, even though I don’t wear one. As a user, I got an email on Jan 14, 2021 alerting me that they just sold Fitbit to Google. The email assured me that Google will not try to muscle Fitbit users away from iPhones or iOS. Google has said that it will keep Fitbit data “separate from other Google ad data.” TechCrunch had some more details for me, including how many billions of dollars Fitbit was getting out of this deal.
Is it so bad to see ads based on your sleep habits? What if you had a bad night and then saw more coffee ads the next day? Seems fine. Is it more “creepy” than seeing an ad for something you just bought?
I don’t actually know much about Google’s data structure. But I can imagine ways that a large tech company could use Fitbit data in a way that users would not like. What if Google knows that you didn’t sleep well this week? Say someone else is using Google search to find a person to recruit for a desirable job in Public Relations. What if predictive models indicate that people who don’t get at least 6.5 hours of sleep per night are low performers? What if you ended up not getting linked up with your dream job, because you weren’t sleeping well one week? This is all speculative. What if Google starts measuring how your heart rate responds to viewing various websites that you access through Chrome? Have they agreed to not do that as part of the acquisition deal?
In 2018, Tyler sat down with Eric Schmidt, a senior executive of Google. Tyler asked him why Google doesn’t use their massive stores of data to inform investments for a hedge fund. Here was the reply:
SCHMIDT: Well, I’ll give you a more generic answer, which is, from the moment I joined the company, there were many people who said, “Why don’t you take this information and do something that will use it for marketing purposes?”
And the answer is always the same, which is that you need people’s permission to do that, and you can be sure you won’t get that permission, if you follow that reasoning. So we decided that was a pretty bright line. For example, if a tech company that were a consumer company were bundled with a hedge fund, you would have to disclose that it was being used in that context. The people would go crazy.
But the other thing that’s true — and Google was good about this — is we took the position that it was important for us to disclose everything we were doing as well as we could.
I’ll give you a governance argument. In a large company, the employees are independent citizens of humanity, and if they see corruption in your leadership — in other words, if they see you doing things which are inconsistent with the values, you will be criticized.
Schmidt doesn’t deny that Google could take advantage of data in order to become a successful hedge fund. He says that it would look bad, and Google doesn’t want to look bad even to its own employees. Hmmm, right? I don’t bring this up to accuse Google of wrongdoing. It just makes you wonder how things will unfold in the future. One can, at least, see why the acquisition of Fitbit was scrutinized.
I use Google products heavily on my laptop. I don’t have many “smart” devices aside from my smartphone. I wore the Fitbit step tracker for a few days, but I didn’t find the information to be helpful. It’s not like the Fitbit does the dishes for me or drives me to the gym. Get me that smart device and I’ll look at any ads you want.
The U.S. economy as quantified by GDP has been sputtering along in slow growth mode for a number of years. It took a huge hit in 2020 due to covid shutdowns and has not nearly recovered. But stock prices have been rocketing upwards, and this past year is no exception. Markets took a cliff-dive in March, but have since way overshot to the upside.
Here is a plot of the past five decades of U.S. GDP and of the Wilshire 5000 index, which approximates the total stock market capitalization in the U.S.:
These two curves have crisscrossed each other over the past five decades, but in recent years the stock market has roared to the upside. One of Warren Buffett’s favorite metrics as to whether stocks are overvalued is to consider the ratio of these two quantities, i.e. the market-capitalization-to-GDP (Cap/GDP) ratio:
Throughout 2020, I have tried to keep up with the most recent data, not only on officially coded COVID-19 deaths, but also on other measures. An important one is known as excess mortality, which is an attempt to measure the number of deaths in a year that are above the normal level. Defining “normal” is sometimes challenging, but looking at deaths for recent years, especially if nothing unusual was happening, is one way to define normal. The team at Our World in Data has a nice essay explaining the concept of excess mortality.
One thing to remember about death data is that it is often reported with a lag. The CDC does a good job of regularly posting death data as it is reported, but these numbers can be unfortunately deceptive. For example, the CDC has some death data reported through 51 weeks of 2020, but they note that death data can be delayed for 1-8 weeks, and some states report slower than others (for reasons that are not totally clear to me, North Carolina seems to be way behind in reporting, with very little data reported after August).
So there’s the caution. What can we do with this data? Since 2019 was a pretty “normal” year for deaths, we can compare the deaths in 2020 to the same weeks of data in 2019. In the chart at the right, I use the first 48 weeks of the year (through November), as this seems to be fairly complete data (but not 100% complete!). The red line in the chart shows excess deaths, the difference between 2019 and 2020 deaths. From this, we can see that there were over 357,000 excess deaths in 2020 in the first 11 months of the year, or about a 13.6% increase over the prior year.
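The arithmetic behind those two numbers is simple. Here is a minimal Python sketch, assuming you have weekly death counts for the same weeks of both years in two lists. The function name and the toy numbers are made up for illustration; they are not the CDC figures.

```python
def excess_deaths(baseline_weekly, current_weekly):
    """Return (excess count, percent increase) over matching weeks."""
    if len(baseline_weekly) != len(current_weekly):
        raise ValueError("compare the same weeks in both years")
    baseline_total = sum(baseline_weekly)
    current_total = sum(current_weekly)
    excess = current_total - baseline_total
    pct_increase = 100.0 * excess / baseline_total
    return excess, pct_increase


# Toy numbers for illustration only (NOT actual CDC data).
toy_2019 = [1000, 1000, 1000, 1000]
toy_2020 = [1000, 1100, 1200, 1100]
excess, pct = excess_deaths(toy_2019, toy_2020)
print(f"{excess} excess deaths, a {pct:.1f}% increase")
```

As a sanity check on the figures above, 357,000 excess deaths being a 13.6% increase implies a 2019 baseline of roughly 2.6 million deaths over the same 48 weeks.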
Is 13.6% a large increase? In short, yes. It is very large. I’ll explain more below, but essentially this is the largest increase since the 1918 flu pandemic.
Forget “The Christmas Prince” or “The Prince Christmas” or whatever is on Netflix. Why not spend your holiday refreshing this new vaccine dashboard?
Here’s the announcement:
I personally know a few health care workers who got their shots (do not say “jab” to me) this past week. It’s all very exciting! Here at University of Alabama at Birmingham (UAB), the medical community has freezers, fortunately.
Here’s VP Mike Pence getting his vaccine:
Jeremy and Doug have both talked about allocation this week. Economists get really jazzed about allocating scarce resources. It’s been frustrating to watch first tests and now vaccines not be available on a market. Excellent points are also made every week over at Marginal Revolution on how we are missing an opportunity to get the incentives right. Supply. Curves. Slope. Up. (Thousands. Dying. Every. Week.)
I’m relatively new to Birmingham, Alabama. I was nervous about moving to a place with famously long hot humid summers. My intuition since moving here is that there are many days throughout the year when, at some point in the day, the weather is nice for doing something outside with my kids.
Yesterday, Sunday, was very nice. To have such a nice warm sunny day in mid-December is strange to me. I grew up further north where Decembers are chilly. Here is a picture of a neighbor’s son enjoying the summer-like feel of this technically-winter day. This picture was taken at noon.
Although I am grateful for this particular day, I also think about the hot summer days when noon is a time to hide indoors with air conditioning. Is it nice here? How can that question be answered scientifically?
This map confirmed my intuition. My old life in New Jersey was in the dark green zone, and my new life in Alabama is one level better, in terms of how many “nice” days you can expect in a year.
If you don’t have climate control, then you might be more worried about weather extremes. If you are lucky enough to have a regulated indoor environment, then a nice place to live is largely a question of how many days you get when it’s nice to “go out”.
This map accounts for “nice days”. I wonder if New Jersey would seem closer to Alabama if the measure changed to “nice daylight hours”. Yesterday was beautiful, but it was dark by 5pm. When I get time, I’m going to make a map of where in the lower 48 you can enjoy dinner outside after work many times per year (and why is it Southern California?).
As I said earlier, I used SAS Viya for Learners this semester. I assigned a final project for students. They had to use the data pre-loaded into the free version of SAS Viya, but otherwise had freedom to select their own variables and construct their own research question.
SAS Viya for Learners just recently opened for any users to make an account. This will allow you to learn SAS Viya functions (but not do your own actual work, because you cannot import new data). I’m using SAS Viya 3.4.
I like the way SAS Viya allows users to create a beautiful intuitive interactive decision tree model. This blog is to show you what that looks like. In traditional EconLit, regressions are more popular than decision trees. Decision trees are a simple and useful machine learning technique. If you are trying to teach a first-timer about decision trees, then the visualization in SAS Viya for Learners can be helpful.
I’ll demonstrate using a decision tree for classification using built-in SAS data. One of the larger datasets available is USCENSUS1990. I’ll use it to demonstrate (and I do love the 90’s!). Consider the variable about the number of children a person has. This could be reasonably predicted by age and education level. [Footnote 1]
Here’s a chart showing the frequency of family sizes for adult women. (I used a Filter to only include people who are not coded zero in iFertil. See Footnote 1.)
For adult women in 1990, the most frequent category is to have more than 2 children. This would include the parents of Boomers. Think about those big families you know from the Boomer generation.
For input variables to my model, I’ll use age categories and also education levels. I set the new categorical variable I created called NumberChildren as the Response variable for a decision tree model. [Footnote 2] Here’s a zoomed out picture of the visual model output.
It’s immediately obvious that age is more informative than schooling. Women under the age of 30 are much more likely to have no children. The width of the grey tree branches makes it easy to see where the majority of the observations are.
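For readers wondering how a tree "decides" that age matters more than schooling: at each node it tries candidate splits and keeps the one that leaves the purest groups. The sketch below illustrates that idea with Gini impurity in plain Python. It is not a description of what SAS Viya does internally, the function names are hypothetical, and the tiny dataset is invented for illustration (it is not the census data).

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, threshold, target):
    """Weighted Gini impurity after splitting rows on feature >= threshold."""
    left = [r[target] for r in rows if r[feature] < threshold]
    right = [r[target] for r in rows if r[feature] >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, features, target):
    """Try each observed value of each feature as a threshold; keep the purest split."""
    best = None
    for feature in features:
        for threshold in sorted({r[feature] for r in rows}):
            score = split_impurity(rows, feature, threshold, target)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Invented toy rows (NOT the census data): age code, schooling code, children label.
toy_rows = [
    {"age": 3, "school": 10, "kids": "none"},
    {"age": 3, "school": 14, "kids": "none"},
    {"age": 3, "school": 12, "kids": "none"},
    {"age": 5, "school": 10, "kids": "2+"},
    {"age": 5, "school": 14, "kids": "2+"},
    {"age": 6, "school": 12, "kids": "2+"},
]

feature, threshold, score = best_split(toy_rows, ["age", "school"], "kids")
print(feature, threshold, score)
```

In this made-up data, splitting on age separates the two groups perfectly while no schooling threshold helps at all, so the first split lands on age. That is the same kind of comparison the visual tree is summarizing.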
I’ll zoom in on the left side of the tree where most of the people are.
The “>= 4.25” means that women on the far left side are over the age of 40. Among older women, the norm is to have 2 or more children. If you are looking for the older women with exactly two children, you are more likely to find them among those who have an education score greater than “13”, meaning that they have a Bachelor’s degree or higher.
My point is not to posit causal relationships among education and fertility. My point is how awesome these graphs are. You do have to learn some point-and-click functions within SAS Viya to make them. But I don’t know of any other software that can produce this.
SAS Viya also provides tables and statistics on each node, which is more like what I could get from free open source software a few years ago when I looked into decision tree packages.
[Footnote 1] If you want to replicate what I did, know that the USCENSUS1990 dataset in SAS Viya comes with no explanation. Google brought me to UCI, where I found what I needed in terms of technical documentation.
dAge, iFertil, iSex, iYearsch are the names of the variables you will find in SAS. To create my graphs and models, I converted some of them to categorical variables using the “+New Data Item -> Custom Category” functions. No programming is required.
iSex: 0 indicates Male, 1 indicates Female
dAge is coded as follows: 0 is babies; 1 is under 13, 2 is under 20 (but over 13), 3 is under 30, 4 is under 40, 5 is under 50, 6 is under 65, 7 is for 65 and over
iFertil is coded: 0 is either less than 15 years old or male, 1 is no child, 2 means they have one child (confusing…), 3 means they have two children, all the way up to a 13 which is the code for 12 or more children
iYearsch: 3-10 refers to primary school up to a 10 indicating graduating from high school, 11-13 refer to some college and associate degrees, 14 is a Bachelors degree, 15-17 refers to higher degrees
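If you would rather see that codebook as code, here is a small Python sketch of the same mappings. The dictionary and function names are hypothetical, the labels simply restate the codes listed above, and the buckets in number_children_category are an assumption about how a NumberChildren-style custom category could be built. Check everything against the UCI documentation before relying on it.

```python
# Labels restate the codes described in this footnote; all names are illustrative.
SEX_LABELS = {0: "Male", 1: "Female"}

AGE_LABELS = {
    0: "babies",
    1: "under 13",
    2: "under 20 (but over 13)",
    3: "under 30",
    4: "under 40",
    5: "under 50",
    6: "under 65",
    7: "65 and over",
}


def children_from_ifertil(code):
    """Translate an iFertil code into a number of children.

    Code 0 (under 15 or male) returns None; code 13 means 12 or more.
    """
    if not 0 <= code <= 13:
        raise ValueError(f"unexpected iFertil code: {code}")
    if code == 0:
        return None
    return code - 1


def schooling_level(code):
    """Bucket an iYearsch code into the broad levels listed above."""
    if 3 <= code <= 10:
        return "primary school through high school graduate"
    if 11 <= code <= 13:
        return "some college or associate degree"
    if code == 14:
        return "Bachelor's degree"
    if 15 <= code <= 17:
        return "higher degree"
    return "not described here"


def number_children_category(code):
    """Assumed buckets for a NumberChildren-style category (0, 1, 2, more than 2)."""
    kids = children_from_ifertil(code)
    if kids is None:
        return None
    if kids <= 2:
        return str(kids)
    return "more than 2"


print(children_from_ifertil(2), schooling_level(14), number_children_category(5))
```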
[Footnote 2] I decided to set Maximum levels to 5 in Options. This keeps the tree smaller which looks better in the blog.