Most Improved Data

The US government is great at collecting data, but not so good at sharing it in easy-to-use ways. When people try to access these datasets they either get discouraged and give up, or spend hours getting the data into a usable form. One of the crazy things about this is all the duplicated effort- hundreds of people might end up spending hours cleaning the data in mostly the same way. Ideally the government would just post a better version of the data on their official page. But barring that, researchers and other “data heroes” can provide a huge public service by publicly posting datasets that they have already cleaned up- and some have done so.

I just added a data page to my website that highlights some of these “most improved datasets”:

  • The IPUMS versions of the American Community Survey, Current Population Survey, and Medical Expenditure Panel Survey
  • The County Business Patterns Database, harmonized by Fabian Eckert, Teresa C. Fort, Peter K. Schott, and Natalie J. Yang
  • Code for accessing the Quarterly Census of Employment and Wages by Gabriel Chodorow-Reich
  • The merged Statistics of US Business, my own attempt to contribute

I hope to keep adding to this page as I find other good sources of unofficial/improved data, and as I create them (one of my post-tenure goals). See the page for more detail on these datasets, and comment here if you know of existing improved datasets worth adding, or if you know of needlessly terrible datasets you think someone should clean up.

College Major, Marriage, and Children Update

In a May post I described a paper my student had written on how college majors predict the likelihood of being married and having children later in life.

Since then I joined the paper as a coauthor and rewrote it to send to academic journals. I’m now revising it to resubmit to a journal after referee comments. The best referee suggestion was to move our huge tables to an appendix and replace them with figures. I just figured out how to do this in Stata using coefplot, and wanted to share some of the results:

Points represent marginal effects of coefficient estimates from Logit regressions estimating the effect of college major on marriage rates relative to non-college-graduates. All regressions control for sex, race, ethnicity, age, and state of residence. MarriedControls additionally controls for personal income, family income, employment status, and number of children. Married (blue points) includes all adults, others include only 40-49 year-olds. Lines through points represent 95% confidence intervals.
Points represent coefficient estimates from Poisson regressions estimating the effect of college major on the number of children in the household relative to non-college-graduates. All regressions control for sex, race, ethnicity, age, and state of residence. ChildrenControls additionally controls for personal income, family income, employment status, and number of children. Children (blue points) includes all adults, others include only 40-49 year-olds. Lines through points represent 95% confidence intervals.

Many details have changed since Hannah’s original version, and a lot depends on the exact specification used. But 3 big points from the original paper still stand:

  1. Almost all majors are more likely to be married than non-college-graduates
  2. The association of college education with childbearing is more mixed than its almost-uniformly-positive association with marriage
  3. College education is far from uniform; differences between some majors are larger than the average difference between college graduates and non-graduates

Farewell to the First Normal Semester in 3 Years

Today as I gave my last final and took my kids to a huge school party, it struck me that things are finally back to something like 2019 levels of normality.

2020 was a lost cause, of course. I had high hopes for 2021 that vaccines would immediately get us back to normal. They did get my school back to fully in-person by Fall 2021, but not really back to normal, partly thanks to the variants. My students were out sick more than normal, and I was out watching my sick kids more than normal, as every cold meant they would be home until the school was sure it wasn’t Covid. Toward the end of the Spring 2022 semester worries were subsiding, and my state was pretty much fully re-opened, but things still weren’t really back to normal. Student attendance and effort were still way below normal, partly from the lingering effects of Covid, and partly from celebrating its end- partying to make up for lost time (and cheering on a great basketball team).

Fall 2022 finally felt like a basically normal semester. I still see the occasional mask, still hear from the occasional student out with Covid, and still have one kid missing 2 school days with every cough (policies stricter than 2019, but much relaxed from the days when both kids were at schools that could have them miss 5+ days with every non-Covid cough). Overall though student attendance and effort are back to what seem like normal levels. Up to Spring 22 I’d have students just disappear for a few weeks, not in class, not answering e-mails about why they weren’t showing up or completing work, needing lots of help to get on track once they finally reappeared. This Fall that didn’t happen; in my Senior Capstone everyone turned in a quality paper basically on-time and without me having to chase anyone down for it. Also, everyone just seemed happier now that their stress levels are back down to the baseline for college students.

This semester was nothing special- and that was beautiful.

Sympathy for the Sauds

I’ve always been confused by the US alliance with Saudi Arabia. It’s a state with values abhorrent to many Americans, and it seems like we don’t get much practical value out of the alliance.

This essay on Saudi history, politics, and economics by Matt Lakeman makes the situation more comprehensible. I still don’t know that I want the alliance, but I can now see how so many US presidents have continued with it without necessarily being stupid, crazy, or corrupt. In short, they think that most of the realistic alternatives are worse. Some highlights:

Before starting this research, I had the same perception as Wood that the Saudi economy is essentially what he calls a “petrol-rentier state.” Basically, Saudi Arabia sits on top of a giant ocean of easily-accessed oil which they suck out of the ground and sell at enormous profit to prop up the rest of their extremely inefficient economy and buy the loyalty of their own people and foreign powers. Saudi Arabia is the wealthiest large state in the Middle East today by sheer virtue of geographic luck rather than any innovation or business acumen on the part of its people.

And after doing my research, all of the above is… basically true.

But all of that should also be true of Iran, Iraq, Venezuela, Libya, and a few other countries which are also situated on giant oceans of oil but are far poorer than Saudi Arabia.

Economically, Saudi Arabia deserves little credit for its success. Politically, Saudi Arabia deserves a tremendous amount of credit for enabling its economic success. 

Dealing with the resource curse is always challenging, and foreign ownership is an additional challenge. How did they manage it?

the Sauds struck a clever balance between being too aggressive and too placating of the foreigners operating their oil wells. If the Saudi state had been aggressive and tried to nationalize its oil quickly, Saudi Arabia could have ended up becoming another Venezuela or Iran with lots of external political pressure from hostile Western countries and a low-efficiency oil industry. But if they had nationalized too late, they would have ended up like a lot of African nations who have all their natural wealth siphoned away by foreigners.

Instead, the Sauds executed a patient, and most importantly, amicable assertion of power over Aramco, which did not become fully owned by Saudis until 1974. At the very start of Aramco, the company was entirely owned and operated by Americans aside from menial labor. However, the Saudi government inserted a clause into their contract with the corporation requiring the American oil men to train Saudi citizens for management and engineering jobs. The Americans held up their end of the bargain, and over time, more and more Saudis took over management and technical positions.

In addition to carefully negotiating the balance of power with various foreigners, the Sauds have done so with the religious establishment:

Though the monarch has absolute power, his authority is at least in part derived from Saudi Arabia’s Islamic religious establishment. The ulema (a group of the highest-ranking clerics) is officially integrated into the government, and plays an important role in legal matters. However, the religious establishment has slowly been marginalized by the monarchy over the last few decades, and has possibly been subjugated entirely since the reform era began five years ago.

Winning freedom of action has been a long road with many setbacks:

[King] Abdulaziz constantly had to reassure enraged Wahhabi clerics that he wasn’t selling out the Arab homeland to treacherous infidels. IIRC, it was some time in the 1920s that Abdulaziz had to publicly smash a telegraph to prove to the clerics that he wasn’t bewitched by infidel technology.

In late 1979, 400-500 extremist Sunni Saudis seized the Grand Mosque in Mecca (the holiest Islamic site on earth) and demanded the overthrow of the Saud dynasty in favor of a theocratic state meant to await an imminent apocalypse. They held on for two weeks while managing to fight off waves of Saudi police and military squads. Eventually, three French commandos flew to Mecca, converted to Islam in a hotel room, and led a successful assault to retake the Mosque. Over 100 men died on each side, with hundreds more wounded.

The Grand Mosque seizure was the final wake-up call for the Saud dynasty. Something drastic had to be done or their regime would likely be ground down under mounting internal and external pressure…. King Khalid led a social/religious/political reactionary revolution within Saudi Arabia to align with the Sunni extremists. Up until about four years ago, Saudi society was still gender segregated and enforced a largely literalist interpretation of Sharia, hence the array of bizarre and antiquated laws – gender segregation in public, requiring women to cover their faces, outlawing of non-Muslim religious buildings (there are a few Shia mosques), restrictions on foreign media, etc. Saudi Arabia was always conservative, but most of these draconian laws were only put into place in the 1980s. The Saud dynasty purposefully induced a reactionary legal regime and pulled Saudi Arabia further away from liberalism.

The charitable take on making an already oppressive regime even more oppressive is that the Sauds were trying to bend Saudi Arabia to the extremists so the country would not break. And by all accounts, it worked; the conservative Wahhabi clerics backed by the Saud dynasty placated a sizeable portion of the Sunni extremists inside and outside of Saudi Arabia, and they became a pool of support against the Shia and Baathists. Saudi Arabia was certainly made a worse country for its citizens, but that was the price to pay for averting civil war.

More recently, Crown Prince Mohammed bin Salman (MBS) has consolidated power to the point where he can make modernizing reforms that Wahhabis might have opposed, like allowing women to drive, allowing non-Muslim foreigners to get tourist visas, allowing music concerts, etc. Lakeman obviously likes these reforms, but at the same time worries that the concentrated power that MBS has so far largely used to enact positive reforms could be abused going forward, and on a larger scale than murdering the occasional dissident.

Wood argues that a worst case scenario parallel to MBS is Syrian Dictator Bashar al-Assad. Like MBS, there were high hopes that Assad would be a liberal reformer when he took over Syria. After all, Assad had been living and working in the UK as an ophthalmologist with no political aspirations, and was known to be a fan of Phil Collins. He was called to the throne after the unexpected death of his older brother, and so the West hoped that this nerdy British doctor would bring upper-middle class liberal values to Syria. Instead, Assad became one of the worst dictators of the modern Middle East, probably second only to Saddam Hussein.

I recommend reading the whole thing; here I’m quoting relatively small parts of an article full of interesting detail on the history, economics, and politics of Saudi Arabia. There’s also a section on visiting:

The silver lining to Saudi Arabia’s lack of tourism is that there aren’t many tourist restrictions. I went to two ancient settlements and I found no guards, no gates, no notices at all. I walked in, around, and on top of 2,000 year old houses, and I honestly have no idea if I was allowed to.

Ban, Subsidize, Mandate: Ethics and US Healthcare Policy

Tomorrow (Friday 12/2) I’ll be speaking at the Fall Ethics Forum at Sacramento State. The Center for Practical and Professional Ethics there does a forum every year on a different field of practical ethics, and this year they chose healthcare (some previous iterations look quite interesting, like Bryan Caplan on education and Lyman Stone on population). The event is open to the public if you happen to live near Sacramento, and I hope to be able to post a recording later. But for now, here’s a short preview of what I plan to say:

In many key respects, US health policy is about restricting the choices available to patients and health care providers: banning things the government doesn’t want, while mandating or subsidizing things they want. These restrictions on autonomy are typically justified by the idea that they lead to superior health or economic outcomes. In some cases this tradeoff between freedom and efficient utilitarian outcomes is real, but I highlight some policies such as Certificate of Need laws that appear to harm both freedom and efficiency. I argue that the overarching US approach to health policy is to subsidize demand while restricting supply, which together lead to exceptionally high prices but mediocre health outcomes.

I’ll also take on some classic questions like: when are free lunches truly free? And when is moral hazard really immoral?

New Data: State Regulatory Procedures

Released this April, but I just heard about it today. Researchers did the painstaking work of going through all 50 states to determine which steps must be taken in each state before new regulations can take effect. For instance, it turns out half of states require economic analysis for new regulations, and half don’t. The paper is here: https://www.mercatus.org/publications/regulation/50-state-review-regulatory-procedures

Two Types of News: Elections vs Crashes

Some events are like elections: it was obvious that some big political news would break on Election Day, we just had to wait to find out what exactly would happen. Others are like market crashes: you might know in principle they’re a thing that can happen, but you don’t really expect any particular day to be the day one happens, so they seem to come out of the blue. As it turns out, for one of the largest crypto exchanges the day of the crash also happened to be Election Day.

FTX.com is facing a bank run sparked by competitor Binance tanking the price of the token that backed some of their assets. Customers are having issues withdrawing their money, Binance has withdrawn its offer to bail out FTX by taking them over, and bankruptcy seems likely. Supposedly this doesn’t affect Americans using FTX US, but I’d be nervous about any funds I had there, or indeed with funds in any centralized crypto exchange or stablecoin (Tether and even USDC seem to be having issues holding their pegs). All this was especially shocking because many considered FTX founder Sam Bankman-Fried one of the most trustworthy people in the often sketchy world of crypto. He was always meeting with US regulators and lawmakers, and seems not to be motivated by greed; he had already begun to give away his fortune at scale.

After any surprising event like this, some people claim it was actually obvious and they saw it coming (despite usually never having said so beforehand), while others start looking back for warning signs they missed. The most interesting one is something that shocked me when I first heard it in March, but I never considered the risk it implied for FTX until the crash:

Going forward, red flags to watch out for seem to be topping a list of youngest billionaires (as Elizabeth Holmes also did) and buying naming rights to a stadium.

In contrast to this crash, the election happened right when we all expected, and at least largely how I expected. Like markets, I underestimated Democrats a bit; polls overall were impressively accurate this year, though they of course missed on some particular races. Votes are still being counted, and as of now we don’t even know for sure which party will control Congress (PredictIt currently gives Democrats a 90% chance in the Senate and a 20% chance in the House). But here are some early attempts to assess forecast accuracy. As I said, some polls were quite good:

Some polls weren’t so good, which means it’s important to weight better pollsters more heavily when you aggregate them. Some attempts at that were also quite good:

Oddly, some no-money (Metaculus) / play-money (Manifold Markets) forecasting sites seem to have done better than the real-money prediction sites:

A Dragonfly’s View of Election Day 2022

This is my last post before the US midterm elections on Tuesday, so I’ll leave you with a prediction for what’s coming.

Who is the best predictor of elections? Nate Silver at FiveThirtyEight has had a pretty good run since 2008 using weighted polls. Ray Fair, an economics professor at Yale, has a venerable and well-credentialed model based on fundamentals. I typically favor prediction markets, because they incorporate a wide range of views weighted by how willing people are to put their money where their mouth is, and traders are able to incorporate other sources of information (including predictors like FiveThirtyEight). But which prediction market should we trust? There are now many large prediction markets, and the odds often differ substantially between them.

When there are many reasonable ways of answering a question or looking at a problem, it can be hard to choose which is best. Often the best answer is not to choose- instead, take all the reasonable answers and average them. Dan Gardner and Philip Tetlock call this approach Dragonfly Eye forecasting, since a dragonfly’s eyes see through many lenses. So what does the dragonfly see here?

Let’s start with the US House, since everyone covers it.

  • FiveThirtyEight’s latest forecast shows that Republicans have an 85% chance of taking the House; it shows a range of possible outcomes, but on average predicts that Republicans win the popular vote by 4.3% and take 231 House seats (substantially over the 218 needed for a majority)
  • The Fair Model predicts that Democrats will win 46.6% of the two-party vote share (leaving Republicans with 53.4%). This has Republicans winning the popular vote by 6.8%, a moderately bigger margin than FiveThirtyEight. The reasoning is interesting; the economy is roughly neutral since “the negative inflation effect almost exactly offsets the positive output effect”, so this is mainly from the typical negative effect of having an incumbent party in the White House.
  • Prediction markets: PredictIt currently gives Republicans a 90% chance to take the House. Polymarket gives them 87%. Insight Prediction also gives them 87%. Kalshi doesn’t have a standard market on this, but their contest (free to enter, 100k prize) predicts 232 Republican seats.

It’s a bit tricky to average all these since they don’t all report on the same outcome in the same way. But the overall picture is clear: Republicans are likely to do well in the House, with an ~87% chance to win a majority, expected to win the popular vote by ~5.55% and take ~232 seats.
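For anyone who wants to check the arithmetic, here is a minimal sketch of the averaging in Python. It assumes a simple equal-weight mean over whichever sources report each quantity; the variable and function names are mine, and the numbers are just the ones quoted in the bullets above.

```python
from statistics import mean

# Republican chance of winning the House, in percent, as quoted above.
house_win_prob = {
    "FiveThirtyEight": 85,
    "PredictIt": 90,
    "Polymarket": 87,
    "Insight Prediction": 87,
}

# The Fair Model reports the Democratic two-party vote share, so convert it
# to a Republican margin: (100 - share) - share.
fair_dem_two_party_share = 46.6
fair_margin = (100 - fair_dem_two_party_share) - fair_dem_two_party_share

# Republican popular-vote margin, in percentage points.
popular_vote_margin = {"FiveThirtyEight": 4.3, "Fair Model": fair_margin}

# Predicted Republican House seats.
house_seats = {"FiveThirtyEight": 231, "Kalshi": 232}


def dragonfly(estimates):
    """Equal-weight average of all available estimates (an assumed weighting)."""
    return mean(estimates.values())


print(f"Chance of a Republican House majority: {dragonfly(house_win_prob):.2f}%")
print(f"Republican popular-vote margin: {dragonfly(popular_vote_margin):.2f} points")
print(f"Republican House seats: {dragonfly(house_seats):.1f}")
```

Equal weights are the simplest choice; weighting sources by track record would be a reasonable alternative, but it requires a judgment about which sources deserve more trust.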

The Senate is closer to a coin flip and harder to evaluate.

  • FiveThirtyEight gives Republicans a 53% chance to win a majority (51+ seats for them; Democrats effectively win if the Senate stays 50-50 since a Democratic Vice President breaks ties for at least 2 more years). The most likely seat counts are 50-50 or 51-49, but confidence intervals are pretty wide and 54-46 either direction isn’t ruled out.
  • The Fair Model doesn’t make Senate predictions, only House and Presidential predictions.
  • Prediction markets: PredictIt gives Republicans a 70% chance to win a Senate majority, probably with 52-54 seats. Polymarket gives Republicans a 65% chance, as does Insight Prediction. Kalshi predicts 53 Republican seats.

Overall we see a much higher variance of predictions in the Senate; a 17pp gap between the highest (70%) and lowest (53%) estimates of Republican chances, vs just a 5pp gap for the House (90% to 85%). This shows up with the seat counts too; everyone agrees there’s a substantial chance Republicans lose the Senate, but if they do win, it will probably be by more than one seat. The average estimate is ~52 Republican seats. FiveThirtyEight and PredictIt agree that the closest Senate races will be Georgia, Pennsylvania, Arizona, Nevada, and New Hampshire (though they rank order them differently), so those are the races to watch.
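The disagreement between forecasters can be checked the same way. This is a rough sketch that measures disagreement as the gap between the highest and lowest estimate; the names are mine, and the inputs are the probabilities quoted in the lists above.

```python
# Republican chance of a majority, in percent, as quoted in the lists above.
senate_win_prob = {
    "FiveThirtyEight": 53,
    "PredictIt": 70,
    "Polymarket": 65,
    "Insight Prediction": 65,
}
house_win_prob = {
    "FiveThirtyEight": 85,
    "PredictIt": 90,
    "Polymarket": 87,
    "Insight Prediction": 87,
}


def spread(estimates):
    """Gap in percentage points between the highest and lowest estimate."""
    values = list(estimates.values())
    return max(values) - min(values)


print(f"Senate gap: {spread(senate_win_prob)}pp")
print(f"House gap: {spread(house_win_prob)}pp")
```

The max-minus-min gap is a crude measure with only four sources; a standard deviation would be another option, but with so few estimates it would tell much the same story.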

Forecasts for governors aren’t as comprehensive, but FiveThirtyEight predicts we’ll get about 28 Republican (22 Democratic) governors, while PredictIt expects 31+ Republicans; I’ll split the difference at 30. Everyone agrees that Oregon is surprisingly competitive because of an independent drawing Democratic votes. The biggest difference I see is on New York, where PredictIt gives Republican challenger Lee Zeldin a real chance (26%) but FiveThirtyEight doesn’t (3%).

Overall forecast: moderate red wave, Republicans take the House and most governorships, probably the Senate too. But if they lose anything it is almost certainly the Senate.

These forecasts seem about right to me. Democrats are weighed down by an unpopular (-11) President and the highest inflation in 40 years. This alone would lead to a huge red wave, but Republicans have their own weaknesses: an unpopular former President lurking in the background, and the Supreme Court making a big unpopular change voters blame them for. This shrinks the red wave, but I don’t think it’s enough to eliminate it. The effect of Roe repeal is fading with time, and the unpopular Biden is more salient than the unpopular Trump; Biden is the one in office and is more prominent in media coverage. Facebook and recently-acquired Twitter may be doing Republicans a favor by keeping Trump banned through Election Day. But if he drags Republicans down anywhere, it will be the Senate, where candidate quality (not just party affiliation) is crucial and his endorsements pushed some weak/weird/extreme candidates through primaries. We’ll also see this “extremist” Trump effect (abetted by cynical Democratic donations to extreme-right candidates) dragging down Republicans in some key governor’s races like Pennsylvania, where Democrats are now 90/10 favorites.

Bounce Houses are Surprisingly Cheap

Last year was the first time I saw a family that owned their own bounce house and just set it up in their living room. At the time I thought, what a lucky rich kid, that must cost at least a thousand dollars. But my wife looked into it and found out that bounce houses are surprisingly cheap these days. She got our kids this one last Christmas; it’s currently going for $234 on Amazon:

The kids love it and it’s still going strong ten months later, despite substantial use from kids and the presence of two sharp-clawed cats. It was certainly a bigger hit than the other major gift we tried last Christmas- telescopes are surprisingly hard to use.

Should Virologists Regulate Themselves?

Last Friday a group of researchers mostly from Boston University posted a paper which revealed they had created a new chimeric coronavirus and used it to infect mice.

We generated chimeric recombinant SARS-CoV-2 encoding the S gene of Omicron in the backbone of an ancestral SARS-CoV-2 isolate and compared this virus with the naturally circulating Omicron variant. The Omicron S-bearing virus robustly escapes vaccine-induced humoral immunity, mainly due to mutations in the receptor-binding motif (RBM), yet unlike naturally occurring Omicron, efficiently replicates in cell lines and primary-like distal lung cells. In K18-hACE2 mice, while Omicron causes mild, non-fatal infection, the Omicron S-carrying virus inflicts severe disease with a mortality rate of 80%. 

Many people who heard about this expressed concern that the risk of creating more contagious and/or deadly versions of Covid that could escape from a lab outweighs any potential benefits of what we could learn from this research.

Several researchers have responded to these concerns with variants of “trust virologists to weigh the risks here, they know more than you.”

Don’t tell us virologists how to do our jobs; tell farmers, hunters, and veterinarians how to do theirs

Here’s the thing: the virologists do know the risks better than the public or potential regulators- but they also have different incentives. What I want to point out today is that virology isn’t special; this is true of just about every field. A nuclear engineer knows much more about what’s happening at their plant than voters do, or distant bureaucrats at the Nuclear Regulatory Commission. Should we leave it to the engineers on site to decide how much risk to take? Should federal regulators leave it to the financial experts at Bear Stearns and AIG to decide how much risk they can take?

To some extent I actually sympathize with these critiques; industry practitioners really do tend to have the best information, and voters often push regulatory agencies to be insanely risk-averse. With any profession this information problem is a reason to regulate less than you otherwise would, and/or pay to hire expert regulators.

But externalities are real- the practitioners who have the best information use it to promote their own interests, which tend to differ from the interests of the public. In finance this means moral hazard at best and fraud at worst (who are you to say Bernie Madoff is a fraud? You know more about finance than him?). In medicine it means doctors who get paid more for doing more; they gave the guy who invented lobotomies a Nobel Prize in Medicine. In research that involves creating new viruses, researchers get the private benefits of prestige publications for themselves, but the increased pandemic risk is shared with the whole world. In this case it’s not just outsiders who are concerned; some subject-matter experts are too (and not just “usual suspects” Alina Chan and Richard Ebright; see also Marc Lipsitch).

The main current check on research like this is supposed to be Institutional Review Boards. The chimeric Covid paper notes “All procedures were performed in a biosafety level 3 (BSL3) facility at the National Emerging Infectious Diseases Laboratories of the Boston University using biosafety protocols approved by the institutional biosafety committee (IBC)”. But there are many problems with this approach. The IRB is run by employees of the same institution as the researcher, the institution that also claims a disproportionate share of the benefits of the research.

IRBs are also incredibly opaque. The paper claims it was approved by Boston University’s institutional biosafety committee, but these committees don’t maintain public lists of approved projects; I e-mailed them Sunday to ask if they actually approved this project and they have yet to respond. There is also no public list of the members of these committees, although in BU’s case you can get a good idea of who they are by reading the meeting minutes. This chimeric Covid proposal appears to have been reviewed as the second proposal of their January 2022 meeting, reviewed by Robert Davey and Shannon Benjamin and approved by a 16-0 vote of the committee. During the January meeting the committee approved all 6 projects they considered unanimously, after hearing 6 reports of lab workers at BU being exposed to lab pathogens in the previous month, e.g.:

MD/PhD student reported experiencing low grade temperatures and other symptoms after he accidentally injured his thumb percutaneously on 12-6-21 while cleaning forceps that he had used to remove infected lungs from mice injected with NL63 virus

IRBs are supposed to protect research subjects from harm, but in practice largely serve to protect their institutions from lawsuits and PR disasters (part of why they’re often too strict). The fact that this did get institutional approval provides one silver lining here; if this chimeric Covid ever did escape and cause an outbreak, those infected by it could potentially sue for damages not only the individual researchers, but Boston University and its $3.4 billion endowment. Being able to internalize externalities in this way is one of many good reasons to be testing those infected with Covid to see what variant they have.

I think we should at least consider stronger national regulations against research like this, rather than leaving each decision to local institutional review boards (ask any researcher how much they trust IRBs). At the very least we should stop subsidizing it; NIH claims they don’t fund “gain of function” research like this, but the researchers who made a new version of Covid conclude their paper:

This work was supported by Boston University startup funds (to MS and FD), National Institutes of Health, NIAID grants R01 AI159945 (to SB and MS) and R37 AI087846 (to MUG), NIH SIG grants S10-OD026983 and SS10-OD030269 (to NAC)