What is truth? The Bayesian Dawid-Skene Method

I just learned about the Bayesian Dawid-Skene method. This is a summary.

Some things are confidently measurable. Other things are harder to perceive or interpret. An expert researcher might think that they know an answer. But there are two big challenges: 1) the researcher is human and can err, and 2) the researcher is finite, with limited time and resources. Even artificial intelligence has imperfect perception and reasoning. What do we do?

A perfectly sensible answer is to ask someone else what they think. They might make a mistake too. But if their answer is formed independently, then we can hopefully get closer to the truth with enough iterations. Of course, nothing is perfectly independent. We all share the same globe, and often the same culture or language. So, we might end up with a biased answer. Still, we can try to correct for bias once we have an answer, so accepting some bias at the outset is a reasonable place to start.

The Bayesian Dawid-Skene (henceforth DS) method helps to aggregate opinions and find the truth of a matter given very weak assumptions ex ante. Here I’ll provide an example of how the method works.

Let’s start with a very simple question, one that requires very little thought and logic. It may require some context and social awareness, but that’s hard to avoid. Say that we have a list of n=100 images. Each image has one of two words written on it: “pass” or “fail”. If typed, there would be little room for ambiguity; typed text is relatively clear even when the image is substantially corrupted. But these words are handwritten, maybe with a variety of pens, by a variety of hands, and the images were stored under a variety of conditions. Therefore, we might be a little less trusting of what a computer would spit out using optical character recognition (OCR). Given our own potential for errors and limited time, we might lean on some other people to help interpret the handwriting.
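To make this concrete, here is a minimal sketch of the classic Dawid-Skene EM algorithm on simulated data matching this setup. The annotator count, accuracies, and random seed are all hypothetical, and this is the maximum-likelihood version; the fully Bayesian variant puts Dirichlet priors on the class proportions and confusion matrices (the +1 pseudocounts below are a crude nod in that direction).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup mirroring the example: 100 images whose true label is
# 0 = "pass" or 1 = "fail", read by 5 annotators with different accuracies.
n_items, n_raters, n_classes = 100, 5, 2
truth = rng.integers(0, n_classes, n_items)
accuracy = rng.uniform(0.7, 0.95, n_raters)
labels = np.where(rng.random((n_items, n_raters)) < accuracy,
                  truth[:, None], 1 - truth[:, None])

# Initialize each item's class posterior from the majority vote.
T = np.stack([(labels == k).sum(axis=1) for k in range(n_classes)],
             axis=1).astype(float)
T /= T.sum(axis=1, keepdims=True)

for _ in range(50):
    # M-step: class prevalence and each rater's confusion matrix,
    # with +1 pseudocounts (uniform Dirichlet priors, MAP-style).
    pi = T.mean(axis=0)
    conf = np.ones((n_raters, n_classes, n_classes))  # conf[j, true, reported]
    for j in range(n_raters):
        for k in range(n_classes):
            conf[j, :, k] += T.T @ (labels[:, j] == k)
    conf /= conf.sum(axis=2, keepdims=True)

    # E-step: posterior over each item's true class given all labels.
    logT = np.tile(np.log(pi), (n_items, 1))
    for j in range(n_raters):
        logT += np.log(conf[j][:, labels[:, j]]).T
    T = np.exp(logT - logT.max(axis=1, keepdims=True))
    T /= T.sum(axis=1, keepdims=True)

estimate = T.argmax(axis=1)
print(f"accuracy vs. ground truth: {(estimate == truth).mean():.2f}")
```

Even though each individual annotator is error-prone, the pooled estimate is usually far more accurate than any single reader, because the estimated confusion matrices let the model weight reliable annotators more heavily.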


The Middle/Working Class Has Not Been “Hollowed Out”

Claims that the middle class or working class has been “hollowed out” in the US have been made for years, or decades really. The latest is an essay in the Free Press by Joe Nocera. But these claims are usually long on anecdotes and short on data. Let’s look at the data.

One data point we might use is median weekly earnings for full-time workers with a high school diploma, but no college degree. That sounds like a reasonable definition of “working class.” Here’s what that data looks like adjusted for inflation with the PCE Price Index:

Notice that the latest data point, for 2024, is the highest in this data series, and likely higher than any point in the past. While many point to about the year 2000 as when troubles for the working class started (this is when manufacturing employment really fell off a cliff, and China joined the WTO in 2001), inflation-adjusted earnings have risen 11% for this group of workers since then. You might say that’s not a lot of growth — and you would be correct! But this group is better off economically than in the year 2000, which is a point that gets lost in so many discussions of this issue.

But that’s just a national number. Might some states that were especially hard hit by manufacturing job losses be worse off? Nocera mentions North Carolina and the Midwest. To answer this, we can use BLS OEWS data, which has not only median wages by state, but also the 10th percentile wage — the lowest of the working class. Here’s what median real wage growth (again inflation-adjusted with the PCEPI) looks like since 2001 (the earliest year in this series with comparable data):


An Egg-cellent Consumer Surplus Calculation?

There was a recent Planet Money podcast episode that includes a fun exercise. An NPR employee produces a dozen chicken eggs and wants to sell them at cost to another employee for $5. That’s the setup. How does the producer decide who should receive the eggs? Clearly, the price mechanism won’t work, since the price is fixed. A lottery is also not allowed. The egg recipient could engage in arbitrage, reselling the eggs for a higher price. But that’s not very likely and would be socially awkward. The egg producer wants to make someone happy. Who would he make the happiest?

That’s the challenge that the Planet Money team tries to solve.

First, they start with a survey. Rather than asking coworkers to rank a long list of things that includes eggs, the survey adopts a more robust method of pairwise comparisons. Do you prefer toast vs. eggs? Eggs vs. oatmeal? Toast vs. oatmeal? And so on. One problem that they encounter, however, is that there is a lot of diversity among preparation methods. My oatmeal is better than my eggs. But my brother’s oatmeal is not. As it turns out, there is no standard quality of prepared oatmeal or prepared eggs. So the survey is a flop.

Then they consult an economist. They decide to try to measure “willingness to pay” (WTP), an economic concept that identifies the maximum that a person could pay for something without becoming worse off. They couldn’t simply ask the coworkers what their WTP is. People are social creatures and have many reasons to lie, mislead, signal, or simply not know. Since someone’s WTP reflects preferences and values, we need a way to elicit the true preference while avoiding lies and most mistakes. Here’s how the economist suggested they reveal the coworkers’ preferences.

  • Step 1: Tell the coworker these rules.
  • Step 2: The coworker reports their WTP for a single egg in dollars.
  • Step 3: A random price is chosen by a machine. If the price is above the self-reported WTP, the coworker is not allowed to buy the egg. If the price is below the WTP, then the coworker must buy the egg at that random price.
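This random-price scheme is what economists call the Becker-DeGroot-Marschak (BDM) mechanism, and its key property is that reporting your true WTP maximizes your expected gain. A small simulation can illustrate this; the $1.50 value and the $0-$3 price range are made-up numbers, not figures from the episode.

```python
import numpy as np

rng = np.random.default_rng(42)

true_wtp = 1.50                       # the coworker's actual value for one egg
prices = rng.uniform(0, 3, 100_000)   # the machine's random prices

def expected_surplus(reported_wtp, true_wtp, prices):
    """Average gain from the mechanism given a reported WTP.

    You buy whenever the random price is at or below your report (ties
    don't matter with continuous prices), and your gain is your true
    value minus the price actually paid."""
    buys = prices <= reported_wtp
    return np.where(buys, true_wtp - prices, 0.0).mean()

# Overreporting forces some purchases above your true value;
# underreporting passes up good deals. Truth-telling is optimal.
reports = np.linspace(0, 3, 61)
surpluses = [expected_surplus(r, true_wtp, prices) for r in reports]
best = reports[int(np.argmax(surpluses))]
print(round(best, 2))  # → 1.5, the true WTP
```

Note that the buyer never controls the price paid, only whether trades happen; that separation is what removes the incentive to shade the report.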

The idea is as follows.


Kaggle Wins for Data Sharing

I like to take existing datasets, clean them up, and share them in easier-to-use formats. When I started doing this back in 2022, my strategy was to host the datasets with the Open Science Framework (OSF) and share the links here and on my personal website.

OSF is great for allowing large uploads and complex projects, but not great for discovery. I saw several of my students struggle to navigate its pages to find the appropriate data files, and it seems to have poor SEO. Its analytics show that my data files there get few views, and most of those come from people who were already on the OSF site.

This year I decided to upload my new projects like County Demographics data to Kaggle.com in addition to OSF, and so far Kaggle is the clear winner. My datasets are getting more downloads on Kaggle than views on OSF. I’ve noticed that Kaggle pages tend to rank highly on Google and especially on Google Dataset Search. I think Kaggle also gets more internal referrals, since they host popular machine learning competitions.

Kaggle has its own problems of course, like one of its prominent download buttons only downloading the first 10 columns for CSV or XLSX files by default. But it is the best tool I have found so far for getting datasets in the hands of people who will find them useful. Let me know if you’ve found a better one.

GDP Forecasting: Models, Experts, or Markets?

The 2025 first quarter GDP data came in slightly bad: negative 0.3%. I think the number is a bit hard to interpret right now, but it’s hard to spin away a negative number. A big factor pulling down the accounting identity that we call GDP was a massive increase in imports, specifically imports of goods. It’s likely this is businesses trying to front-run the potential tariffs (and keep in mind this was pre-“Liberation Day,” so probably even more front-running in April), so the long-run effect is harder to judge.

But aside from the interpretation of the GDP estimate, we can ask a related question: did anyone predict it correctly? I have written previously about two GDP forecasts from two different regional Federal Reserve banks. They were showing very different estimates for GDP!

Both Fed estimates ended up being pretty wrong: -1.5% and +2.6%. But there are two other kinds of forecasts we can look at.

The first is from a survey of economists done by the Wall Street Journal. The median forecast in that survey was positive 0.4%. This survey got the direction wrong, but it was much closer than the Fed models.

Finally, we can look at prediction markets. There are many such markets, but I’ll use Kalshi, because it’s now legal to use in the US, and it’s pretty easy to access their historical data. The average Kalshi forecast for Q1 (a weighted average of sorts across several different predictions) was -0.6%. Pretty close! They got the direction right, and the absolute error was smaller than the WSJ survey’s. And obviously, much better this quarter than the Fed models.
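For a quick side-by-side, here is the arithmetic on all four forecasts against the reported -0.3% figure (the labels are mine):

```python
# Absolute forecast errors vs. the preliminary Q1 2025 reading of -0.3%.
actual = -0.3
forecasts = {
    "Atlanta Fed GDPNow": -1.5,
    "NY Fed model": 2.6,
    "WSJ survey (median)": 0.4,
    "Kalshi (avg)": -0.6,
}
errors = {name: abs(f - actual) for name, f in forecasts.items()}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: absolute error {err:.1f} pp")
# Ranking: Kalshi 0.3 < WSJ 0.7 < Atlanta Fed 1.2 < NY Fed 2.9
```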

But this was just one quarter, and perhaps a particularly weird quarter to predict (the Atlanta Fed even had to update their model mid-quarter, because large gold inflows were throwing off the model). You may say that weird quarters are exactly when we want these models to perform well! But it’s also useful to look at past predictions. The table below summarizes predictions for the past 9 quarters (as far back as the current NY Fed model goes):


95 Days of Trump Spending & Cutting

Generally, decisions to spend federal funds fall under the authority of Congress. But the Trump administration has very publicly made clear that it will try to cut the things that are within its authority (or that it thinks should be within that authority). To be clear, the first fiscal year under the new Republican unified government won’t begin until October of 2025. So, the last quarter of 2025 is when we’ll see what the Republicans actually want – for better or for worse. In the meantime, we can look past the hyperbole and see what the accounting records say. The most recent data includes 95 days after inauguration. First, for context, total spending is up $134 billion, or 5.8%, from this time last year to $2.45 trillion.

The Trump administration has been making news about its desire and success in cutting. Which programs have been cut the most? Below is a graph of where the five biggest cuts have happened, as a proportion of each program’s budget. The cuts to the FCC and CPB reflect long-standing partisan stances by Republicans. The cuts to the Federal Financing Bank reflect fewer loans administered by the US government, as well as the current push to cut spending. Cuts in the RRB- Misc category refer to some types of railroad payments to employees. In the spirit of whiplash, the cuts to the US International Development Finance Corporation reverse the course set by the first Trump administration. This government corporation exists to facilitate US investment in strategically important foreign countries.

But some programs have *increased* spending since 2024. The five largest increases include the USDA, US contributions to multilateral assistance, claims and judgments against the US, the Federal Railroad Administration, and the International Monetary Fund. Funding for farmers and railroads reflects the old agricultural and new union Republican constituencies. The multilateral assistance and IMF spending reflects the administration’s greater international involvement, despite its autarkic lip service.


Is Every Stock a Tariff Stock?

Not quite, at least not in the same way that every stock was a vaccine stock in 2020, as Alex Tabarrok put it.

Today the stock market does seem to move a lot on the news about Trump’s ever-evolving tariff policy. If you see the S&P 500 is up today, you can probably guess that Trump or his advisors slightly backed off some aspect of their previously announced tariff policy. And vice versa. That much is true.

But back in 2020, the implied correlation in the market was briefly over 80% in the spring and over 50% for almost all of the summer. Today, the correlation is closer to 40%. That’s a bit lower than 2020, but it is a significant jump from where it was 2-3 months ago.

Here is the Cboe’s implied 3-month correlation index:

In addition to the costs of tariffs themselves, investors should be worried about this correlation because “market returns are lower when correlations among assets are increasing.”

It’s the Humidity

Recently, I learned what humidity is. That might sound stupid, so let me clarify. I knew that humidity is the water content of the air. I also knew that the higher the number, the more humid. Finally, I knew that the dew point is the temperature at which water falls out of the air. But now I understand all of this in a way that I hadn’t previously.

First, what does it mean for there to be 70% humidity? As it turns out, it’s a moving target. There are two types of humidity: specific and relative. Specific humidity is the mass of water in, say, a kilogram of air. So, more humidity means more water. This is obvious. There’s a related concept called absolute humidity, which is more like mass of water per volume of air (sometimes used in place of specific humidity). Again, more humidity means more water. Neither of these is the way that humidity is reported on the weather channel.

Relative humidity is the number that you see in your weather app. What’s that? Relative to what? First, we need to know that warm air can hold more water than cool air. Pressure also matters, but atmospheric pressure doesn’t vary enough for its effect on humidity to be significant here. So, all of this discussion, and the number in your phone, assumes atmospheric pressure. Below is a graph that illustrates the maximum amount of water that can be in the air at different temperatures (red line). So, at 30 degrees Celsius (86 degrees Fahrenheit), there can be as much as 27 grams (0.95 oz, or about 2 tablespoons) of water in a kilogram of air.
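The red line can be approximated in a few lines of code using the Magnus formula for saturation vapor pressure. This is a standard textbook approximation, not the exact data behind the chart:

```python
import math

def saturation_vapor_pressure_hpa(t_celsius):
    """Magnus approximation to saturation vapor pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def saturation_mixing_ratio_g_per_kg(t_celsius, pressure_hpa=1013.25):
    """Maximum grams of water vapor per kilogram of dry air at temperature t."""
    e_s = saturation_vapor_pressure_hpa(t_celsius)
    return 1000 * 0.622 * e_s / (pressure_hpa - e_s)

def relative_humidity_pct(vapor_pressure_hpa, t_celsius):
    """Relative humidity: actual vapor pressure as a share of saturation."""
    return 100 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(t_celsius)

print(round(saturation_mixing_ratio_g_per_kg(30), 1))  # → 27.1 g/kg
```

At 30°C this gives about 27 grams of water per kilogram of air, matching the figure; relative humidity is then just the actual vapor pressure expressed as a percentage of that temperature’s saturation value.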


GDPNow: Still Negative on Q1, But Less So

Last month I wrote about the projected decline in GDP from the Atlanta Fed’s GDPNow model. Since then, they have released an alternative version of the model, which includes a “gold adjustment” to account for non-monetary gold inflows, which may be causing the model to overstate the negative impact of imports (and it looks like this may become a permanent change to the model).

With those changes, and some more recent data, the GDPNow model is still pointing to a negative reading for Q1 of 2025, though only very slightly now: -0.1%.

It’s also worth noting that the New York Fed has a similar model, but one with very different estimates right now: about 2.6% for Q1.

We’ll still have to wait until April 30th to get the preliminary estimates from BEA.

Now Published: Human Capital of the US Deaf Population, 1850-1910

A student coauthor and I worked hard on our article, which is now published in Social Science History. It’s the first modern statistical analysis of the historical deaf population. We bring an economic lens and statistical treatment to a topic that previously relied largely on anecdotal evidence and case studies. We hope that future authors can improve on our work in ways that meet and surpass the quantitative methods that we employed.

Our contributions include:

  • A human capital model of deafness that’s agnostic about its productivity implications and treats deaf individuals as if they made decisions rationally.
  • A better understanding of school attendance rates and the ages at which deaf children attended.
  • Evidence that deaf children were much more likely to be neither in school nor employed earlier in US history.
  • The negative impact of state ‘school for the deaf’ availability on subsequent economic outcomes among deaf adults. We speculate that they attended schools due to the social benefits of access to community.
  • Deaf workers did not avoid occupations where their deafness would be incidentally detectable by trade partners, implying that animus discrimination was not systemically important for economic outcomes.