nature – Economist Writing Every Day

How Much To Trust Research Papers? My Rules Of Thumb

April 2, 2026 | James Bailey

1. Trust literatures over single papers
2. Common sense and Bayes’ Rule agree: extraordinary claims require extraordinary evidence
3. Trust more when papers publicly share their data and code
4. Trust higher-ranked journals more, up to the level of top subfield journals (e.g. Journal of Health Economics, Journal of Labor Economics), but top general-interest journals can be prone to relaxing standards for sensationalist or ideologically favored claims (e.g. The Lancet, PNAS, Science/Nature when covering social science)
5. More recent is better for empirical papers; data and methods have tended to improve with time
6. Overall effects are more trustworthy than interaction or subgroup effects; the latter two are easier to p-hack and necessarily have lower statistical power
7. Trust large experiments most, then quasi-experiments, then small experiments, then traditional regression (add some controls and hope for the best)
8. The real effect size is half what the paper claims

That last rule is inspired by a special issue of Nature out today on the replicability of social science research. An exception to rule #4, this is an excellent project I will write more about soon.
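The Bayes’ Rule point in the list above can be made concrete with a little arithmetic. Here is a minimal Python sketch, not anything from the Nature issue: the function name and the numbers (80% power, a 5% false-positive rate, and the two priors) are illustrative assumptions. It shows how the same "significant" result leaves a surprising claim much less likely to be true than an unsurprising one.

```python
def posterior_claim_is_true(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability a claim is true given one statistically significant result.

    prior: probability the claim was true before seeing the paper (assumed)
    power: chance a study detects the effect if it is real (assumed 0.8)
    alpha: chance of a false positive if the effect is not real (assumed 0.05)
    """
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)


# An ordinary claim (coin-flip prior) versus an extraordinary one (5% prior).
ordinary = posterior_claim_is_true(prior=0.50)
extraordinary = posterior_claim_is_true(prior=0.05)

print(f"ordinary claim:      {ordinary:.2f}")       # about 0.94
print(f"extraordinary claim: {extraordinary:.2f}")  # about 0.46
```

Under these assumed numbers, one significant result makes the ordinary claim very likely true, while the extraordinary claim is still slightly more likely false than true. P-hacking or publication bias would push the effective false-positive rate above 5% and make the picture worse.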

Humanity’s Last Exam in Nature

February 26, 2026 | February 25, 2026 | James Bailey

Last July I wrote here about “Humanity’s Last Exam”:

When every frontier AI model can pass your tests, how do you figure out which model is best? You write a harder test.

That was the idea behind Humanity’s Last Exam, an effort by Scale AI and the Center for AI Safety to develop a large database of PhD-level questions that the best AI models still get wrong.

The group initially released an arXiv working paper explaining how we created the dataset. I was surprised to see a version of that paper published in Nature this year, with the title changed to the more generic “A benchmark of expert-level academic questions to assess AI capabilities.”

On the one hand, it makes sense that the core author groups at the Center for AI Safety and Scale AI didn’t keep every coauthor in the loop, given that there were hundreds of us. On the other hand, I’m part of a different academic mega-project that currently is keeping hundreds of coauthors in the loop as it works its way through Nature. On the third, invisible hand, I’m never going to complain if any of my coauthors gets something of ours published in Nature when I’d assumed it would remain a permanent working paper.

AI is now getting close to passing the test:

What do we do when it can answer all the questions we already know the answer to? We start asking it questions we don’t know the answer to. How do you cure cancer? What is the answer to life, the universe, and everything? When will Jesus return, and how long until a million people are convinced he’s returned as an AI? Where is Ayatollah Khamenei right now?

It’s the Humidity

April 18, 2025 | April 19, 2025 | Zachary Bartsch

Recently, I learned what humidity is. That might sound stupid, so let me clarify. I knew that humidity is the water content of the air. I also knew that the higher the number, the more humid. Finally, I also knew that the dew point is the temperature at which the water falls out of the air. But, now I understand all of this in a way that I hadn’t previously.

First, what does it mean for there to be 70% humidity? As it turns out, it’s a moving target. There are two types of humidity: specific and relative. Specific humidity is the mass of water in, say, a kilogram of air. So, more humidity means more water. This is obvious. There’s a related concept called absolute humidity, which is more like mass of water per volume of air (sometimes used in place of specific humidity). Again, more humidity means more water. Neither of these is the way that humidity is reported on the weather channel.

Relative humidity is the number that you see in your weather app. What’s that? Relative to what? First, we need to know that warm air can hold more water than cool air. Pressure also matters, but atmospheric pressure doesn’t change enough to make its effect on humidity significant on relevant margins. So, all of this discussion, and the number in your phone, is at atmospheric pressure. Below is a graph that illustrates the maximum amount of water that can be in the air at different temperatures (red line). So, at 30 degrees Celsius (86 degrees Fahrenheit), there can be as much as 27 grams (0.95 oz or ~2 tablespoons) of water in each kilogram of air.
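For readers who want to check the 27 gram figure, here is a rough Python sketch of the arithmetic. It rests on assumptions that are not in the post: it uses the Magnus approximation for saturation vapor pressure (with the Alduchov and Eskridge coefficients), standard sea-level pressure of 1013.25 hPa, and it reads the 27 grams as grams of water per kilogram of dry air. The function names are made up for illustration.

```python
import math

SEA_LEVEL_PRESSURE_HPA = 1013.25  # assumed standard atmospheric pressure


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation: vapor pressure (hPa) of air saturated with water."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def max_water_g_per_kg(temp_c: float, pressure_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Most water (grams) that a kilogram of dry air can carry at this temperature."""
    e_s = saturation_vapor_pressure_hpa(temp_c)
    # 0.622 is the ratio of the molar mass of water to that of dry air.
    return 1000 * 0.622 * e_s / (pressure_hpa - e_s)


def relative_humidity_pct(water_g_per_kg: float, temp_c: float) -> float:
    """Water actually in the air as a percentage of the maximum at this temperature."""
    return 100 * water_g_per_kg / max_water_g_per_kg(temp_c)


print(f"max water at 30 C: {max_water_g_per_kg(30):.1f} g/kg")  # roughly 27
print(f"max water at 10 C: {max_water_g_per_kg(10):.1f} g/kg")
# The same 10 g/kg of water gives very different readings on a warm and a cool day.
print(f"10 g/kg at 30 C: {relative_humidity_pct(10, 30):.0f}% relative humidity")
print(f"10 g/kg at 15 C: {relative_humidity_pct(10, 15):.0f}% relative humidity")
```

The last two lines show why relative humidity is a moving target: the amount of water is held fixed, but the percentage rises as the air cools because the maximum shrinks.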