Humanity’s Last Exam in Nature

Last July I wrote here about “Humanity’s Last Exam”:

When every frontier AI model can pass your tests, how do you figure out which model is best? You write a harder test.

That was the idea behind Humanity’s Last Exam, an effort by Scale AI and the Center for AI Safety to develop a large database of PhD-level questions that the best AI models still get wrong.

The group initially released an arXiV working paper explaining how we created the dataset. I was surprised to see a version of that paper published in Nature this year, with the title changed to the more generic “A benchmark of expert-level academic questions to assess AI capabilities.”

One the one hand, it makes sense that the core author groups at the Center for AI Safety and Scale AI didn’t keep every coauthor in the loop, given that there were hundreds of us. On the other hand, I’m part of a different academic mega-project that currently is keeping hundreds of coauthors in the loop as it works its way through Nature. On the third, invisible hand, I’m never going to complain if any of my coauthors gets something of ours published in Nature when I’d assumed it would remain a permanent working paper.

AI is now getting close to passing the test:

What do we do when it can answer all the questions we already know the answer to? We start asking it questions we don’t know the answer to. How do you cure cancer? What is the answer to life, the universe, and everything? When will Jesus return, and how long until a million people are convinced he’s returned as an AI? Where is Ayatollah Khamenei right now?

It’s the Humidity

Recently, I learned what humidity is. That might sound stupid, so let me clarify. I knew that humidity is the water content of the air. I also knew that the higher the number, the more humid. Finally, I also knew that the dew point is the temperature at which the water falls out of the air. But, now I understand all of this in a way that I hadn’t previously.

First, what does it mean for there to be 70% humidity? As it turns out, it’s a moving target. There are two types of humidity: specific and relative. Specific humidity is the mass of water in, say, a kilogram of air. So, more humidity means more water. This is obvious. There’s a related concept called absolute humidity, which is more like mass of water per volume of air (sometimes used in place of specific humidity). Again, more humidity means more water. Neither of these is the way that humidity is reported on the weather channel.

Relative humidity is the number that you see in your weather app. What’s that? Relative to what? First, we need to know that warm air can hold more water than cool air. Pressure also matters, but atmospheric pressure doesn’t change enough to make its effect on humidity significant on relevant margins. So, all of this discussion, and the number in your phone, is at atmospheric pressure. Below is a graph that illustrates the maximum amount of water that can be in the air at different temperatures (red line). So, at 30 degrees Celsius (86 degrees Fahrenheit), there can be as much as 27 grams (0.95 oz or ~2 tablespoons) of water in the air.

More after the jump.

Continue reading