Quasi-Relative Measures of Portfolio Performance

Last week I discussed absolute measures of portfolio performance and management, specifically for comparing two portfolios that are composed of different assets (utilities and tech). I began by comparing the basics of return, standard deviation, and Sharpe ratio to those of other possible portfolios in the Markowitz cloud. But simply comparing the differences between these possible portfolios can be sensitive to the spread of stats within a specific Markowitz cloud. In other words, it’s not scale independent. A larger spread of possible stats can make a portfolio look bad due to the spread of return, standard deviation, or Sharpe ratio alone.

In this post I introduce quasi-relative measures. Again, I lean on the Markowitz clouds, which are pasted below (utilities on the left, tech on the right).

If we can somehow express the returns, volatilities, and Sharpe ratios on a common scale that is independent of the level values, then we can make the realized portfolios more comparable. One thing that we can do is to express a stat as a weighted linear average between the maximum and minimum possible values. Conditional on the realized standard deviation, there exists a maximum and minimum of possible return. Something like the below. Rho is the weight on the maximum return. It’s also the proportion of possible conditional returns that are lower than the realized return.
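
One way to read the paragraph above is that the realized return r satisfies r = rho * r_max + (1 - rho) * r_min, where r_max and r_min are the highest and lowest returns available at the realized standard deviation, so rho = (r - r_min) / (r_max - r_min). Below is a minimal Python sketch of that reading. The function name and the numbers are illustrative assumptions, not values from the post.

```python
def range_weight(realized, lowest, highest):
    """Weight w on the maximum such that
    realized = w * highest + (1 - w) * lowest."""
    if highest == lowest:
        raise ValueError("maximum and minimum must differ")
    return (realized - lowest) / (highest - lowest)


# Hypothetical conditional bounds, in percent (not the post's data)
rho = range_weight(realized=12.0, lowest=10.0, highest=18.0)
print(rho)
```

A value of 0 means the realized return sits at the conditional minimum and 1 means it sits at the maximum. If the candidate returns are spread evenly between the two bounds, the same number is also the share of candidates below the realized return.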

The unconditional version is the same, but would be relative to the global maximum and minimum stats. We can represent the weight on the maximum return and the percentile among possible returns as gamma.

A final quasi-relative measure of performance is the dissimilarity index between the realized portfolio weights and some reference portfolio weights. This provides a measure of how much the asset weights would need to change in order to move the portfolio to the reference weights. If changing portfolio weights is costly, then it’s also a measure of the transaction cost of reallocation. It’s quasi-relative because it is independent of the spread of possible performance stats.
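
The post doesn’t spell out the formula, so the sketch below assumes the standard dissimilarity index: half the sum of absolute differences between the two sets of weights. The realized weights are the (0.33, 0.33, 0.34) used in the earlier post; the reference weights are hypothetical.

```python
def dissimilarity_index(weights, reference):
    """Half the sum of absolute weight differences.

    0 means identical weights; 1 means no overlap at all
    (for long-only weights that each sum to 1).
    """
    if len(weights) != len(reference):
        raise ValueError("weight vectors must have the same length")
    return 0.5 * sum(abs(w - r) for w, r in zip(weights, reference))


realized = [0.33, 0.33, 0.34]
reference = [0.50, 0.30, 0.20]  # hypothetical reference portfolio

d = dissimilarity_index(realized, reference)
print(round(d, 2))
```

Under this definition the result is the share of the portfolio’s value that would have to be moved between assets to reach the reference weights, which is why it can double as a rough proxy for reallocation cost.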

Below are the quasi-relative measures for each of the utility and tech company portfolios.


The “Reality Index” of Price Inflation Isn’t Grounded in Reality

Over the years, many people have tried to create alternatives to the CPI for measuring inflation. Probably the most famous is “Shadow Stats,” which Tim Lee has convincingly shown isn’t actually measuring price inflation (it’s just adding a fixed factor to the CPI).

But the CPI critics keep coming. One that was recently released is called the “Reality Index.” This index tries to improve on the CPI-U in two ways. First, it uses fixed weights for the items in the basket, and importantly it uses the 2024 weights and applies them to past years (this is called a Paasche index). Second, it takes out some BLS prices to avoid using hedonically adjusted prices, and other price calculations that the Reality Index author thinks are weird.

Both of these changes are problematic. I will explain why.

1. Fixed Basket of Goods/Services Doesn’t Make Sense

Many critics of the CPI complain about the shifting weights in the CPI. “We just want to measure the cost of a fixed basket over time.” But measuring a fixed basket over time isn’t actually that useful. I will explain why in a moment. But that’s not even what the Reality Index does! Instead, it takes the 2024 CPI weights (which come from the Consumer Expenditure Survey), and then consistently applies those weights to past years. The Index isn’t measuring the cost of a fixed basket of goods from some past year — it is using the 2024 basket, and assuming that’s what people consumed in the past.

The author of the Reality Index, Tom Elliott, is either confused about this or is being deliberately misleading. For example, in a recent WSJ essay promoting the Index, he says “That same basket, the one the government says rose 1.87 times since 2000, has actually risen about 2.4 times.” But that’s false. To do that calculation, you would need to use the 2000 CPI weights and follow them forward to 2024 (this is called a Laspeyres index). Instead, he uses the 2024 weights and follows them backwards. He could do the calculation that he references in the WSJ essay, but he does not.

To see why this is a bad approach, let’s compare the weights in the Reality Index with a few past years. I have done my best to translate the weights for the 10 categories listed on this page to actual BLS categories, though I will admit that none of their category weights matched exactly to what I found at BLS. But I’m pretty confident it is correct.

I am also pretty confident that the “discretionary” category is just a residual for everything that wasn’t in the other 9 categories, though I can’t find them explicitly saying this. Yellow highlighting indicates the category in past years was smaller than the 2024 weights. Green highlighting indicates past years were larger weights.

The first thing you might notice is that the CPI weights have changed significantly over time. Relative to 1970, housing/shelter gets almost twice as much weight today. Conversely, groceries/food at home gets about half the weight today as it had in 1970. The “discretionary” category (the residual to make it add to 100%) used to be 30 percent of a household budget, using this approach! That should really give you pause: do we really think a typical household in 1970 considered 30% of their budget to be “discretionary”? I highly doubt it. That discretionary category includes clothing, which was over 10% of household spending in 1970 (it’s around 2% today).

Related to that, you may also notice that categories which have had above average inflation over this time frame — such as housing, healthcare, and education — all have bigger weights today than in the past. Meanwhile, food and clothing have seen less price inflation, but they are weighted much less. This process will tend to overstate inflation of the past, as the CPI in 1970 placed less weight on, say, housing, so when you put more weight on it, of course the inflation rate will go up. And indeed, as the Reality Index’s historical analysis shows, the biggest gaps in inflation between the RI and CPI were in the 1970s (4.9% gap in 1979 and 4.7% gap in 1978). But this is ahistorical: people were not spending 37% of their budget on shelter in the 1970s! In fact, they were spending almost as much on groceries in 1970 as they did on shelter.
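
To make the weighting effect concrete, here is a small Python sketch with two categories and entirely made-up numbers. It assumes the index is an arithmetic weighted average of each category’s price ratio; whether the Reality Index aggregates in exactly this way is an assumption, and the weights and price ratios are hypothetical rather than BLS data.

```python
def fixed_weight_index(weights, price_ratios):
    """Weighted average of end-price / start-price ratios by category."""
    return sum(weights[c] * price_ratios[c] for c in weights)


# Hypothetical: shelter prices rose 10x, grocery prices rose 5x
price_ratios = {"shelter": 10.0, "groceries": 5.0}

# Hypothetical budget shares in the start year and the end year
start_year_weights = {"shelter": 0.25, "groceries": 0.75}
end_year_weights = {"shelter": 0.60, "groceries": 0.40}

forward = fixed_weight_index(start_year_weights, price_ratios)
backward = fixed_weight_index(end_year_weights, price_ratios)
print(forward, backward)
```

With the same price changes, applying the end-year weights backward reports more inflation than following the start-year basket forward, simply because the faster-inflating category gets a larger share.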

The Reality Index is essentially projecting backwards to a fake reality of the past, because it uses the 2024 weights in all past years. But this isn’t capturing anything real about the world, and it is at best an interesting thought experiment. Of course, part of the reason people now spend more of their budget on housing and healthcare is because they have gotten more expensive and to some extent crowded out other spending. But they are also categories for which we might expect demand to increase as incomes increase (normal goods). And notice this is the opposite of the standard critique of the CPI: as things get more expensive, critics claim the CPI assumes people spend less on those items. Instead, the CPI-U weights are updated each year based on the latest Consumer Expenditure Survey data, and goods/services with higher rates of inflation now consume more of the weight of the CPI than in the past.

(*Note: the “pet” category is listed as 0% in 1970 because BLS didn’t itemize it separately due to it being so small. That’s of little consequence, since it is such a small share in every year — I’m surprised they didn’t just stuff pets in the discretionary category.)

2. Swapping Quality-Adjusted Measures for Nominal Prices is Often a Bad Idea

Using the 2024 weights for past years is reason enough to not find the Reality Index useful. But let me just say a few words about the substitute prices that the Reality Index uses. The changes are either trying to use something that isn’t hedonically adjusted for quality, or to overcome some of the strange calculations, especially for housing and health care.


Absolute Measures of Portfolio Performance

The basic idea is that we want to compare the performance of different portfolios or their managers. This is relatively easy as long as the portfolios contain the same assets. Then, the portfolios are simply characterized by the different weights among the different assets. But how do we compare the performance of portfolios whose assets are different? In finance, we usually assume that everyone can invest in everything. But there are plenty of cases in which that’s a bad assumption: when clients want exposure to particular industries, when there are statutory limitations on holding certain assets, or when an individual company is considering specific projects within the same company under conditions of scarce financing.

The most primitive step is to compare the return and standard deviation of two different portfolios. However, higher risk investments tend to have higher returns in dynamic equilibrium. So, if we were to compare the returns of a tech company to a utility company, then we’d often see the tech companies performing better. But, if we compare the volatilities, then the utility companies would tend to perform better. Sharpe stepped in with a ratio to express the excess return (the benefit) per unit of standard deviation (the cost). This way, we can compare the price of volatility between two portfolios. We’ll stick with just these 3 basic measures: return, standard deviation, and Sharpe ratio. (Others do exist.)

Let’s put some meat on this with an example. Say that we have two portfolios, each composed of different assets. There’s a utility portfolio that’s composed of NEE, DUK, and SO. There’s also a tech portfolio that’s composed of AMD, MSFT, and NVDA. Both portfolios have weights of (0.33, 0.33, 0.34). The results of the utility versus the tech portfolio are:

  • Returns: 14.2% vs 136.3%
  • Standard Deviation: 14.9% vs 32%
  • Sharpe: 0.684 vs 4.134
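
The Sharpe figures above can be roughly reproduced with a short Python sketch. The risk-free rate is not stated in the post; the 4% used here is an assumption backed out from the reported numbers, so treat it as illustrative.

```python
def sharpe_ratio(portfolio_return, std_dev, risk_free_rate):
    """Excess return per unit of standard deviation (all inputs in percent)."""
    return (portfolio_return - risk_free_rate) / std_dev


RISK_FREE = 4.0  # percent; assumed, not given in the post

utility_sharpe = sharpe_ratio(14.2, 14.9, RISK_FREE)
tech_sharpe = sharpe_ratio(136.3, 32.0, RISK_FREE)
print(round(utility_sharpe, 2), round(tech_sharpe, 2))
```

Both results land within rounding distance of the reported 0.684 and 4.134, which suggests the inputs shown in the bullets are themselves rounded.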

Goodness me! The tech portfolio returns much more in absolute terms and much more per unit of risk. It’s twice as volatile as the utility portfolio, but the returns are almost ten times as high. Given the choice, many of us would pick the tech portfolio over the utility portfolio. But, what if, for one reason or another, you can only invest in one of the two industries? Or, what if you want to invest your money with a skilled manager, rather than a risky one?

One way to tackle this problem is to introduce the Markowitz cloud. Specifically, we can essentially list out all of the possible portfolios along with their return and standard deviations. Then, we can compare the actual performance to the entire menu of possible performances within each set of assets. Below are the possible performances for the utility (left) versus the tech (right) portfolio. The actual portfolios are marked with an X.
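
A cloud like this can be approximated by sampling many weight combinations and computing each one’s return and standard deviation. The sketch below is one way to do it in plain Python under stated assumptions: long-only weights that sum to 1, and hypothetical mean returns and covariances (not estimates for the actual tickers).

```python
import math
import random

# Hypothetical annual mean returns and covariance matrix for three assets
MEANS = [0.10, 0.14, 0.18]
COV = [
    [0.020, 0.006, 0.004],
    [0.006, 0.030, 0.008],
    [0.004, 0.008, 0.050],
]


def random_weights(n, rng):
    """Draw long-only weights that sum to 1 (uniform over the simplex)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def portfolio_stats(weights, means, cov):
    """Return (expected return, standard deviation) for the given weights."""
    n = len(weights)
    ret = sum(w * m for w, m in zip(weights, means))
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(n) for j in range(n))
    return ret, math.sqrt(var)


def markowitz_cloud(means, cov, n_portfolios=5000, seed=0):
    """Sample candidate portfolios as (weights, return, std dev) tuples."""
    rng = random.Random(seed)
    cloud = []
    for _ in range(n_portfolios):
        w = random_weights(len(means), rng)
        ret, sd = portfolio_stats(w, means, cov)
        cloud.append((w, ret, sd))
    return cloud


cloud = markowitz_cloud(MEANS, COV)
realized_ret, realized_sd = portfolio_stats([0.33, 0.33, 0.34], MEANS, COV)
lowest_sd = min(sd for _, _, sd in cloud)
print(len(cloud), round(realized_ret, 4), realized_sd > lowest_sd)
```

Plotting each tuple’s standard deviation against its return gives the cloud, and the realized portfolio can then be compared with candidates such as the lowest-volatility point.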

One way to evaluate the two portfolios is to compare their return, standard deviation, and Sharpe ratio to the other candidates that were achievable with the same assets. As we can see, conditional on the assets, neither portfolio minimized the volatility, maximized return, nor maximized the Sharpe ratio. Furthermore, assuming that the realized rate of return was the goal, neither portfolio minimized the conditional volatility. Assuming that the realized volatility was the goal, neither portfolio maximized the conditional return. Below are two tables that describe some candidate alternatives and how they differ from the realized portfolio.


The US is Building a Lot More Data Centers Than Five Years Ago, But We Are Still Building More Warehouses

Data centers seem to be popping up everywhere. And based on the value of current construction, the US is indeed building a lot more data centers than we were in 2020 or 2021, about four times as much data center construction (inflation adjusted).

But… did you know that we build a lot more good-old manufacturing than data centers? Almost four times as much in recent months. And that’s even after a decline in manufacturing construction over the past year and a half.

The US also builds about the same amount of warehouses and chemical plants as we do data centers. Data centers may exceed those two categories in a few years, but for now they are pretty similar.

Keep in mind that manufacturing and chemical facilities also use a lot of electricity and water, and have plenty of local negative externalities! Warehouses probably have a lot less resource consumption and external effects, but it’s not zero either.

Are data centers popping up everywhere? Well, people are certainly noticing them. But so are lots of other types of buildings, which rarely register more than a peep from concerned citizens and local media, unless there is some clear and obvious external effect.

Fuel Costs Are Way Up, But It’s Still Pretty Affordable to Fill Up Your Tank (relative to wages)

Two months ago I wrote about gasoline prices and tried to give the current prices some historical context. Gas prices have, of course, only continued to increase since then. Here’s a chart I created to give a bit more context, using an idea from Ryan Radia: how much does it cost to drive a car 250 miles? Since fuel efficiency has increased over time, we might be understating how much it costs to drive today relative to the past. And of course, to give the “cost” proper context I have stated it in terms of hours worked at the average wage (note: the final data point is from April 2026, as we don’t have wage data for May yet):

In April 2026 it took about 1.4 hours of work at the average wage ($32.23) to purchase enough gasoline to drive 250 miles (10.7 gallons) at the average fuel efficiency (23.4 miles per gallon). That average fuel efficiency figure is from 2024, the latest available, so it could be a bit higher today. Maybe it’s a little easier than 1.4 hours of work to buy it, but even if fuel efficiency had crept up to 25 mpg (that would be a big increase in 2 years, historically speaking), it would still be 1.3 hours of work.
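
The arithmetic in the paragraph above can be sketched in a few lines of Python. The gasoline price itself is not stated in the post, so the sketch backs out an implied price from the rounded 1.4-hour figure; that implied price is an approximation, not a quoted number.

```python
def hours_of_work(miles, mpg, price_per_gallon, hourly_wage):
    """Hours of work at the given wage needed to buy fuel for the trip."""
    gallons = miles / mpg
    return gallons * price_per_gallon / hourly_wage


MILES = 250
WAGE = 32.23      # average hourly wage, April 2026 (from the post)
MPG_2024 = 23.4   # average fuel efficiency, 2024 (from the post)

gallons = MILES / MPG_2024  # about 10.7 gallons

# Implied price per gallon, backed out from the rounded 1.4 hours of work
implied_price = 1.4 * WAGE / gallons

# Same price and wage, but assuming fuel efficiency had crept up to 25 mpg
hours_at_25_mpg = hours_of_work(MILES, 25, implied_price, WAGE)
print(round(gallons, 1), round(implied_price, 2), round(hours_at_25_mpg, 1))
```

Holding price and wage fixed, the hours figure scales inversely with miles per gallon, which is why moving from 23.4 to 25 mpg only trims the result from about 1.4 to about 1.3 hours.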

1.4 hours of work is certainly a big jump from earlier in 2026, but you’ll notice it is still on the low end in this chart, and well below the peak we saw in June 2022 of just over 2 hours of work to buy 250 miles worth of gasoline.

But 23.4 miles per gallon is pretty low, as this includes lots of trucks and SUVs with pretty bad fuel efficiency. What if we looked at some more fuel efficient vehicles?

Here are a few I checked on (all for 2026 models, with gas and electricity at current national averages):

  • Toyota Camry: 0.71 hours of work
  • Chrysler Pacifica Hybrid: 0.61 hours on electric, 1.18 hours on gasoline
  • Tesla Model Y: 0.37 hours of work

It will probably not surprise you that the all-electric Tesla Model Y is cheaper than the average car to operate at current prices, but you may not have realized that it is almost four times cheaper. But the Toyota Camry, with all models operating as hybrids now, also comes in pretty good at about half the cost of the average vehicle to operate (and the Camry is a very affordable car to purchase). The Chrysler Pacifica hybrid minivan does pretty well too, though even operating only on electricity (30 miles at a time), it’s only slightly more fuel efficient than the Camry.

arXiv will ban authors who submit papers with LLM mistakes

In the world of academic preprints, arXiv has long been the go-to platform for researchers to share work quickly. But with the explosion of generative AI tools, the repository is drawing a line in the sand.

On May 14, 2026, arXiv moderator Thomas Dietterich announced a clarified enforcement policy. If a submission contains incontrovertible evidence that authors didn’t properly check LLM-generated content, all listed authors face serious consequences.

What counts as “Incontrovertible Evidence”? The policy targets clear signs of unchecked AI output, including:

  • Hallucinated or fake references
  • Meta-comments left by the model (e.g., “Here is a 200-word summary; would you like me to make any changes?” or placeholder instructions like “fill in the real numbers from your experiments”)
  • Other obvious errors, plagiarized text, biased content, or misleading claims generated by AI

arXiv’s Code of Conduct already holds every author fully responsible for the entire paper’s contents.

The Penalty

  • One-year ban from submitting new papers to arXiv.
  • After the ban, future submissions must first be accepted at a reputable peer-reviewed venue before arXiv will host them.

At first, researchers discussing the policy online seemed happy about the one-year ban, but when I pointed out that it is essentially a lifetime ban on using arXiv as a preprint venue, some people became nervous.

Why now? arXiv has been overwhelmed by low-effort “AI slop.” These papers are marked by fabricated citations and shallow summaries. This erodes trust in the entire preprint ecosystem.

In response to the complaints (someone like me would be worried that I’ll somehow let an error slip through and then be banned for life from posting working papers), Scientific Director Steinn Sigurðsson shared:

on the whole @arxiv flap about hallucinated references etc

you don’t see the stuff we reject… some of it is really really egregious

the decision to impose additional consequences is largely to throttle that stuff so n00bs and bad actors don’t trash us trying repeatedly

This is the problem that we face with every internet forum. A few bad actors ruin it for good people.

In 2022 I wrote “Content moderation strategy”:

Elon Musk buying Twitter is the big news this week. He wants to enhance free speech on the site and, according to him, make it more open and fun. Some fans are hoping that he will make the content moderation and ban policy more transparent. Maybe that’s possible. 

If no one can be banned, then bad actors will bring the whole platform down. Inevitably, good people get caught in the net, and it’s devastating to be locked out of a platform where your peers are sharing.

However, if you want to be taken seriously by tech folk then ask for a system that is possible. A substantially better experience might be incompatible with the site being free to users.

Part of the problem that I don’t hear people talking about is that a free platform is not easily compatible with good customer service.

For some not-fake work and citations: Buchanan et al. (2024) provided early clear evidence that a mark of LLM-written work is fake citations. And, Buchanan and Hickman (2024) show that certain framings can prompt people to be more suspicious of AI-generated writing, such that they are pushed toward doing a fact-check before believing all claims.

Buchanan, Joy, and William Hickman. “Do people trust humans more than ChatGPT?” Journal of Behavioral and Experimental Economics 112 (2024): 102239.

Buchanan, Joy, Stephen Hill, and Olga Shapoval. “ChatGPT hallucinates non-existent citations: Evidence from economics.” The American Economist 69.1 (2024): 80-87.

How do Income Tax Brackets Work?

I was listening to an episode of The Deduction, a podcast by the Tax Foundation. As if that first sentence isn’t evident enough, I was reminded of how confusing taxes are – period. Even experts disagree and see grey areas. As I was listening, I thought “man, they need a graph”. So, here we are.

Income Tax Vocabulary

The money that you are paid by your employer is your gross income. Not all of it is taxable. You can deduct money from your gross income to get your taxable income. Most people subtract the ‘standard deduction’ from their gross income, which is how I’ll proceed in this post. Since the standard deduction for 2026 is $16,100 for a single earner, that means that your taxable income is $16,100 less than your gross income. By following a formula, one can calculate the amount of money that they must pay the government. These payments can be all at once, throughout the year, or even directly from your paycheck. The total that’s due to the government by April 15 is called the total tax liability. Finally, the money that the government doesn’t take, and that you get to keep, is called your net income. It’s your income net of taxes.

If you’ve had a job, then you are probably most familiar with your gross income, what your employer pays you, and your net income, what you get to take home. The steps in between might include some hand-waving.

Marginal Tax Rates

One of the most confusing pieces of the income tax code is marginal income taxes. Below are the brackets for 2026.

Marginal tax rates work like this: Every dollar that you earn faces a tax rate. If your taxable income would be below zero, then you pay zero in taxes. But if your taxable income is $5k, then it gets taxed at a rate of 10%. That part should be pretty straightforward. But what if your taxable income is $15k? According to the table, you face a tax rate of 10% for dollars earned up to $12,400. That would be a tax liability of $1,240. But the remainder of your $15k in taxable income exists in the next tax bracket. That portion of your taxable income faces a tax rate of 12%. Sticking with the example, $2,600 is in the 12% tax bracket, so the tax liability for that portion of your taxable income is $312 (=$2.6k*0.12). Therefore, your total tax liability would be the sum of your tax liabilities across all applicable tax brackets: $1,552 (=$1,240+$312).
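
The bracket-by-bracket sum can be written as a short Python sketch. It includes only the two brackets whose thresholds are quoted in the paragraph above, so it is only valid for taxable incomes that stay within the 10% and 12% brackets; the higher brackets would need to be added from the table. The function names are illustrative.

```python
# (lower bound of the bracket, marginal rate) for a single filer.
# Only the brackets quoted in the text; higher brackets are omitted,
# so results are only valid within the 10% and 12% brackets.
BRACKETS = [(0, 0.10), (12_400, 0.12)]
STANDARD_DEDUCTION = 16_100  # single earner, 2026 (from the post)


def taxable_income(gross_income, deduction=STANDARD_DEDUCTION):
    """Gross income minus the deduction, floored at zero."""
    return max(gross_income - deduction, 0)


def tax_liability(taxable, brackets=BRACKETS):
    """Sum the tax owed on the slice of income inside each bracket."""
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        # The next bracket's lower bound is this bracket's upper bound
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if taxable > lower:
            tax += (min(taxable, upper) - lower) * rate
    return round(tax, 2)


print(tax_liability(5_000))    # all of it taxed at 10%
print(tax_liability(15_000))   # $12,400 at 10% plus $2,600 at 12%
print(tax_liability(taxable_income(31_100)))  # same $15k, starting from gross
```

Each pass through the loop taxes only the dollars that fall inside that bracket, which is why crossing into a higher bracket never changes the tax owed on the dollars below it.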

There are some features of marginal tax rates that are worth mentioning. Since the tax rates on the lower taxable income brackets don’t change, earning more gross income never reduces your net income unless the tax rate exceeds 100% (which it doesn’t here). So, when someone says that their taxable income is in the 35% tax rate bracket, they probably just mean that their last dollar earned is there. They’re only paying 35% on the taxable income that’s above $256,225. They’re not paying 35% of all earned dollars to the Internal Revenue Service (IRS).

Below is a graph that details the different marginal tax rates with shaded areas. The blue line is the average tax rate. It’s calculated by dividing the tax liability by the gross income. Even though one might earn an income that’s greater than $257k where the marginal tax rate is 35% or greater, the average tax rate remains lower, topping out at about 30% in this figure. The average tax rate is lower than an earner’s top marginal tax rate because the income in those lower brackets never disappears or gets taxed at a higher rate.


Most Published Research Findings Are Directionally Correct

As a new quick rule of thumb inspired by the Nature papers, you could do worse than “cut estimated effect sizes in half”. If a published paper says that a college degree raises wages 100%, then chances are the degree really does raise wages, but more like 40–50%. In 2005, John Ioannidis said that “most published research findings are false”. By 2026, we seem to have improved to “most published research findings are exaggerated.”

That’s the conclusion of my piece out today at Econlog: “Is Economics Finally Becoming Trustworthy?”

There’s plenty of both good and bad news for economics and the social sciences in both my piece and the Nature special issue it describes. It’s kind of like the Our World in Data motto:

In short, our attempt to replicate hundreds of papers showed that published social science results shouldn’t be trusted precisely today, but they seem to be getting more reliable over time, and they are much more reliable than chance. Economics and political science look the best, though we are still very far from perfect:

You can read the full piece here.

Gerrymandering Doesn’t Give an Obvious Edge to Either Party in the US House

Congressional districts must be redrawn after each US Census. In fact, that is one of the main functions of the Census: to determine how many seats in the US House of Representatives each state is allotted. A related function is to give states information about the distribution of the population in their state. Even if a state doesn’t gain or lose seats after a Census, the population in their state may have grown, shrunk, or simply moved around within the state. If each Congressional district is to represent roughly the same number of people, district boundaries will still need to be redrawn even absent a change in the state’s total share of the US House seats.

That much is clear. However, given that historically and still largely today Congressional districts are drawn by state legislatures, there is a temptation and a real possibility that the party in power of a state legislature will draw boundaries in a way that benefits that party. There is nothing illegal about doing this as far as the federal Constitution is concerned (that I am aware of), but it does seem a bit unsporting. But I guess much of politics might be deemed “unsporting.”

Nonetheless, sometimes the shape of districts is so obviously weird and not representing a cohesive group of citizens or communities that it gets the derisive term “Gerrymander,” which derives from a historical example of a very odd looking district. But even if a district doesn’t look weird, it may still give one party an advantage that some deem unfair, such as by diluting one party’s supporters into multiple districts so they get no seats, or alternatively cramming all the supporters into one district so they have a very lopsided victory in just one district, rather than controlling multiple districts. This practice is known as “partisan Gerrymandering,” and it will be my focus in this post today (there are other forms, such as racial Gerrymandering, which are also important but are beyond the scope of this post).

Surely this practice occurs. Some states have tried to avoid the problem of Gerrymandering by using non-partisan commissions, though this is a minority of states (fewer than a dozen), and when push comes to shove they don’t actually seem that committed to the idea (both California and Virginia have essentially abandoned these commissions in 2025-26 to attempt to, once again, gain a partisan advantage). But lately a particular question has come up: does partisan Gerrymandering benefit one major party more?

In total for the US House, whatever Gerrymandering is happening at the state level seems to roughly wash out in national representation: in the 2024 election, Republicans received about 51.7% of the two-party share of votes totaled over all House elections, and Republicans have about 50.6% of the seats in the House. Perhaps you could say that the GOP effectively loses 5 seats relative to what they “should” have in a truly proportional sense, but this ignores many factors, some of which I will discuss below. But even so, the GOP has a slim majority in the House and they won a slim total of national House votes. It’s about right.

But that “washing out” at the national level ignores some very large disparities at the state level. In some states, one party has all the House seats, even though they got nowhere near 100% of the House vote. Many of these are states with 1 or 2 House seats, which are less interesting because either there is no possibility of Gerrymandering (1 seat) or there is no obviously “fair” division, but it is not only those small states. For example, Massachusetts gives all 9 seats to the Democrats, even though Republicans received 31.5% of the two-party vote share. Do Republicans deserve 3 of the seats? Is the fact that they don’t have 1/3 of the seats evidence of Gerrymandering? Conversely, in Oklahoma Republicans hold all 5 seats, even though Democrats got 30% of the vote. Should Democrats get a seat or two in Oklahoma?
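
The seat comparisons above come from a simple proportional benchmark: vote share times the number of seats. The Python sketch below reruns that arithmetic with the shares quoted in this post and the 435 total House seats. It is only a rough yardstick, not a fairness standard, and the function name is illustrative.

```python
def proportional_seats(vote_share, total_seats):
    """Seats a party would hold if seats matched vote share exactly."""
    return vote_share * total_seats


HOUSE_SEATS = 435

# National: GOP two-party vote share vs. GOP share of seats (from the post)
gop_proportional = proportional_seats(0.517, HOUSE_SEATS)
gop_actual = proportional_seats(0.506, HOUSE_SEATS)
national_gap = gop_proportional - gop_actual

# States: the minority party's vote share applied to the delegation size
ma_republican = proportional_seats(0.315, 9)   # Massachusetts, 9 seats
ok_democratic = proportional_seats(0.30, 5)    # Oklahoma, 5 seats

print(round(national_gap, 1), round(ma_republican, 1), round(ok_democratic, 1))
```

The national gap comes out just under 5 seats, while the benchmark implies roughly 3 Republican seats in Massachusetts and 1 or 2 Democratic seats in Oklahoma, against zero actually held in each case.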

(Note: for all vote data, I have queried Google Gemini Pro. I found multiple errors along the way, but I am fairly confident the numbers are all correct now. Please let me know if you spot any errors).

Neither Massachusetts nor Oklahoma’s Congressional representation is an obvious case of Gerrymandering on its face. It’s possible that 1/3 opposition party support in both states is perfectly evenly distributed across the state, such that it would not be possible to draw any “fair” districts that give the opposition roughly 1/3 of the seats. But it could be the result of Gerrymandering, or at least an indication we should look deeper. We can tally up all of the differences across states in the following chart:

Chart 1


GDP Forecasts for the First Quarter of 2026

Forecast models, betting markets, and surveys of experts all drastically overstated the actual growth of GDP in the last quarter of 2025. They were off in the initial release, which was just 1.4 percent, but this was later revised down even further to 0.5 percent. All four of the sources I track were forecasting well over 2 percent, with some over 3 percent.

Does that mean we shouldn’t trust the forecasts? Perhaps, but last quarter was largely pulled down by government spending cuts, which the models completely missed. You can see this very clearly in the Atlanta Fed GDPNow model. Perhaps they shouldn’t have been surprised by this drop in government spending, but that is where the major error was.

So what do these forecasts think about the first quarter data for 2026, which comes out tomorrow? The two best predictors historically, GDPNow (Atlanta Fed) and Kalshi, are pretty far apart on this one, over a percentage point difference, with GDPNow being the only forecast under 2 percent: