Data – Economist Writing Every Day

Consumer Debt Delinquency & Write-Offs

June 19, 2026 | June 18, 2026 | Zachary Bartsch

I wrote a post about debt delinquency way back in 2023. At the time, people were concerned about an impending recession. I argued that, if there were to be a recession, then debt defaults would not be the cause. The delinquency numbers were low and stable. Though delinquencies did rise some, no recession materialized. I’ll say a little more about how to interpret the numbers and give an update.

There exists a stock of loan balances. Most loans are in good standing with scheduled payments being made. This is good debt. Some debt is delinquent, meaning that payments are not being made. This is bad debt. What happens to bad debt? Sometimes those borrowers catch up on their payments and their loan balances switch to being good debt. Borrowers can also transform their bad debt into good debt by restructuring it with new terms. Temporary administrative adjustments can also change the classification from bad to good debt. At any moment, the total stock of debt is composed of good and delinquent debt. We can express these as proportions of all debt.

But the lenders also recognize that not all bad debt will be made good. For one reason or another, sometimes borrowers just don’t repay. It doesn’t make sense to list delinquent debt as a balance sheet asset if it will never be paid. Rather than accumulating more bad debt every year that will never be paid, banks ‘charge off’ some of that bad debt. Charging off bad debt lets banks realize losses and makes for a more realistic balance sheet. The flow of charge offs is deducted from the stock of delinquent debt.

If banks charge off some delinquent debt, then the proportion of delinquent debt should be lower in the next period, all else constant. But all else isn’t constant. Some good debt will become delinquent and some delinquent debt will become good. Still, after a charge off, delinquent debt is lower than it would otherwise have been. Below, I denote the net flow of good & bad debt transitions as ‘r’ and solve for it.

The variable ‘r’ is the net transition to good or to bad debt after charge offs. If r>0, then net new delinquencies occurred faster than banks realized their losses with charge offs. Is that good or bad? A higher rate of net new delinquencies can be bad because it reflects that people aren’t paying their contractually obligated debts. But it can also be good if the new delinquencies are a result of experimental entrepreneurship and an innovative economy. The bad interpretation is probably relevant cyclically as a short or medium run variable. The innovation interpretation probably changes in the medium or long run as a structural variable.
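
The accounting above can be sketched in a few lines. This is only an illustration under assumptions the post does not spell out: the delinquency and charge off figures are treated as shares of the same loan stock over the same period, and ‘r’ is taken to be the change in the delinquent share plus the share charged off. The function name and the numbers are hypothetical.

```python
def net_transition(delinq_prev, delinq_now, charge_off):
    """Net flow into delinquency implied by: delinq_now = delinq_prev - charge_off + r."""
    return delinq_now - delinq_prev + charge_off


# Hypothetical shares of total loan balances, in percent (not actual data).
r = net_transition(delinq_prev=2.5, delinq_now=2.75, charge_off=0.5)
print(f"r = {r:.2f} percentage points")
```

Under these assumptions, a positive ‘r’ means that new delinquencies outpaced cures, and a negative ‘r’ means that more debt returned to good standing than went bad.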

Let’s look at the numbers. There are several categories of loans, but let’s start with just consumer loans.

The delinquency rate is higher than it was after the pandemic stimulus checks, but is still lower than historical rates. The charge off rate is near the historical average. The graph below right plots ‘r’, which is always greater than zero, meaning that there are always more people transitioning from good debt to delinquency than the reverse. There was more debt becoming delinquent as post-pandemic interest rates rose, but net delinquency transitions fell from 2024q1 until 2026q1, when they ticked up mildly. In other words, the aggregate consumer debt picture looks pretty average except for the secular decline in rates of delinquency. I don’t know why that is. Maybe banks have gotten better at identifying risk? Or maybe newer forbearance rules are friendlier to borrowers who need to pause payments?

Below are the same two graphs for single-family residential mortgages. These delinquencies are close to historical lows and charge offs are average. However, the ‘r’ graph below has been rising for a decade and is currently at a twelve-year high. Since the data only goes back so far, it’s hard to say whether the low numbers of the late twenty-teens were an aberration of the post-GFC, low-interest-rate environment or whether we should be concerned. It is worth noting that the ‘r’ values are often below zero, which means that people do often come back from delinquency. We know it’s not simply charge offs doing the work there since the charge off rate has been steady and very low.

Continue reading →

Price Level: Noise vs Signal

April 24, 2026 | April 28, 2026 | Zachary Bartsch | 3 Comments

My university recently hosted a guest speaker. Among their content, they included some nominal macroeconomic values from pre-2020, back in the era when inflation was very low. That roughly includes the years 2012-2019. Truly, inflation stayed below 2% through February of 2021, but I think that we can all agree that the economy was different in a few ways beginning in 2020.

I asked the speaker why not express the nominal values in real terms. They were emphatic that the low rates of inflation at the time implied that the signal-to-noise ratio was too low. Therefore, the ‘real’ inflation adjusted values would not be more precise because excessive noise would be introduced into the series during a period when not much deflating was necessary in the first place.

My answer to this is a firm ‘maybe’. It makes sense and it’s plausible (Jeremy has written about error and revisions in the past). We can think about the noise in price indices in a few ways.

1) It may be that information is incomplete and becomes more complete as time passes. This sort of noise only exists in the short-run and is resolved as more information becomes available later in time. Revisions tend to happen each month for prior months, as well as each year for prior years. There are also big revisions after methodological, consumption weight, and data source changes.

2) Another type of noise is due to incomplete information that is never resolved. After all, the government statisticians can’t see literally all of the transactions. Those unobserved transactions will never make it into the official inflation measures and we’ll never get a perfect picture.

3) Methodological artifacts may also include known biases. This type of noise doesn’t get corrected except after major changes to the series. If those changes never happen, then we just sort of live with imprecision. Luckily, so long as the bias is consistent, the percent change in the price indices will approximate the true percent change in the underlying price level. However, if there are non-random biases in the percent change, then it can cause some trouble.

One way to get an idea for the amount of noise in the data is to observe the magnitude of revisions. Of course, this only helps us with the first type of noise above that eventually gets resolved with more information. It’s much harder to get a handle on the imprecision that is not identifiable. The Philadelphia Federal Reserve Bank provides an easy-to-use database that puts all of the archival and revised numbers for many macro series in a single place: the Real-Time Data Set (RTDS). It includes every historical PCE price index value for each publication month. Let’s limit our sample to the 21st century.
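
As a rough sketch of what "magnitude of revisions" could mean in practice, the snippet below compares a first-published value with the latest value for each reference month and averages the absolute percent revision. The numbers are made up for illustration and are not RTDS figures; the variable and function names are hypothetical, and real vintage data would need to be loaded from the RTDS files first.

```python
from statistics import mean

# Hypothetical price index values per reference month: as first published,
# and as they stand in the latest vintage. These are NOT actual RTDS numbers.
first_release = {"2019-01": 108.9, "2019-02": 109.0, "2019-03": 109.3}
latest_release = {"2019-01": 109.1, "2019-02": 109.1, "2019-03": 109.2}


def percent_revisions(first, latest):
    """Percent revision from the first to the latest vintage, by reference month."""
    return {month: 100 * (latest[month] - first[month]) / first[month] for month in first}


revisions = percent_revisions(first_release, latest_release)
mean_abs_revision = mean(abs(v) for v in revisions.values())

for month, rev in revisions.items():
    print(f"{month}: {rev:+.3f}%")
print(f"mean absolute revision: {mean_abs_revision:.3f}%")
```

A small mean absolute revision relative to the measured inflation rate would suggest that this first type of noise is modest. It says nothing about the second and third types.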

Continue reading →

Messy Disability Records in the Historical Censuses

March 28, 2025 | March 29, 2025 | Zachary Bartsch | 1 Comment

The historical US Census rolls of disability among free persons are a mess. First, for the 1850-1870 censuses specifically, the census bureau was not professionalized and the pay was low (a permanent office wasn’t founded until 1902). So, the enumerators were temporary employees and weren’t experts in their art. To boot, their handwriting wasn’t always crystal clear. Second, training for disability enumeration was even less complete and enumerators did their best with whom they encountered and how they understood the instructions. Finally, the digitized data in IPUMS doesn’t perfectly match the census reports. What a mess.

Guilty by Association

Disabled people and their families often misreported their status out of embarrassment or shame. Given that enumerators had quotas to fill, they were generally not inclined to investigate claimed statuses strenuously. Furthermore, disabled people were humans and not angels. Sometimes they themselves didn’t want to be associated with other types of disabled people. In particular, the disability designation in question (13) on the 1850 census questionnaire asked “Whether deaf and dumb, blind, insane, idiotic, pauper or convict”. Saying “yes” may put you in company that you don’t prefer to keep.

Summer censuses also sometimes missed deaf students who were traveling to or from a residential school.

Enumerator Discretion

The enumerator’s job was to write the disability that applied. What counts as deaf and dumb? That’s largely at the enumerator’s discretion. Some enumerators wrote ‘deaf’ even though that wasn’t an option. Was that shorthand for ‘Deaf and Dumb’? Or were they specifying that the person was deaf only and not dumb? We don’t know. But we do know that they didn’t follow the instructions. What if a person was both insane and blind? Then what should be written? “Blind/Insane” or “Blind and Insane” or “In-B” and any number of combinations were written. Some of them are easier to read than others.

Data Reading Errors

IPUMS is the major resource for using census data. The historical data was entered by foreign data-entry workers who didn’t always speak English. So, the records aren’t perfect. Some of the records are corroborated with Optical Character Recognition (OCR), but the historical script is sometimes hard to read. Finally, the fine folks at familysearch.org and Brigham Young University have used Church of Jesus Christ of Latter-day Saints (LDS) volunteers to proof data entries. Regardless, we know that the IPUMS data isn’t perfect and that the disability data is far from perfect. Usually, reports don’t dwell on it. They simply say that the data is incomplete.

The disability data is incomplete for a lot of reasons related to the respondent, the enumerator, the instructions, and the digital data creation. What a mess.

Optimal Protein Consumption in the 21st Century: A Model

March 21, 2025 | April 21, 2026 | Zachary Bartsch | 1 Comment

I’ve discussed complete proteins before. I’ve talked about the ubiquity of protein, animal protein prices, vegetable protein prices, and a little bit about protein hedonics. My coblogger Jeremy also recently posted about egg prices over the past century. Charting the cost of eggs is great for identifying egg affordability. But a major attraction of eggs is that they are a ‘complete protein’. So how much of that can we afford?

Here I’ll outline a model of the optimal protein consumption bundle. What does this mean? This means consuming the quantities of protein sources that satisfy the recommended daily intake (RDI) of the essential amino acids and doing so at the lowest possible expenditure. Clearly, this post includes a mix of both nutrition and economics. Since a comprehensive evaluation that includes all possible foods would be a heavy lift, here I’ll just outline the method with a small application.

Consider a list of prices for 100 grams of Beef, Eggs, and Pork.* We can also consider a list that identifies the quantity that we purchase in terms of hundreds of grams. Therefore, the product of the two yields the total that we spend on our proteins.

Of course, not all proteins are identical. We need some characteristics by which to compare beef, eggs, and pork. Here, I’ll use the grams of essential amino acids in 100 grams of each protein source. Because there are different RDIs for each amino acid, I express each amino acid content as a proportion of the RDI (represented by the standard molecular letter).

Then, we can describe how much of the RDI of each amino acid that a person consumes by multiplying the amino acid contents by the quantities of proteins consumed.

Our goal is to find the minimum expenditure, B, by varying the quantities consumed, Q, such that the minimum of C is equal to one. If the minimum element of C is greater than one, then a person could consume less and spend less while still satisfying their essential amino acid RDI. If the minimum element is less than one, then they aren’t getting the minimum RDI.

How do we find such a thing? Well, not algebraically, that’s for sure. I’ll use some linear programming (which is kind of like magic, there’s no process to show here).
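
For readers who want to see the mechanics, here is a minimal sketch of the minimization described above. It is not the solver or the data used for this post: the prices and amino acid shares are invented, only three amino acids are included, and instead of a proper LP routine it simply checks every corner of the feasible region, which is workable for three foods. All names are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs (NOT the post's data). Order: beef, eggs, pork.
prices = [1.00, 0.50, 0.80]          # dollars per 100 g
content = [                          # share of RDI supplied by 100 g
    [0.90, 0.30, 0.80],              # lysine (K)
    [0.80, 0.35, 1.00],              # leucine (L)
    [0.70, 0.40, 0.90],              # methionine (M)
]


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination; None if singular."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def cheapest_bundle(prices, content):
    """Minimize prices . q subject to content @ q >= 1 and q >= 0."""
    n = len(prices)
    # Every constraint is stored as (row, bound), meaning row . q >= bound.
    constraints = [(row, 1.0) for row in content]
    constraints += [([1.0 if j == i else 0.0 for j in range(n)], 0.0) for i in range(n)]
    best = None
    # A corner of the feasible region has n constraints holding with equality.
    for combo in combinations(constraints, n):
        q = solve_linear([row for row, _ in combo], [bound for _, bound in combo])
        if q is None:
            continue
        feasible = all(
            sum(a * x for a, x in zip(row, q)) >= bound - 1e-9
            for row, bound in constraints
        )
        if feasible:
            spend = sum(p * x for p, x in zip(prices, q))
            if best is None or spend < best[0]:
                best = (spend, q)
    return best


cost, quantities = cheapest_bundle(prices, content)
print(f"spend ${cost:.2f} on (beef, eggs, pork) = {quantities} hundred grams")
```

With these invented inputs the cheapest bundle is pork alone, with lysine as the binding amino acid. A full application with many foods would be better served by a dedicated LP solver than by corner enumeration.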

The solution results in consuming only 116.28 grams of Pork and spending $1.093 per day. The optimal amino acid consumption is also below. Clearly, prices change. So, if eggs or beef became cheaper relative to pork, then we’d get different answers.

In fact, we have the price of these protein sources going back almost every month to 1998. While pork is exceptionally nutritious, it hasn’t always been the most cost-effective. Below are the prices for 1998-2025. See how the optimal consumption bundle has changed over time – after the jump.

Continue reading →

A Forgotten Data Goldmine: Foreign Commerce and Navigation Reports

March 14, 2025 | March 14, 2025 | Zachary Bartsch

Economists rely on trade data. The historical Foreign Commerce and Navigation of the United States reports detailed monthly figures on imports, exports, and re-exports. This dataset spans decades, providing a crucial resource for researchers studying price movements, consumption patterns, and the effects of war on global trade.

The U.S. Department of Commerce compiled these reports to track the nation’s commercial activity. The data cover a vast range of commodities, including coffee, sugar, wheat, cotton, wool, and petroleum. Officials recorded trade flows at a granular level, enabling economists to analyze seasonal fluctuations, wartime distortions, and postwar recoveries. Their inclusion of re-export figures allows for precise estimates of domestic consumption. Researchers who ignore re-exports risk overstating demand by treating imports as goods consumed rather than goods in transit.
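
The re-export adjustment amounts to simple arithmetic. The sketch below assumes a good with no domestic production and no exports of domestic origin (coffee would be a plausible case), so that consumption is roughly imports minus re-exports. The quantities and the function name are hypothetical, not figures from the reports.

```python
def retained_imports(imports, re_exports):
    """Imports that stay in the country: a consumption proxy when nothing is produced at home."""
    return imports - re_exports


# Hypothetical monthly quantities in millions of pounds (not from the reports).
imports = 100.0
re_exports = 15.0

retained = retained_imports(imports, re_exports)
overstatement = (imports - retained) / retained  # error from treating all imports as consumed

print(f"retained imports: {retained:.1f}")
print(f"imports overstate consumption by {overstatement:.1%}")
```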

Continue reading →

Excel’s Weird (In)Convenience: COUNTIF, AVERAGEIF, & STDEVIF

December 20, 2024 | December 23, 2024 | Zachary Bartsch | 1 Comment

Excel is an attractive tool for those who consider themselves ‘not a math person’. In particular, it visually organizes information and has many built-in functions that can make your life easier. You can use math if you want, but there are functions that can help even the non-math folks.

If you are a moderate Excel user, then you likely already know about the AVERAGE and COUNT functions. If you’re a little bit statistically inclined, then you might also know about the STDEV.S function (STDEV is deprecated). All of these functions are super easy and only have one argument. You just enter the cells (array) that you want to describe, and you’re done. Below is an example with the ‘code’ for convenience.

=COUNT(A2:A21)
=AVERAGE(A2:A21)
=STDEV.S(A2:A21)

If you do some slightly more sophisticated data analysis, then you may know about the “IF” function. It’s relatively simple; if a proposition is true (such as a cell value condition), then it returns a value. If the proposition is false, then it returns another value. You can even create nested “IF”s in which a condition being satisfied results in another tested proposition. Back when Excel had more limited functions, we had to think creatively because there was a limit to the number of nested “IF” functions that were permitted in a single cell. Prior to 2007, a maximum of seven “IF” functions was permitted. Now the maximum is 64 nested “IF”s. If you’re using that many “IF”s, then you might have bigger problems than the “IF” limitations.

Another improvement that Excel introduced in 2019 was easier array arguments. In prior versions of Excel, there was some mild complication in how array functions must be entered (curly brackets: {}). But now, Excel is usually smart enough to handle the arrays without special instructions. Subsequently, Excel has introduced functions that combine the array features with the “IF” functions to save people keystrokes and brainpower.

Looking at the example data we see that there is an identifier that marks the values as “A” or “B”. Say that you want to describe these subgroups. Historically, if you weren’t already a sophisticated user, then you’d need to sort the data and then calculate the functions for each subgroup’s array. That’s no big deal for small sets of data and two possible ID values, but it’s a more time-consuming task for many possible ID values and multiple ID categories.

The early “IF” statements allowed users to analyze certain values of the data, such as those that were greater than, less than, or equal to a particular value. But, what if you want to describe the data according to criteria in another column (such as ID)? That’s where Excel has some more sophisticated functions for convenience. However, as a general matter of user interface, it will be clear why these are somewhat… awkward.
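
If you want to double-check subgroup results outside of Excel, the same count, average, and sample standard deviation (the STDEV.S analogue) can be computed with a short script. This is a cross-check sketch and not one of the Excel approaches discussed in this post; the data and names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical example data: an ID column and a value column.
ids = ["A", "B", "A", "B", "A", "B"]
values = [10, 20, 12, 26, 14, 23]


def describe_group(ids, values, target):
    """Count, mean, and sample standard deviation of the values whose ID equals target."""
    subset = [v for i, v in zip(ids, values) if i == target]
    return len(subset), mean(subset), stdev(subset)


count_a, avg_a, sd_a = describe_group(ids, values, "A")
count_b, avg_b, sd_b = describe_group(ids, values, "B")
print("A:", count_a, avg_a, sd_a)
print("B:", count_b, avg_b, sd_b)
```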

Continue reading →

Services, and Goods, and Software (Oh My!)

August 16, 2024 | Zachary Bartsch

When I was in high school I remember talking about video game consumption. Yes, an Xbox was more than two hundred dollars, but one could enjoy the next hour of that video game play at a cost of almost zero. Video games lowered the marginal cost and increased the marginal utility of what is measured as leisure. Similarly, the 20th century was the time of mass production. Labor-saving devices and a deluge of goods pervaded. Remember servants? That’s a pre-20th century technology. Domestic work in another person’s house was very popular in the 1800s. Less so as the 20th century progressed. Now we have devices that save on both labor and physical resources. Software helps us surpass the historical limits of moving physical objects in the real world.

There’s something that I think about a lot and I’ve been thinking about it for 20 years. It’s simple and not comprehensive, but I still think that it makes sense.

Labor is highly regulated and costly.
Physical capital is less regulated than labor.
Software, and writing more generally, is less regulated than physical capital.

I think that just about anyone would agree with the above. Labor is regulated by health and safety standards, “human resource” concerns, legal compliance and preemption, environmental impact, and transportation infrastructure, etc. It’s expensive to employ someone, and it’s especially expensive to have them employ their physical labor.

Continue reading →

IPUMS Data Intensive Workshop & Conference

August 2, 2024 | August 2, 2024 | Zachary Bartsch

I just returned from the Full Count IPUMS data workshop at the Data-Intensive Research Conference that was hosted by the Network on Data Intensive Research on Aging and IPUMS. The theme of this conference was “Linking Records”.

It was the best workshop and conference that I’ve ever attended. I’d attended the conference remotely in the past. But attending the workshop was exceptional. About 20 other people and I were flown to the Minnesota Population Center and put up in a hotel during our stay (that made the conference a low-stress affair). The whole workshop was well organized, the speakers built on one another’s content, and there was a hands-on lab for us to complete. I felt my human capital growing by the hour.

Continue reading →

Fossil Fuel Frenzy: The Driving Force Behind US Extractive Growth

April 26, 2024 | April 26, 2024 | Zachary Bartsch

What with all the talk about semiconductor production and rare-earth mineral extraction, I think that it’s worth examining what the USA produces in terms of what we get out of the ground. This includes mining, quarrying, oil and natural gas extraction, and some support activities (I’ll jump more into the weeds in the future). I’ll broadly call them the ‘extractive’ sectors. How important are these industries? In 2021 extractive production was worth $520 billion. That was roughly 2% of GDP. Below is the breakdown by type of extraction.

Examining the graph of total extraction output below tells a story. The US increased production of extracted material substantially between the Great Depression and 1970. That’s near the time that the Clean Water and Clean Air Acts were passed. But the change in the output growth rate is so stark that I suspect that those were not the only causes of change (reasonable people can differ). For the next 40 years, there was a malaise in output. This was the period during which it was popular to talk about our natural resource insecurity. As in, if we were to be engaged in a large war, then would we be able to access the necessary materials for wartime production?

https://fred.stlouisfed.org/graph/?g=1kWNU

But for the past 15 years we’ve experienced a boom with extracted output rising by 50%, an average growth rate of 2.7% per year. That’s practically breakneck speed for an old industry at a time when the phrase ‘great stagnation’ was being thrown about more generally. By 2023, we were near all-time-high output levels (pre-pandemic was higher by a smidge).
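
The 2.7% figure follows from compounding: a 50% cumulative rise spread over 15 years. A quick check, with a hypothetical function name:

```python
def average_annual_growth(total_growth, years):
    """Compound annual growth rate implied by a cumulative growth figure."""
    return (1 + total_growth) ** (1 / years) - 1


rate = average_annual_growth(total_growth=0.50, years=15)
print(f"{rate:.1%}")  # prints 2.7%
```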

For people concerned about resource security, the recent boom is good news. For people who associate digging with environmental degradation, greater extraction is viewed with less enthusiasm. Those emotions are especially high when it comes to fossil fuel production. Below is a graph that identifies the three major components of extraction indexed to constant 2021 prices. By indexing to the relative outputs of a particular year, the below graph is a close-ish proxy to real output that is comparable in levels.

Continue reading →

Counting Jobs (Revisited)

April 24, 2024 | Jeremy Horpedahl | 3 Comments

In January 2023 I had a post looking at the different ways that the Bureau of Labor Statistics measures employment. Those who follow the data closely probably know about the difference between the household and establishment surveys, which the monthly jobs report data is based on. But these are just surveys.

The more comprehensive data (close to the universe of workers, roughly 95%) is the Quarterly Census of Employment and Wages. While more comprehensive, this data comes out with a much longer lag, and is only released once per quarter. The QCEW is just the raw count of workers, which is useful in some ways, but we also know that there are normal seasonal fluctuations, which the QCEW doesn’t adjust for. Therefore, year-over-year changes in jobs are the best way to look at trends in this data. In September 2023 (latest month available), the US had 2.25 million more workers than in the previous September. For comparison, the establishment survey showed an increase of 3.13 million jobs that month, and the household survey showed a change of 2.66 million — suggesting they both might be overstating job growth.
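
To illustrate why year-over-year comparisons suit data that is not seasonally adjusted, here is a small sketch. The employment levels are hypothetical (in millions) and are not QCEW figures; the function name is also made up for the example.

```python
# Hypothetical not-seasonally-adjusted employment levels, in millions,
# keyed by (year, month). These are NOT actual QCEW numbers.
levels = {
    (2022, 6): 151.0,
    (2022, 9): 150.0,
    (2023, 6): 153.5,
    (2023, 9): 152.0,
}


def year_over_year(levels):
    """Change from the same month one year earlier, where that month is available."""
    return {
        (year, month): value - levels[(year - 1, month)]
        for (year, month), value in levels.items()
        if (year - 1, month) in levels
    }


yoy = year_over_year(levels)
print(yoy)

# Comparing June to September within 2023 mixes the trend with the seasonal pattern.
within_year_change = levels[(2023, 9)] - levels[(2023, 6)]
print(within_year_change)
```

In this made-up series, employment falls between June and September in both years, so the within-year change is negative even though each month is higher than it was a year earlier.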

Still with me? Here’s one more set of jobs data: the Business Employment Dynamics data. This dataset is built on the QCEW data, but allows finer-grained insights into what types and sizes of firms are gaining or losing jobs. Like the QCEW, the most recent data is for the 3rd quarter of 2023 (just released today), but when looking at the aggregate data, it has one advantage over the QCEW: it is seasonally adjusted, so we can look at the most recent quarterly change (not really useful for not-seasonally-adjusted data). The BED data also looks only at private sector jobs, so it is looking at the health of the private labor market (and ignoring changes in government employment).

The latest BED data do show a possibly worrying trend: the 3rd quarter of 2023 showed a net loss of 192,000 private-sector jobs. That’s the first loss since the height of the pandemic, and ignoring the first half of 2020, the only quarterly decline since 2017. Here’s the chart (note: y-axis is truncated because the 2020q2 job loss is so large it makes the chart unreadable):

I should note that this data is subject to revisions, even though the QCEW is mostly complete. The second quarter of 2022 originally showed a decline, but that was later revised upwards as the QCEW and the seasonal adjustment factors were updated. Still, as this data stands, it is a worrying jobs number that differs from the monthly surveys. For the change from 2023q2 to 2023q3, the establishment survey shows a gain of 640,000 jobs and the household survey also shows a gain of 546,000. Like the QCEW raw data, the BED seasonally adjusted data suggests that the monthly surveys may be overstating job growth.