Get your HHS Data Ahead of Cuts

The US Department of Health and Human Services has announced it is cutting 10,000 of its 82,000 jobs and restructuring:

As part of the restructuring, the department’s 10 regional offices will be cut to five and its 28 divisions consolidated into 15, including a new Administration for a Healthy America, or AHA, which will combine offices that address addiction, toxic substances and occupational safety into one central office.

AHA will include the Office of the Assistant Secretary for Health, the Health Resources and Services Administration, the Substance Abuse and Mental Health Services Administration, Agency for Toxic Substances and Disease Registry, and the National Institute for Occupational Safety and Health.

These divisions do many different jobs, but as usual what stands out to me is their data, both because it is what I have found directly useful in the past, and because it is what I still have some control over now. Writing your Representatives or writing an op-ed has a minuscule chance of changing Federal policy, but if you download data, you definitely have that data.

What worries me here is that some of the agencies being consolidated might discontinue some of their data products going forward, or even pull some of what they have already created offline. I don’t think this is farfetched given what has happened so far, and given that even in good times these agencies pull down data they painstakingly prepared. For instance, HRSA only publicly posts the State- and County-level Area Health Resources File back to 2019, even though they have annual data going back to 2001.

Probably all 13 of the reorganizing divisions have data worth looking into, and given the staff cuts, even data products in the other divisions could be at risk. But my plan is to focus on the two reorganizing divisions whose data I have previously found useful: HRSA and SAMHSA. HRSA has a nice data download page with 16 different datasets, including the Area Health Resources File, which offers detailed information on the health care workers and facilities in each US county. SAMHSA offers the National Substance Use and Mental Health Services Survey, the Treatment Episode Data Set, and the National Survey of Drug Use and Health. I have previously cleaned and archived the state-level version of the NSDUH, but not the individual-level version that is for now still available from SAMHSA.

All of these datasets are easy to download now, and some will probably become very hard to access later, so now is a good time to take a few minutes and save whatever you think you might need.
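It also helps to keep a record of exactly what you saved and when. Below is a minimal Python sketch, using only the standard library, that downloads a file and appends its source URL, retrieval date, and SHA-256 checksum to a manifest. The function names, folder name, and example URL are hypothetical placeholders. Swap in the actual download links from the HRSA or SAMHSA pages.

```python
import hashlib
import json
import urllib.request
from datetime import date
from pathlib import Path


def archive_bytes(data: bytes, dest_dir: Path, filename: str, source_url: str) -> dict:
    """Save downloaded bytes and append a provenance record (URL, date, SHA-256)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    (dest_dir / filename).write_bytes(data)
    record = {
        "file": filename,
        "source_url": source_url,
        "retrieved": date.today().isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # One JSON object per line, so repeated runs build up a simple manifest
    with open(dest_dir / "manifest.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def download_and_archive(url: str, dest_dir: Path, filename: str) -> dict:
    """Fetch a URL you supply and archive it; no agency URL is assumed here."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        data = resp.read()
    return archive_bytes(data, dest_dir, filename, url)


# Hypothetical usage (placeholder URL, not a real HHS link):
# download_and_archive("https://example.org/placeholder.zip", Path("hhs_archive"), "placeholder.zip")
```

The checksum lets you (or anyone you share the file with later) confirm that an archived copy matches what the agency originally posted.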

Messy Disability Records in the Historical Censuses

The historical US Census rolls of disability among free persons are a mess. First, for the 1850-1870 censuses, census-taking was not professionalized and the pay was low (a permanent Census Bureau wasn't founded until 1902). So the enumerators were temporary employees, not experts in their craft. To boot, their handwriting wasn't always crystal clear. Second, training for disability enumeration was even less complete, so enumerators did their best with whomever they encountered and however they understood the instructions. Finally, the digitized data in IPUMS doesn't perfectly match the census reports. What a mess.

Guilty by Association

Disabled people and their families often misreported their status out of embarrassment or shame. Given that enumerators had quotas to fill, they were generally not inclined to investigate claimed statuses strenuously. Furthermore, disabled people were humans and not angels. Sometimes they themselves didn’t want to be associated with other types of disabled people. In particular, question 13 on the 1850 census questionnaire asked “Whether deaf and dumb, blind, insane, idiotic, pauper or convict”. Saying “yes” might put you in company that you would rather not keep.

Summer censuses also sometimes missed deaf students who were traveling to or from a residential school.

Enumerator Discretion

The enumerator’s job was to write the disability that applied. What counts as deaf and dumb? That’s largely at the enumerator’s discretion. Some enumerators wrote ‘deaf’ even though that wasn’t an option. Was that shorthand for ‘Deaf and Dumb’? Or were they specifying that the person was deaf only and not dumb? We don’t know. But we do know that they didn’t follow the instructions. What if a person was both insane and blind? Then what should be written? “Blind/Insane” or “Blind and Insane” or “In-B” and any number of combinations were written. Some of them are easier to read than others.
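If you need to code these entries for analysis, one approach is to map the free-text strings onto the schedule's categories and flag the cases the text can't resolve. Here is a rough Python sketch. The abbreviation rules (reading "In" as insane and "B" as blind) are my own assumptions, not a documented enumerator convention. A lone "deaf" is flagged for review rather than guessed.

```python
import re

# Hypothetical token rules; "in" -> insane and "b" -> blind are ASSUMED abbreviations
TOKEN_CATEGORIES = {
    "blind": "blind", "b": "blind",
    "insane": "insane", "in": "insane",
    "idiotic": "idiotic", "idiot": "idiotic",
}


def code_disability(raw):
    """Return (set of categories, needs_review flag) for one handwritten entry.

    Crude on purpose: e.g. "deaf in one ear" would wrongly pick up "insane"
    from the token "in", so flagged and unusual entries should be checked by hand.
    """
    text = raw.lower()
    found = set()
    if re.search(r"deaf\s*(?:and|&)\s*dumb", text):
        found.add("deaf and dumb")
    elif re.search(r"\bdeaf\b", text):
        # Not a schedule option: shorthand for "deaf and dumb", or deaf only?
        found.add("deaf")
    # Split on anything that is not a letter: "/", "-", spaces, "&", etc.
    for token in re.split(r"[^a-z]+", text):
        if token in TOKEN_CATEGORIES:
            found.add(TOKEN_CATEGORIES[token])
    needs_review = "deaf" in found or not found
    return found, needs_review
```

For example, "Blind/Insane", "Blind and Insane", and "In-B" all come out as blind plus insane, while "deaf" on its own is returned with the review flag set.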

Data Reading Errors

IPUMS is the major resource for using census data. The historical data was entered by foreign data-entry workers who didn’t always speak English, so the records aren’t perfect. Some of the records are corroborated with Optical Character Recognition (OCR), but the historical script is sometimes hard to read. Finally, the fine folks at familysearch.org and Brigham Young University have used volunteers from The Church of Jesus Christ of Latter-day Saints (LDS) to proofread data entries. Regardless, we know that the IPUMS data isn’t perfect and that the disability data is far from perfect. Usually, reports don’t dwell on it; they simply say that the data is incomplete.

The disability data is incomplete for a lot of reasons related to the respondent, the enumerator, the instructions, and the digital data creation. What a mess.

Home Health Certificate of Need

Certificate of Need laws require many types of health care providers to obtain the permission of a state board before they are allowed to open or expand in many US states. But there is a lot of variation from state to state in which types of providers are covered by these laws. I put together this map to show the 15 states that require new home health care agencies to obtain a Certificate of Need:

Source: My map based on data from the National Conference of State Legislatures

CON states see reduced competition, which tends to be bad news for patients and new entrants, but good for existing providers and the private equity firms considering buying them.

But some CON states like Rhode Island have proposed reforms that would exempt home health agencies from the CON process, putting them in line with the majority of states that put new entrants on an even footing with incumbent providers.

Optimal Protein Consumption in the 21st Century: A Model

I’ve discussed complete proteins before. I’ve talked about the ubiquity of protein, animal protein prices, vegetable protein prices, and a little bit about protein hedonics. My coblogger Jeremy also recently posted about egg prices over the past century. Charting the cost of eggs is great for identifying egg affordability. But a major attraction of eggs is that they are a ‘complete protein’. So how much of that can we afford?

Here I’ll outline a model of the optimal protein consumption bundle. What does this mean? This means consuming the quantities of protein sources that satisfy the recommended daily intake (RDI) of the essential amino acids and doing so at the lowest possible expenditure. Clearly, this post includes a mix of both nutrition and economics. Since a comprehensive evaluation that includes all possible foods would be a heavy lift, here I’ll just outline the method with a small application.

Consider a list of prices for 100 grams of Beef, Eggs, and Pork.* We can also consider a list that identifies the quantity that we purchase in terms of hundreds of grams. Therefore, the product of the two yields the total that we spend on our proteins.
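Written out, with P the price list and Q the quantity list, total spending is the dot product below. The symbols p_j and q_j for the individual entries are my notation: p_j is the price of 100 grams of food j, and q_j is the quantity purchased, in hundreds of grams.

```latex
B = P \cdot Q = \sum_{j \in \{\text{beef},\,\text{eggs},\,\text{pork}\}} p_j \, q_j
```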

Of course, not all proteins are identical. We need some characteristics by which to compare beef, eggs, and pork. Here, I’ll use the grams of essential amino acids in 100 grams of each protein source. Because there are different RDIs for each amino acid, I express each amino acid content as a proportion of its RDI (each amino acid is represented by its standard one-letter code).

Then, we can describe how much of the RDI of each amino acid a person consumes by multiplying the amino acid contents by the quantities of proteins consumed.
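In symbols, let a_kj be the amount of amino acid k in 100 grams of food j, divided by that amino acid's RDI, and let q_j be the hundreds of grams of food j consumed. The labels a_kj and A (the matrix collecting them) are mine. Each element of C is then a sum over foods:

```latex
C_k = \sum_{j} a_{kj}\, q_j, \qquad C = A\,Q
```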

Our goal is to find the minimum expenditure, B, by varying the quantities consumed, Q, such that the minimum of C is equal to one. If the minimum element of C is greater than one, then a person could consume less and spend less while still satisfying their essential amino acid RDI. If the minimum element is less than one, then they aren’t getting the minimum RDI.
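Stated as an optimization problem, here is a sketch in my own notation, with P the price list and C_k the k-th element of C:

```latex
\begin{aligned}
&\min_{Q \ge 0} \; B = P \cdot Q \quad \text{s.t.} \quad \min_k C_k = 1 \\
&\text{has the same solution as} \quad \min_{Q \ge 0} \; P \cdot Q \quad \text{s.t.} \quad C_k \ge 1 \ \text{for all } k
\end{aligned}
```

The two versions agree when all prices are positive. Suppose a bundle has min_k C_k = s > 1. Scaling Q down by 1/s keeps every C_k at or above one and lowers B. So at the cheapest bundle, the tightest amino acid constraint binds. The inequality version is a standard linear program.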

How do we find such a thing? Well, not algebraically, that’s for sure. I’ll use some linear programming (which is kind of like magic; there’s no process to show here).
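The post doesn't say which solver was used, and a library routine such as SciPy's linprog would be the usual choice. To keep things dependency-free, here is a small sketch that solves the same linear program by brute force. It checks every vertex of the feasible region, which is fine for a few foods and amino acids but not for large problems. The function names are mine, and the numbers in the usage lines are hypothetical placeholders, not the post's price or nutrient data.

```python
from itertools import combinations


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]  # copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def cheapest_bundle(prices, content, tol=1e-9):
    """Minimise sum_j prices[j] * q[j] subject to
    sum_j content[k][j] * q[j] >= 1 for every amino acid k, and q >= 0.

    prices[j]     : price of 100 g of food j
    content[k][j] : share of amino acid k's RDI in 100 g of food j
    Returns (q, cost), with q in hundreds of grams.
    """
    n = len(prices)
    # Every constraint as (g, h) meaning g . q >= h: RDI rows, then q_j >= 0
    rows = [(list(row), 1.0) for row in content]
    rows += [([1.0 if i == j else 0.0 for i in range(n)], 0.0) for j in range(n)]
    best_q, best_cost = None, float("inf")
    # With q >= 0 the feasible region has vertices, and an LP optimum sits at one,
    # i.e. at a point where n of the constraints hold with equality
    for active in combinations(rows, n):
        q = solve_linear([g for g, _ in active], [h for _, h in active])
        if q is None:
            continue
        if all(sum(gi * qi for gi, qi in zip(g, q)) >= h - tol for g, h in rows):
            cost = sum(p * qi for p, qi in zip(prices, q))
            if cost < best_cost:
                best_q, best_cost = q, cost
    return best_q, best_cost


# Hypothetical placeholder numbers (NOT the post's data); columns: beef, eggs, pork
prices = [1.0, 1.0, 1.0]
content = [
    [0.5, 0.25, 0.8],  # amino acid 1, share of RDI per 100 g
    [0.5, 0.25, 0.9],  # amino acid 2
]
q, cost = cheapest_bundle(prices, content)
```

With these toy numbers, the solver should pick pork only, at about 1.25 units (125 grams), because pork delivers the scarcest amino acid most cheaply. At the optimum, the first amino acid's C_k is exactly one, which matches the "minimum of C equals one" condition.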

The solution is to consume only 116.28 grams of pork, spending $1.093 per day. The resulting amino acid consumption is also shown below. Of course, prices change. So, if eggs or beef became cheaper relative to pork, then we’d get different answers.
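You can work backward from those two numbers (my own arithmetic, assuming the pork-only solution makes the tightest amino acid constraint bind exactly). The quantity of 116.28 grams is 1.1628 hundreds of grams. Let a_k,pork be the share of amino acid k's RDI in 100 grams of pork. Then:

```latex
p_{\text{pork}} = \frac{B}{q_{\text{pork}}} = \frac{1.093}{1.1628} \approx \$0.94 \text{ per 100 g},
\qquad
\min_k a_{k,\text{pork}} = \frac{1}{1.1628} \approx 0.86
```

In other words, at the prices used, 100 grams of pork would cost about 94 cents and supply about 86% of the RDI of its scarcest essential amino acid.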

In fact, we have the prices of these protein sources for almost every month back to 1998. While pork is exceptionally nutritious, it hasn’t always been the most cost-effective option. Below are the prices for 1998-2025. See how the optimal consumption bundle has changed over time after the jump.


The Price of Eggs: Long-Run Perspective

Everyone is talking about the price of eggs. Even the President. That’s despite the fact that eggs, on average, constitute about 0.1% of consumer spending (according to the Consumer Expenditure Survey for 2023). Still, economists always get excited when people talk about prices.

On current prices, I wrote a blog post for the Cato Institute looking at the relevant supply and demand factors, and trying to explain why wholesale egg prices are falling so quickly. When will these falling wholesale prices translate into lower retail prices? The NY Times asked this question, and I tried to answer it for them (answer: perhaps in a few weeks).

But let’s step back from the current moment and take a longer-term perspective on egg prices. This chart shows the long-run real price of eggs, measured in terms of how much time an average worker would need to work to afford 1 dozen eggs:
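The time-price calculation is simple arithmetic: divide the price of a dozen eggs by an hourly wage. Here is a minimal Python sketch. This excerpt doesn't say which wage series the chart uses, so the wage input and the example numbers are placeholders.

```python
def minutes_to_afford(price_per_dozen: float, hourly_wage: float) -> float:
    """Minutes of work at hourly_wage needed to buy one dozen eggs."""
    if hourly_wage <= 0:
        raise ValueError("hourly wage must be positive")
    return price_per_dozen * 60 / hourly_wage


def time_price_series(prices: dict, wages: dict) -> dict:
    """Map year -> minutes of work, for years present in both inputs."""
    return {yr: minutes_to_afford(prices[yr], wages[yr]) for yr in sorted(prices.keys() & wages.keys())}


# Hypothetical inputs: a $3.00 dozen at a $30/hour wage takes 6 minutes of work
print(minutes_to_afford(3.00, 30.00))
```

Price and wage are both in nominal dollars of the same year, so the ratio needs no separate inflation adjustment. That is the appeal of measuring prices in time.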


A Forgotten Data Goldmine: Foreign Commerce and Navigation Reports

Economists rely on trade data. The historical Foreign Commerce and Navigation of the United States reports provide detailed monthly figures on imports, exports, and re-exports. This dataset spans decades, providing a crucial resource for researchers studying price movements, consumption patterns, and the effects of war on global trade.

The U.S. Department of Commerce compiled these reports to track the nation’s commercial activity. The data cover a vast range of commodities, including coffee, sugar, wheat, cotton, wool, and petroleum. Officials recorded trade flows at a granular level, enabling economists to analyze seasonal fluctuations, wartime distortions, and postwar recoveries. Their inclusion of re-export figures allows for precise estimates of domestic consumption. Researchers who ignore re-exports risk overstating demand by treating imports as goods consumed rather than goods in transit.
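Here is a minimal Python sketch of the re-export adjustment, with function names of my own choosing. It treats imports minus re-exports as the amount retained for domestic use. It ignores domestic production and stock changes, so it works as a consumption proxy only for goods the US produced little of itself, such as coffee.

```python
def retained_imports(imports, re_exports):
    """Month-by-month imports kept for domestic use: imports minus re-exports."""
    if len(imports) != len(re_exports):
        raise ValueError("series must cover the same months")
    return [m - r for m, r in zip(imports, re_exports)]


def overstatement(imports, re_exports):
    """Share by which raw imports overstate retained imports, month by month.

    A month where everything imported was re-exported (retained == 0)
    would divide by zero, so such months need separate handling.
    """
    kept = retained_imports(imports, re_exports)
    return [m / k - 1 for m, k in zip(imports, kept)]


# Hypothetical monthly quantities, not values from the reports
imports = [100.0, 120.0]
re_exports = [10.0, 20.0]
print(retained_imports(imports, re_exports))  # [90.0, 100.0]
```

In this toy example, treating raw imports as consumption would overstate demand by about 11% in the first month and 20% in the second.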


Michigan Consumer Surveys: Individual-Response Data

I’ve now posted individual-level responses to the 1978-2025 Michigan Consumer Surveys to Kaggle in CSV and Stata formats. The University of Michigan’s Consumer Surveys are a widely followed source for data on consumer confidence and inflation expectations:

Their official site is good if you just want summary tables or charts like this:

But what if you want detailed crosstabs to see how sentiment differs across groups, or microdata so that you can run regressions? With enough clicks you can get this from what UMich calls their “cross-section archive”. But it is pretty hidden (my student looking into this thought they just didn’t offer individual-level data), and even once you get the data, it comes as an unlabelled CSV file with hard-to-understand variable names and codes. So I wanted to make it clear that the full data with all responses for all years is available, and if you use my Stata version it is even reasonably easy to understand (the code I adapted for labelling it is on OSF). Then you can run your regressions, or make charts like this:
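For a quick crosstab without Stata, here is a minimal Python sketch that computes a weighted mean of one variable within each group, reading rows from the CSV. The column names in the usage comment are hypothetical placeholders; substitute the actual names from your file. Survey files often code "don't know" or "not asked" as special numeric values, so you will likely need to drop those first.

```python
from collections import defaultdict


def weighted_means(rows, group_col, value_col, weight_col):
    """Weighted mean of value_col within each level of group_col.

    rows: iterable of dicts, e.g. from csv.DictReader.
    Rows with a blank value or weight are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for row in rows:
        v, w = row[value_col], row[weight_col]
        if v == "" or w == "":
            continue
        num[row[group_col]] += float(v) * float(w)
        den[row[group_col]] += float(w)
    return {g: num[g] / den[g] for g in num if den[g] > 0}


# Hypothetical usage (placeholder file and column names):
# import csv
# with open("michigan_microdata.csv", newline="") as fh:
#     by_group = weighted_means(csv.DictReader(fh), "education_group",
#                               "expected_inflation_1yr", "survey_weight")
```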

Chart: The College-Only Covid Recovery

If you’re new here, a reminder that you can find other cleaned-up versions of popular datasets on my data page.

US Federal Government Spending Hasn’t Decreased (Yet)

Despite DOGE and the President halting some payments to some federal agencies, the changes so far aren’t visible at all in aggregate federal payments data. The Brookings Institution has put together a new tool that tracks daily spending data from the US Treasury. (My co-blogger Zach wrote about this tool last week too.) Here’s a chart from that tool showing total federal outlays by calendar year. Notice that 2025 is right on track with the past two years, or just slightly above (dollars are in nominal terms):

Of course, given the massive amount of US federal spending and the large number of agencies, we might expect it to take more than a few months to get spending under control or significantly alter its course. But this way of tracking the data is definitely picking up any changes made so far. For example, notice the flatlining of USAID funding after Trump took office in late January:

So while we don’t see any big changes yet in aggregate spending, the few small agencies that DOGE has frozen are showing up in this data. That tells us this will be a useful tool to follow going forward.
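The year-over-year comparison in charts like these boils down to cumulative year-to-date totals lined up by day of year. Here is a minimal Python sketch, assuming you already have daily outlays as (date, amount) pairs. The Brookings tool's actual internals and data format aren't described here, and the numbers in the usage lines are toy values.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def cumulative_by_day(daily):
    """daily: iterable of (date, outlay) pairs for one calendar year.

    Returns [(day_of_year, running_total)], one entry per day with data,
    so that different years can be plotted on the same x-axis.
    """
    by_day = defaultdict(float)
    for d, amount in daily:
        by_day[d.timetuple().tm_yday] += amount  # sum multiple entries per day
    days = sorted(by_day)
    return list(zip(days, accumulate(by_day[day] for day in days)))


# Hypothetical toy data, not Treasury figures
ytd_2025 = cumulative_by_day([(date(2025, 1, 2), 10.0), (date(2025, 1, 3), 12.0)])
```

Leap years shift day-of-year by one after February 28, which is negligible for eyeballing trends but worth handling for exact comparisons.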

Understanding the Projected GDP Decline

UPDATE: This thread on Twitter from the Atlanta Fed provides some clarification on how this model is behaving (it is probably overstating the decline due to gold inflow).

You may have seen the following chart recently:

The chart comes from the Atlanta Fed’s GDPNow model, which tries to estimate GDP growth each quarter as data becomes available. The sharp drops in their Q1 forecast for 2025, based on the last two data updates, look pretty shocking. Should we be worried?

First, it’s useful to ask: has this model been accurate recently? Yes, it has. For Q4 of 2024, the model forecast 2.27% growth, and actual growth was 2.25%. For Q3 of 2024, the model forecast 2.79% growth, and it came in at 2.82%. Those are very accurate estimates. Of course, it’s not always right. It overestimated growth by 1 percentage point in Q1 of 2024, and it underestimated growth by 1 percentage point the quarter before that. So pretty good, but not perfect. Notably, during the massive decline in Q2 2020 at the start of the pandemic, it got pretty close despite the strange, uncertain data and times, predicting -32.08% when the actual figure was -32.90% (off by almost 1 percentage point again, but given the highly unusual times, I would say “pretty good”).
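The track record above is easy to tabulate. This short Python sketch uses only the three forecast/actual pairs quoted in that paragraph. The 1-point misses for Q1 2024 and Q4 2023 are left out because exact figures aren't given.

```python
# (GDPNow forecast, actual) real GDP growth in percent, as quoted above
cases = {
    "2024 Q4": (2.27, 2.25),
    "2024 Q3": (2.79, 2.82),
    "2020 Q2": (-32.08, -32.90),
}

# Error = actual - forecast, in percentage points (negative means the model ran high)
errors = {quarter: actual - forecast for quarter, (forecast, actual) in cases.items()}
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

for quarter, err in errors.items():
    print(f"{quarter}: {err:+.2f} pp")
print(f"mean absolute error: {mean_abs_error:.2f} pp")
```

This prints errors of -0.02, +0.03, and -0.82 percentage points, for a mean absolute error of 0.29 points across these three quarters.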

OK, so what can we say about the current forecast of -2.8% for Q1 of 2025? First, almost all of the data in the model right now are for January 2025 only. We still have 2 full months in the quarter to go (in terms of data collection). Second, the biggest contributor to the negative reading is a massive increase in imports in January 2025.

To understand that part of the equation, you have to think about what GDP is measuring. It is trying to measure the total amount of production (or income) in the United States. One method of calculation is to add up total consumption in the US, including by final consumers, business investments, and government purchases and investments. But this method of calculation undercounts some US production (because exports don’t show up — they are consumed elsewhere) and overcounts some US production (because imports are consumed here, but not produced here). So to make GDP an accurate measure of domestic production, you need to add in exports, and subtract imports.
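In the usual textbook notation, the expenditure approach is written as follows. These are the standard symbols, not taken from the GDPNow documentation.

```latex
Y = C + I + G + (X - M)
```

Here Y is GDP, C is consumption, I is private investment (which includes the change in business inventories), G is government purchases and investment, X is exports, and M is imports.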

Keep in mind what we’re doing in this calculation: we aren’t saying “exports good, imports bad.” We are trying to accurately measure production, but in a roundabout way: by adding up consumption. So we need to take out the goods imported — not because they are bad, but because they aren’t produced in the US.

The Atlanta Fed GDPNow model is doing exactly that, subtracting imports. However, it’s likely they are doing it incorrectly. Those imports have to show up elsewhere in the GDP equation. They will either be current consumption, or added to business inventories (to be consumed in the future). My guess, without knowing the details of their model, is that it’s not picking up the change in either inventories or consumption that must result from the increased imports. It’s also just one month of data on imports.
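A toy calculation with made-up round numbers shows why the offset matters. It only illustrates the accounting identity. It is not how GDPNow actually works internally.

```python
def gdp(c, i, g, x, m):
    """Expenditure-side GDP: consumption + investment + government + exports - imports."""
    return c + i + g + x - m


# Hypothetical round numbers, not actual national accounts data
base = gdp(c=100.0, i=20.0, g=30.0, x=10.0, m=15.0)

# Imports jump by 5 and the goods land in business inventories (part of I)
offset = gdp(c=100.0, i=25.0, g=30.0, x=10.0, m=20.0)

# Same import jump, but the inventory build hasn't shown up in the data yet
missed = gdp(c=100.0, i=20.0, g=30.0, x=10.0, m=20.0)

print(base, offset, missed)  # 145.0 145.0 140.0
```

When the inventory build is counted, the extra imports wash out. When it isn't yet visible, measured GDP falls by the full amount of the import surge.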

As always, we’ll have to wait for more data and then, of course, the actual data from BEA (which won’t come until April 30th). More worrying in the current data, to me, is not the massive surge in imports — instead, it’s that real personal consumption expenditures and real private fixed investment are currently projected to be flat in Q1. If consumption growth is 0% in Q1, it will be a bad GDP report, regardless of everything else in the data.

What does the Department of Education even do?

If you follow libertarian media such as Reason Magazine or its ancillaries, then you are well acquainted with the drumbeat of “it goes without saying that most US programs should be ended”. They kind of just say this and then continue with their news. One of the favorites is to say that we should get rid of the Department of Education (ED). After all, 90% of K-12 education is paid for by states and localities. So I found myself wondering, “What does the Department of Education even do?”

Agreement is different from trust. I trust the Brookings Institution. They have a nice explainer on what ED does. It’s a quick overview and has plenty of the appropriate citations. I learned that most of what ED does concerns K-12 and is achieved through grants that have strings attached. Funding primarily goes to serving “educationally disadvantaged” communities (that have a high poverty rate). Funding also goes to programs for disabled children, minority education programs (like Howard University), and Indian tribes. They also administer Pell Grants and fund & regulate college loans (which are privately administered).

ED’s appropriated budget is online for anyone to see and includes pretty good detail about costs. The total discretionary cost in FY 2024 was $79 billion. The “mandatory” spending, which does not need to be voted on by Congress every year, was $45 billion. For context, total federal expenditure in FY 2024 was $6.75 trillion. So, eliminating the Department of Education *and* its responsibilities (an unpopular position) would reduce federal expenditures by 1.8%. For even more context, the budget deficit is $1.83 trillion, or 27.1% of total federal expenditures. Eliminating ED and consolidating its responsibilities into other departments would save $0.6 billion. That assumes eliminating program administration, the ED Office for Civil Rights, and the ED Office of Inspector General.
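For anyone who wants to check the shares, this short Python snippet reproduces the arithmetic using only the figures quoted in the paragraph above.

```python
# Figures quoted in the paragraph above, in dollars (FY 2024)
discretionary = 79e9
mandatory = 45e9
total_federal = 6.75e12
deficit = 1.83e12
admin_savings = 0.6e9  # savings from consolidating rather than eliminating programs

ed_share = (discretionary + mandatory) / total_federal * 100
deficit_share = deficit / total_federal * 100
admin_share = admin_savings / total_federal * 100

print(f"ED programs: {ed_share:.2f}% of federal spending")
print(f"Deficit: {deficit_share:.1f}% of federal spending")
print(f"Consolidation savings: {admin_share:.3f}% of federal spending")
```

This prints about 1.84% (the 1.8% above), 27.1%, and 0.009%. Consolidating ED would therefore save less than a hundredth of one percent of federal spending.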
