Triumph of the Data Hoarders 2: The Institutions

Datasets can be pulled offline for all sorts of reasons. As I wrote in February, this shows the value of being a data hoarder, which just means downloading now any data you think you might want later:

Several major datasets produced by the federal government went offline this week…. This serves as a reminder of the value of redundancy: keeping datasets on multiple sites as well as in local storage. Because you never really know when one site will go down, whether due to ideological changes, mistakes, natural disasters, or key personnel moving on.

The US federal government shutdown this month provides another reminder of this. So far most datasets are still up, but I’ve seen some availability issues.

The good news is that a number of institutions have stepped up in 2025 to host at-risk datasets (joining those like IPUMS, NBER, and Archive.org that have been hosting datasets for many years but are scaling up to meet the moment):

  • Restore CDC hosts all CDC data as it was in January 2025.
  • The Data Rescue Project provides tools and suggestions for how other institutions can save data at scale, plus links to other projects.

National Survey of Children’s Health Backup

The NSCH is the latest casualty of the new administration taking down major datasets from government websites. Between Archive.org and what I had downloaded for old projects, I was able to get all the 2016-2023 topical NSCH files and post them on an Open Science Framework page.

I took this as a chance to improve the data: the government previously made the topical Public Use Files available only in SAS and Stata formats, one year at a time, so I added a merged version covering all available years in both Stata and Excel formats.
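For readers who want to build a similar merged file themselves, here is a minimal sketch of stacking yearly files with pandas. It is not the exact process used for the posted files. The file paths, the `survey_year` column name, and both function names are hypothetical, and real survey files may need variable harmonization across years that this sketch skips.

```python
import pandas as pd


def stack_years(frames_by_year):
    """Stack one DataFrame per survey year into a single long table."""
    labeled = []
    for year, frame in sorted(frames_by_year.items()):
        frame = frame.copy()            # leave the caller's data untouched
        frame["survey_year"] = year     # hypothetical column name
        labeled.append(frame)
    # Variables missing in some years are kept and filled with NaN.
    return pd.concat(labeled, ignore_index=True, sort=False)


def merge_stata_files(paths_by_year, out_stem):
    """Read yearly .dta files, stack them, and save Stata and Excel copies."""
    frames = {
        # Skip value labels, which can differ from year to year.
        year: pd.read_stata(path, convert_categoricals=False)
        for year, path in paths_by_year.items()
    }
    merged = stack_years(frames)
    merged.to_stata(f"{out_stem}.dta", write_index=False, version=118)
    merged.to_excel(f"{out_stem}.xlsx", index=False)  # requires openpyxl
    return merged
```

A call might look like `merge_stata_files({2016: "nsch_2016.dta", 2017: "nsch_2017.dta"}, "nsch_merged")`, where the file names are placeholders. Excel caps a sheet at 1,048,576 rows, so very large merged files may only fit in the Stata version.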

I hope and expect that the National Survey of Children’s Health will be back up at official websites soon. But I expect that other datasets will be taken down permanently, so now is the time to download what you think you might need and add it to your data hoard, especially if you want anything from the Department of Education.

Triumph of the Data Hoarders

Several major datasets produced by the federal government went offline this week. Some, like the Behavioral Risk Factor Surveillance System and the American Community Survey, are now back online; most of the others will probably join them soon. But some datasets that the current administration considers too DEI-inflected could stay down indefinitely.

This serves as a reminder of the value of redundancy: keeping datasets on multiple sites as well as in local storage. Because you never really know when one site will go down, whether due to ideological changes, mistakes, natural disasters, or key personnel moving on.
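One way to keep redundant copies trustworthy is to record a checksum when you first download a file, so you can later confirm that a copy on another drive or site is identical. The sketch below uses only the Python standard library. The function names and the `manifest.json` layout are my own illustrative choices, not an established convention.

```python
import hashlib
import json
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hoard(url, dest_dir, manifest_name="manifest.json"):
    """Download url into dest_dir and log its source and checksum."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    filename = Path(urllib.parse.urlparse(url).path).name or "download"
    target = dest_dir / filename
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    # The manifest maps each file name to where it came from and its hash.
    manifest_path = dest_dir / manifest_name
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[filename] = {"source": url, "sha256": sha256_of(target)}
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return target


def copies_match(path_a, path_b):
    """True when two files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `hoard(some_url, "my_hoard")` for each dataset builds up the manifest, and `copies_match` can compare the local file against a backup on an external drive.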

External hard drives are an affordable option for anyone who wants to build up their own local data hoard going forward. The Open Science Framework (OSF) site allows you to upload datasets up to 50 GB to share publicly; that’s how I’ve been sharing cleaned-up versions of the BRFSS, state-level NSDUH, National Health Expenditure Accounts, Statistics of US Business, and more. If you have a dataset that isn’t online anywhere, or one that you’ve cleaned or improved to the point it is better than the versions currently online, I encourage you to post it on OSF.

If you are currently looking for a federal dataset that got taken down, some good places to check are IPUMS, NBER, Archive.org, or my data page. PolicyMap has posted some of the federal datasets that seem particularly likely to stay down; if you know of other pages hosting federal datasets that have been taken down, please share them in the comments.