Triumph of the Data Hoarders 2: The Institutions

Datasets can be pulled offline for all sorts of reasons. As I wrote in February, this shows the value of being a data hoarder, that is, of downloading now any data you think you might want later:

Several major datasets produced by the federal government went offline this week…. This serves as a reminder of the value of redundancy: keeping datasets on multiple sites as well as in local storage. Because you never really know when one site will go down, whether due to ideological changes, mistakes, natural disasters, or key personnel moving on.

The US federal government shutdown this month provides another reminder of this. So far most datasets are still up, but I’ve seen some availability issues.

The good news is that a number of institutions have stepped up in 2025 to host at-risk datasets (joining those like IPUMS, NBER, and Archive.org, which have been hosting datasets for many years but are scaling up to meet the moment):

  • Restore CDC hosts all CDC data as it was in January 2025.
  • The Data Rescue Project provides tools and suggestions for how other institutions can save data at scale, plus links to other projects.