Economist Writing Every Day

State Demographics 1962-2024

July 31, 2025 (July 30, 2025) · James Bailey

I provide a simple, clean panel dataset of the historical demographics of US states here.

I built this state-year dataset from individual-level responses to the Current Population Survey’s Annual Social and Economic Supplement. It includes key demographic variables commonly used as controls in regressions: age, race, sex, marital status, income, education, and health insurance.
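For readers who have not done this kind of aggregation before, here is a minimal sketch of how person-level survey rows can be collapsed into weighted state-year averages. It is an illustration under assumptions, not the cleaning code behind the dataset: the records, the column layout, and the function name `state_year_means` are all hypothetical, and the numbers are made up rather than drawn from the CPS.

```python
from collections import defaultdict

# Hypothetical person-level records: (state, year, survey_weight, age, insured).
# The values are invented for illustration, not taken from the CPS.
people = [
    ("RI", 2000, 1500.0, 30, 1),
    ("RI", 2000, 500.0, 50, 0),
    ("TX", 2000, 2000.0, 40, 1),
    ("TX", 2000, 2000.0, 20, 0),
]

def state_year_means(records):
    """Collapse person-level rows to weighted means per (state, year) cell."""
    # Running sums per cell: total weight, weight*age, weight*insured.
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for state, year, weight, age, insured in records:
        cell = totals[(state, year)]
        cell[0] += weight
        cell[1] += weight * age
        cell[2] += weight * insured
    return {
        key: {"mean_age": wa / w, "share_insured": wi / w}
        for key, (w, wa, wi) in totals.items()
    }

panel = state_year_means(people)
for key in sorted(panel):
    print(key, panel[key])
```

Weighting matters because each survey respondent stands in for a different number of people; an unweighted average would treat every respondent as equally representative.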

This was no great feat: I’m sure hundreds of researchers have generated very similar datasets before, but as far as I can tell, none of them has shared theirs publicly like this. I hope that posting it will save time for researchers who would otherwise spend a couple of hours redoing the same work, and will offer easy-to-use Stata and Excel files to students who might not be able to build such a dataset themselves.

I’ve often seen my own students start an empirical project with state-level data on their dependent and independent variables of interest, then realize they need some control variables and find that these are surprisingly hard to get. The Census website is difficult to navigate and mostly offers its data one year at a time. IPUMS is great for getting individual-level responses. It also has a tool that can export state-level data directly, but the tool is not built or advertised in a way that leads students to actually use it for that, and it is limited to exporting 8,000 cells of data at a time (the dataset I share has 53,856). The National Welfare Data is what I have been recommending to students for state-level controls, but its year coverage is more limited (1980-2023), and it contains welfare variables that are mostly extraneous for this purpose. County-level data is available back to 1969 and could be aggregated to the state level, but it doesn’t have income or education.

Disclaimers: prior to 1977, some states are missing or are combined with nearby states (the data makes clear which ones). Census / IPUMS change how some variables are coded over time. Age in particular sees several big changes to its universe and its top-coding, and race is recoded in 2003. As a result, this is not always a good dataset for measuring national trends in a variable over time, but it should still work well for comparing states within a given year. If you use this data in a regression, I strongly recommend including year fixed effects to mitigate this issue. If you want to double-check my cleaning code, it is available here.
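To see why year fixed effects help with coding changes, here is a small sketch using made-up numbers. It assumes a single regressor and uses the fact that subtracting each year's mean from both variables gives the same slope as including a dummy for every year. The rows, the variable names, and the helper functions `slope` and `demean_by_year` are hypothetical and are not part of the shared dataset or its cleaning code.

```python
from collections import defaultdict

# Hypothetical state-year rows: (state, year, x, y). Invented numbers in which
# y = 2*x plus a year-specific shift (0 in 2002, 10 in 2003), mimicking a
# coding change that moves every state at once.
rows = [
    ("A", 2002, 1.0, 2.0),
    ("B", 2002, 2.0, 4.0),
    ("C", 2002, 3.0, 6.0),
    ("A", 2003, 2.0, 14.0),
    ("B", 2003, 3.0, 16.0),
    ("C", 2003, 4.0, 18.0),
]

def slope(pairs):
    """OLS slope of y on x with an intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

def demean_by_year(data):
    """Subtract each year's mean from x and y (the within-year transformation)."""
    by_year = defaultdict(list)
    for _, year, x, y in data:
        by_year[year].append((x, y))
    means = {
        yr: (sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v))
        for yr, v in by_year.items()
    }
    return [(x - means[yr][0], y - means[yr][1]) for _, yr, x, y in data]

pooled = slope([(x, y) for _, _, x, y in rows])
within = slope(demean_by_year(rows))
print(f"pooled slope: {pooled:.3f}, year-FE slope: {within:.3f}")
```

In this toy example the pooled regression mistakes the year-wide shift for an effect of x, while the within-year version compares states only to other states in the same year, which is the kind of comparison the dataset is best suited for.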

Once again, the state demographic data is available here. If you think there is a better source out there for historical state demographics, let us know below. See my data page for more cleaned-up versions of public datasets.

Category: Data
Tags: Annual Social and Economic Supplement, Current Population Survey, excel, Historical data, IPUMS, National Welfare Data, New Data, Stata, State Demographic Controls, State Demographics
