DID Explainer and Application (STATA)

The Differences-in-Differences literature has blown up in the past several years. “Differences-in-Differences” (DID hereafter) refers to a statistical method that can be used to identify causal relationships. If you’re interested in using the new methods in Stata, or just curious about what the big deal is, then this post is for you.

First, there’s the basic regression model where we have variables for time, treatment, and a variable that is the product of both. It looks like this:
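
Written out (a sketch of the standard two-group, two-period specification, using the same coefficient labels as the rest of this section):

y = α + β·treated + γ·time + δ·(treated × time) + ε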

The idea is that we can estimate the effect of time passing separately from the effect of the treatment. That allows us to ‘take out’ the effect of time’s passage and focus only on the effect of some treatment. Below is a common way of representing what’s going on in matrix form, where the estimated y, yhat, is in each cell.
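
Using those coefficients, the grid of fitted values (yhat in each cell) works out to something like this:

                time = 0        time = 1
untreated       α               α + γ
treated         α + β           α + β + γ + δ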

Each quadrant includes the estimated value for people in each category. For the moment, let’s assume a one-time wave of treatment intervention that is applied to a subsample, which means that no one is treated in the initial period. If the treatment was assigned randomly, then β=0 and we can simply use the difference between the two groups at time=1. But if β≠0, then that difference between the treated and untreated groups at time=1 includes both the effect of the treatment intervention and the pre-existing difference between the two groups, β. In order to isolate the effect of the intervention, we need to take the 2nd difference. δ is the effect of the intervention. That’s what we want to know. Once we have δ, we can start enacting policy and prescribing behavioral changes.
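
Worked out from the grid above, the second difference recovers δ:

treated change:      (α + β + γ + δ) − (α + β) = γ + δ
untreated change:    (α + γ) − α = γ
second difference:   (γ + δ) − γ = δ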

Easy Peasy Lemon Squeezy. Except… What if the treatment timing is different and those different treatment cohorts have different treatment effects (heterogeneous effects)?*  What if the treatment effects change over time the longer an individual is treated (dynamic effects)?**  Further, what if there are non-parallel pre-existing time trends between the treated and untreated groups (non-parallel trends)?*** Are there design changes that allow us to estimate effects even if there are different time trends?**** There are more problems, but these are enough for more than one blog post.

For the moment, I’ll focus on just the problem of non-parallel time trends.

What if the untreated and the to-be-treated groups had different pre-treatment trends? Then, using the above design, the estimated δ doesn’t just measure the effect of the treatment intervention; it also picks up the effect of the different time trend. In other words, if the treated group’s outcomes were already on a non-parallel trajectory relative to the untreated group, then it’s possible that the estimated δ is not at all the causal effect of the treatment, and that it’s partially or entirely detecting the different pre-existing trajectory.
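
To make that concrete with a small sketch of my own (not one of the figures below): suppose the treated group’s outcome drifts by an extra Δ per period relative to the untreated group even in the absence of treatment. The second difference then delivers

estimated δ = true δ + Δ

and since Δ is unobserved and can be positive or negative, the estimate can overstate, understate, or even flip the sign of the true effect.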

Below are 3 figures. The first two show the causal interpretation of δ when β=0 and when β≠0. The 3rd illustrates how our estimated value of δ fails to be causal if there are non-parallel time trends between the treated and untreated groups. For ease, I’ve made β=0 in the 3rd graph (though it need not be – the graph is just messier). Note that the trends are not parallel and that the true δ differs from the estimated δ. Also important is that the direction of the bias is unknown without knowing the time trend for the treated group. It’s possible for the estimated δ to be positive or negative or zero, regardless of the true δ. This makes knowing the time trends really important.

STATA Implementation

If you’re worried about the problems that I mention above, the short answer is that you want to install csdid2. This is the updated version of csdid & drdid. These packages allow us to address the first 3 asterisked threats to research design that I noted above (and more!). You can install them by running the below code:

program fra
    syntax anything, [all replace force]
    local from "https://friosavila.github.io/stpackages"
    tokenize `anything'
    if "`1'`2'"==""  net from `from'
    else if !inlist("`1'","describe", "install", "get") {
        display as error "`1' invalid subcommand"
    }
    else {
        net `1' `2', `all' `replace' from(`from')
    }
    qui:net from http://www.stata.com/
end
fra install fra, replace   // install the full fra package, replacing the inline version above
fra install csdid2         // the updated csdid/drdid estimator
ssc install coefplot       // plotting package used by the commands below

Once you have the methods installed, let’s work through an example using the below code and data set. The particulars of what we’re measuring aren’t important. I just want to get you started with an application of the method.

* example data from the Mixtape Sessions Advanced DID exercises
local mixtape https://raw.githubusercontent.com/Mixtape-Sessions
use `mixtape'/Advanced-DID/main/Exercises/Data/ehec_data.dta, clear
* yexp2 is the treatment-cohort year; recode missing (never-treated) units
* to one year past the last sample year so they serve as a comparison group
qui sum year, meanonly
replace yexp2 = cond(mi(yexp2), r(max) + 1, yexp2)

The csdid2 command is nice. You can use it to create an event study where stfips is the individual identifier, year is the time variable, and yexp2 denotes the times of treatment (the treatment cohorts).

* dins is the outcome; notyet uses not-yet-treated units as the comparison group
csdid2 dins, time(year) ivar(stfips) gvar(yexp2) long2 notyet
* aggregate to an event study, store the results, and plot them
estat event, estore(csdid) plot
estimates restore csdid

The above output shows us many things, but I’ll address only a few of them. It shows how treated individuals differ from not-yet-treated individuals, relative to the time just before the initial treatment. In the above table, we can see that the pre-treatment average effect is not statistically different from zero: we fail to reject the hypothesis that the treatment group’s pre-treatment average was identical to the not-yet-treated average in the same periods. Hurrah! That’s good evidence that the significant post-treatment estimates really reflect our treatment intervention. But… those 8 preceding periods are all negative. That’s a little concerning. We can test the joint significance of those periods:

estat event, revent(-8/-1)

Uh oh. That small p-value means that the 8 pre-treatment coefficients jointly deviate from zero. Further, if you squint just a little, the coefficients appear to have a positive slope, such that the post-treatment values would have been positive even without the treatment if the trend had continued. So, what now?

Wouldn’t it be cool if we knew the alternative scenario in which the treated individuals had not been treated? That’s the standard against which we’d test the observed post-treatment effects. Alas, we can’t see what didn’t happen. BUT, asserting some premises makes the job easier. Let’s say that the pre-treatment trend, whatever it is, would have continued had the treatment not been applied. That’s where the honestdid Stata package comes in. Here’s the installation code:

local github https://raw.githubusercontent.com
net install honestdid, from(`github'/mcaceresb/stata-honestdid/main) replace
honestdid _plugin_check

What does this package do? It does exactly what we need. It assumes that the pre-treatment trend of the prior 8 periods continues, and then tests whether one or more post-treatment coefficients deviate from that trend. Further, as a matter of robustness, the trend that acts as the standard for comparison is allowed to deviate from the pre-treatment trend by a multiple, M, of the maximum pre-treatment deviation from trend. If that’s kind of wonky – just imagine a cone that continues from the pre-treatment trend and plots the null hypotheses; larger M’s imply larger cones. Let’s test whether the time-zero effect significantly differs from zero.

estimates restore csdid
* l_vec selects the time-zero (first post-treatment) coefficient
matrix l_vec = 1\0\0\0\0\0
local plotopts xtitle(Mbar) ytitle(95% Robust CI)
* positions 5-12 hold the 8 pre-treatment estimates; 13-18 hold the post-treatment ones
honestdid, pre(5/12) post(13/18) mvec(0(0.5)2) l_vec(l_vec) coefplot name(csdid2lvec, replace) `plotopts'

What does the above table tell us? It gives us, for several values of M, the 95% confidence interval for the coefficient’s deviation from the trend. The first CI is the original time-0 coefficient. When M is zero, the null assumes the same linear trend as during the pre-treatment period. Again, M is the ratio by which the maximum pre-treatment deviation from trend is allowed to continue as the null hypothesis during the post-treatment period. So, above, we can see that the initial treatment effect deviates from the linear pre-treatment trend. However, if our standard is the maximum deviation from trend that existed prior to the treatment (M=1), then we find that the p-value is just barely greater than 0.05 (because the CI just barely includes zero).

That’s the process. Of course, robustness checks are necessary, and there are plenty of margins for kicking the tires. One can vary the pre-treatment periods that determine the pre-trend, the post-treatment coefficient(s) to test, and the value of M that should be the standard for inference. The creators of honestdid seem to like the standard of identifying the minimum M at which the coefficient fails to be significant. I suspect that further updates to the program will spit that specific number out by default.
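
For instance, here is a sketch of one such variation (the weights and the M grid are illustrative choices, not values from above): testing the average of the first two post-treatment coefficients over a finer grid of M.

estimates restore csdid
* average the first two post-treatment coefficients instead of isolating time zero
matrix l_vec_avg = 0.5 \ 0.5 \ 0 \ 0 \ 0 \ 0
honestdid, pre(5/12) post(13/18) mvec(0(0.25)1.5) l_vec(l_vec_avg) coefplot name(csdid2avg, replace)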

I’ve left a lot out of the DID discussion and why it’s such a big deal. But I wanted to share some of what I’ve learned recently with an easy-to-implement example. Do you have questions, comments, or suggestions? Please let me know in the comments below.


The above code and description are heavily based on the original authors’ support documentation and my own Statalist post. You can read more at the above links and the below references.

*Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, Themed Issue: Treatment Effect 1, 225 (2): 175–99. https://doi.org/10.1016/j.jeconom.2020.09.006.

**Sant’Anna, Pedro H. C., and Jun Zhao. 2020. “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics 219 (1): 101–22. https://doi.org/10.1016/j.jeconom.2020.06.003.

***Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, Themed Issue: Treatment Effect 1, 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.

****Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” The Review of Economic Studies 90 (5): 2555–91. https://doi.org/10.1093/restud/rdad018.

Robert Solow on Sustainability

2023 continues to be a dangerous year for eminent economists. We have once again lost a Nobel laureate who was influential even by the standard of Nobelists, Robert Solow:

I’m sure you will soon see many tributes that discuss his namesake Solow Model (MR already has one), or discuss him as a person. I never got to meet him (just saw him give a talk) and the Solow Model is well known, so I thought I’d take this occasion to discuss one of his lesser-known papers, “Sustainability: An Economist’s Perspective“. What follows comes from my 2009 reaction to his paper:


How the Economy is Doing vs. How People Think the Economy is Doing

Lately many journalists and folks on X/Twitter have pointed out a seeming disconnect: by almost any normal indicator, the US economy is doing just fine (possibly good or great). But Americans still seem dissatisfied with the economy. I wanted to put all the data showing this disconnect into one post.

In particular, let’s make a comparison between November 2019 and November 2023 economic data (in some cases 2019q3 and 2023q3) to see how much things have changed. Or haven’t changed. For many indicators, it’s remarkable how similar things are to what was probably the last month before most normal people had ever heard the word “coronavirus.”

First, let’s start with “how people think the economy is doing.” Here are two surveys that go back far enough:

The University of Michigan survey of Consumer Sentiment is a very long running survey, going back to the 1950s. In November 2019 it was at roughly the highest it had ever been, with the exception of the late 1990s. The reading for 2023 is much, much lower. A reading close to 60 is something you almost never see outside of recessions.

The Civiqs survey doesn’t go back as far as the Michigan survey, but it does provide very detailed, real-time assessments of what Americans are thinking about the economy. And they think it’s much worse than November 2019. More Americans rate the economy as “very bad” (about 40%) than the sum of “fairly good” and “very good” (33%). The two surveys are very much in alignment, and others show the same thing.

But what about the economic data?


Sorry, you caught me between critical masses

I’m on Bluesky. I’m on twitter/X. I’m not happy with either right now. I wasn’t particularly happy on twitter before, but that was before it became much worse, so now I wish it could go back to the way it was, when I was also complaining, because it turns out the counterfactual universe where it was different is actually worse. So here we are.

The decline in my personal portfolio of social media largely comes down to critical mass. The decline in Twitter usage has reduced its value to me (and most of its users). Only a tiny fraction of this loss in Twitter value is offset by the value I receive from Bluesky for the simple reason it doesn’t have enough users. Even if 100% of Twitter exits had led to Bluesky entrants, it would still be a value loss because the marginal user currently offers more value at Twitter. Standard network goods, returns to scale, power law mechanics, yada yada yada.

Now, to be clear, Twitter is still well above the minimum critical mass threshold for significant value-add, but the good itself has also been damaged by Elon’s managerial buffoonery. Bluesky, depending on your point of view and consumer niche, hasn’t achieved a self-sustaining critical mass (e.g. econsky hasn’t quite cracked it, unfortunately). The result is a decent number of people half-committing to both, which only serves to undermine consumer value being generated in the entire “microblogging” social media space.

The problem, simply put, is that Twitter still has too much option value to leave entirely. If (when) Elon gets the mother-of-all-margin-calls, he’ll likely have to sell Twitter or a large amount of his Tesla holdings. If he’s smart and doesn’t cave in to the sunk cost fallacy (a non-trivial “if”), he’ll sell Twitter. If new ownership successfully returns Twitter to a suitable facsimile of its previous form, people will come flooding back, Bluesky will turn off or wholly adapt into a new consumer paradigm, and everyone will be thrilled to have squatted on their previous accounts.

If Twitter retains its current form, then it will probably die, though not at the direct hands of Bluesky. More likely it will be displaced by some new product most of us don’t yet see coming, as the next generation departs twitter the way millennials departed Facebook for Snapchat and, eventually, TikTok. Perhaps counterintuitively, this outcome is actually excellent for Bluesky, because the absence of twitter will send the 3% of “professional” twitter users (economists, journalists, thinktank wonks, policy makers, etc) to Bluesky, where they will achieve niche critical mass and live happily ever after (at least as happy as one can be whilst immersed in a sea of status-obsessed try-hards).

But for the moment, we’re all a little stuck trying to make do with finding fulfillment in the complex personal lives, loving families, transcendent art, and multidimensional experiences that remain confined to meatspace. We can only do our best and remain strong during such trying times.

Chapman Economists Revise Forecast

Back in June, I watched the livestream of the Chapman Economic Forecast with Dr. Jim Doti (who was president when I was a student at Chapman). Typically, this is a valuable informative event, and the team has an excellent record of performance. They have often outdone other forecasters in predicting the future.

That is why I feel a little bad for making this post in the summer and tweeting out Doti’s prediction that we would have a recession by now.

To be fair to Doti, there has been a lot of uproar over this issue. Lots of people thought the economy would be bad. And lots of people feel like the economy is bad (the “vibecession”) even though it is objectively not. Many tweets have gone by about it.

Doti opened by saying his prediction had turned out to be wrong. He had an explanation for it (pictured below). You can watch it free here (recorded on Dec 14).

Doti said that he had expected a large fiscal stimulus in the form of deficit spending; however, he had not expected the deficit to be so large. Debt-financed spending propped up an economy that was otherwise poised to contract. At least, that is a plausible story.

Looking forward, Doti does not predict a recession next year, but he does predict weak growth and possibly one quarter of GDP decline (not two).

The next part of the talk was about the long-term consequences of deficit spending. Nothing is free. TANSTAAFL.

In addition to vibecession, anyone following economics in 2023 needs to know what a “soft landing” is.

Update on Game Theory Teaching

I wrote at the end of the summer about some changes that I would make to my Game Theory course. You can go back and read the post. Here, I’m going to evaluate the effectiveness of the changes.

First, some history.

I’ve taught GT a total of 5 times. Below are my average student course evaluations for “I would recommend this class to others” and “I would consider this instructor excellent”. Although the general trend has been improvement, with the ratings and the course getting better along the way, some more context would be helpful. In 2019, my expectations for math were too high. Shame on me. It was also my first time teaching GT, so I had a shaky start. In 2020, I smoothed out a lot of the wrinkles, but I hadn’t yet made it a great class.

In 2021, I had a stellar crop of students. There was not a single student who failed to learn. The class dynamic was perfect and I administered the course even more smoothly. They were comfortable with one another, and we applied the ideas openly. In 2022, things went south. There were too many students enrolled in the section, too many students who weren’t prepared for the course, and too many students who skated by without learning the content. Finally, in 2023, the year of my changes, I had a small class with a nice symmetrical set of student abilities.  

Historically, I would often advertise this class, but after the disappointing 2022 performance, and given that I knew that I would be making changes, I didn’t advertise for the 2023 section. That part worked out perfectly. Clearly, there is a lot of random stuff that happens that I can’t control. But, my job is to get students to learn, help the capable students to excel, and to not make students *too* miserable in the process – no matter who is sitting in front of me.


National Health Expenditure Accounts Historical State Data: Cleaned, Merged, Inflation Adjusted

The government continues to be great at collecting data but not so good at sharing it in easy-to-use ways. That’s why I’ve been on a quest to highlight when independent researchers clean up government datasets and make them easier to use, and to clean up such datasets myself when I see no one else doing it; see previous posts on State Life Expectancy Data and the Behavioral Risk Factor Surveillance System.

Today I want to share an improved version of the National Health Expenditure Accounts Historical State Data.

National Health Expenditure Accounts Historical State Data: The original data from the Centers for Medicare and Medicaid Services on health spending by state and type of provider are actually pretty good as government datasets go: they offer all years (1980-2020) together in a reasonable format (CSV). But the data come in separate files for overall spending, Medicare spending, and Medicaid spending; I merge the variables from all 3 into a single file, transform it from a “wide format” to a “long format” that is easier to analyze in Stata, and in the “enhanced” version I offer inflation-adjusted versions of all spending variables. Excel and Stata versions of these files, together with the code I used to generate them, are here.
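
For anyone attempting something similar, here is a minimal sketch of the wide-to-long reshape and inflation adjustment in Stata (the file and variable names are placeholders, not the ones in my actual code):

* placeholder names throughout; adjust to the actual CMS file layout
import delimited using "nhe_state_spending.csv", clear
* years are stored as columns y1980-y2020; make one row per state-item-year
reshape long y, i(state_name item) j(year)
rename y spending
* merge in an annual price index (variables: year, cpi) and express spending in 2020 dollars
merge m:1 year using "cpi_annual.dta", keep(match) nogenerate
summarize cpi if year == 2020, meanonly
gen spending_real = spending * (r(mean) / cpi)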

A warning to everyone using the data, since it messed me up for a while: in the documentation provided by CMS, Table 3 provides incorrect codes for most variables. I emailed them about this, but who knows when it will get fixed. My version of the data should be correct now, but please let me know if you find otherwise. You can find several other improved datasets, from myself and others, on my data page.

State Tax Revenue is Down a Lot in 2023 (but really just back to normal levels)

State tax revenue is down a lot since last year. The latest comparable data from Census’s QTAX survey is for the 2nd quarter of 2023, and it shows a massive hit: state tax revenue was down 14% from the same quarter in 2022, which is about $66 billion. Almost all of that decline is from income tax revenue, specifically individual income tax revenue, which is down over 30% (almost $60 billion). General sales taxes, the other workhorse of state budgets, are essentially flat over the year.

That’s a huge revenue decline! So, what’s going on? In some states, there has been an attempt to blame recent tax cuts. It’s not a bad place to start, since half of US states have reduced income taxes in the past 3 years, mostly reducing top marginal tax rates. But that can’t be the full explanation, since almost every state saw a reduction in revenue: just 3 states had individual income tax revenue increases (Louisiana, Mississippi, and New Hampshire) from 2022q2 to 2023q2, and they were among the half of states that reduced rates!

To get some perspective let’s look at long-run trends. This chart shows total state individual income tax revenue for all 50 states (sorry, DC) going back to 1993. I use a 4-quarter total, since tax receipts are seasonal (and because states sometimes move tax deadlines due to things like disasters, a specific quarter can sometimes look weird). And importantly, this data is not inflation adjusted. Don’t worry, I will do an adjustment further below in this post, but for starters let’s just look at the nominal dollars, because nominal dollars are how states receive money!
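
If you want to replicate the 4-quarter total, it is just a rolling sum. Here is a quick sketch in Stata with hypothetical variable names (qdate as a quarterly date, inc_tax as quarterly individual income tax revenue), not my actual code:

* declare the data as a quarterly time series, then sum the current and prior three quarters
tsset qdate, quarterly
gen inc_tax_4q = inc_tax + L1.inc_tax + L2.inc_tax + L3.inc_tax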


Former Treasury Official Defends Decision to Issue Short Term Debt for Pandemic;  I’m Not Buying It

We noted earlier (see “The Biggest Blunder in The History of The Treasury”: Yellen’s Failure to Issue Longer-Term Treasury Debt When Rates Were Low), along with many other observers, that it seemed like a mistake for the Treasury to have issued lots of short-term (e.g. 1-2 year) bonds to finance the sudden multi-trillion dollar budget deficit from the pandemic-related spending surge in 2020-2021. Rates were near-zero (thanks to the almighty Fed) back then.

Now, driven by that spending surge, inflation has also surged, and thus the Fed has been obliged to raise interest rates. And so now, in addition to the enormous current deficit spending,  that tsunami of short-term debt from 2020-2021 is coming due, to be refinanced at much higher rates. This high interest expense will contribute further to the growing government debt.

Hedge fund manager Stanley Druckenmiller  commented in an interview:

“When rates were practically zero, every Tom, Dick and Harry in the U.S. refinanced their mortgage… corporations extended [their debt],” he said. “Unfortunately, we had one entity that did not: the U.S. Treasury….

“Janet Yellen, I guess because political myopia or whatever, was issuing 2-years at 15 basis points [0.15%] when she could have issued 10-years at 70 basis points [0.70%] or 30-years at 180 basis points [1.80%],” he said. “I literally think if you go back to Alexander Hamilton, it is the biggest blunder in the history of the Treasury. I have no idea why she has not been called out on this. She has no right to still be in that job.

Unsurprisingly, Yellen pushed back on this charge (unconvincingly). More recently, former Treasury official Amar Reganti has issued a more detailed defense. Here are some excerpts of his points:

( 1 ) …The Treasury’s functions are intimately tied to the dollar’s role as a reserve currency. It is simply not possible to have a reserve currency without a massive supply of short-duration fixed income securities that carry no credit risk.

( 2 ) …For the Treasury to transition the bulk of its issuance primarily to the long end of the yield curve would be self-defeating since it would most likely destabilise fixed income markets. Why? The demand for long end duration simply does not amount to trillions of dollars each year. This is a key reason why the Treasury decided not to issue ultralong bonds at the 50-year or 100-year maturities. Simply put, it did not expect deep continued investor demand at these points on the curve.

( 3 ) …The Treasury has well over $23tn of marketable debt. Typically, in a given year, anywhere from 28% to 40% of that debt comes due…so as not to disturb broader market functioning, it would take the Treasury years to noticeably shift its weighted average maturity even longer.

( 4 ) …The Treasury does not face rollover risk like private sector issuers.

Here is my reaction:

What Reganti says would be generally valid if the trillions of excess T-bond issuance in 2020-2021 were sold into the general public credit market. In that case, yes, it would have been bad to overwhelm the market with more long-term bonds than were desired.  But that is simply not what happened. It was the Fed that vacuumed up nearly all those Treasuries, not the markets. The markets were desperate for cash, and hence the Fed was madly buying any and every kind of fixed income security, public and corporate and mortgage (even junk bonds that probably violated the Fed’s bylaws), and exchanging them mainly for cash.  Sure, the markets wanted some short-term Treasuries as liquid, safe collateral, but again, most of what the Treasury issued ended up housed in the Fed’s digital vaults.

So, I remain unconvinced that the issuance of mainly long-term (say 10-year and some 30-year; no need to muddy the waters like Reganti did with harping on 50–100-year bonds) debt would have been a problem. So much fixed-income debt was vomited forth from the Treasury that even making a minor portion of it short-term would, I believe, have satisfied market needs. The Fed could have concentrated on buying and holding the longer-term bonds, and rolling them over eventually as needed, without disturbing the markets. That would have bought the country a decade or so of respite before the real interest rate effects of the pandemic debt issuance began to bite.

But nobody asked my opinion at the time.

Godzilla Minus One is fantastic

Did you know you could make a Godzilla movie, maybe the best one at that, for $15 million (or 3 minutes of Chris Pratt in “Endgame”, if you’d prefer that numeraire)? This film, in which Godzilla is basically the demon baby of Jason Voorhees and the shark from Jaws, deftly explores concepts of guilt, shame, redemption, forgiveness, and family. I cried at the end. I repeat, I cried at the end of a Godzilla movie.

In the last month I’ve watched a Godzilla movie that is specifically constructed to recreate the feeling of a 1950s monster movie, a flawed but admirable attempt to make a modern Charlie Chaplin movie (Fool’s Paradise), and a watchable if uneven and wholly debauched variation on “Singin’ in the Rain” (Babylon). I don’t think this is a coincidence. I think this is a response to VFX and superhero (not comic book) movie fatigue. One way to respond is to go backwards, not in subject matter or setting necessarily, but in story composition and construction. The performances in all three films felt more stage than screen. Texture was emphasized over shock and awe. Emotional crescendos felt more earned than manipulated. I’m not saying these three films are perfect or even necessarily good. What I’m saying is that they felt like a return to an older form of film as a medium.

For the last 15 years we’ve had a lot of “remakes” that attempted to modernize old films. Don’t be surprised if we see the inverse going forward: new, original stories filmed in a manner that feels older. “The Thing” but it’s a sea alien on an oil platform, everything wet and on fire. “All the President’s Men” but it’s a coverup in local Iowa government, with scratchy sunken sofas and life-changing smoke breaks. “Working Girl” but it’s Zendaya and Scarlett Johansson in a fully modern context, where a misread text subverts an expected plot turn on a broken iPhone screen. Not for a love of classic cinema mind you, or even art, but because making 10 to 1 on winners and losing next to nothing on flops is a business proposition more than a few studios are likely to find enticing.

Go see “Godzilla Minus One”.