The Credibility Revolution: A Nobel for Taking (some of) the CON out of Econometrics

Yesterday Jeremy pointed out that while the 2021 economics Nobelists have reached various conclusions in their study of labor economics, the prize was really awarded to the methods they developed and used.

I find the best explanation of the value of these methods to be this 2010 article by Angrist and Pischke in the Journal of Economic Perspectives: “The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics”.

Like Jeremy, they think that empirical economic research (that is, research using econometrics) was mostly quite bad up to the 1980s; as Ed Leamer put it in his paper “Let’s Take the Con Out of Econometrics”:

This is a sad and decidedly unscientific state of affairs we find ourselves in. Hardly anyone takes data analyses seriously. Or perhaps more accurately, hardly anyone takes anyone else’s data analyses seriously.

Angrist and Pischke argue that the field is in much better shape today:

empirical researchers in economics have increasingly looked to the ideal of a randomized experiment to justify causal inference. In applied micro fields such as development, education, environmental economics, health, labor, and public finance, researchers seek real experiments where feasible, and useful natural experiments if real experiments seem (at least for a time) infeasible. In either case, a hallmark of contemporary applied microeconometrics is a conceptual framework that highlights specific sources of variation. These studies can be said to be design based in that they give the research design underlying any sort of study the attention it would command in a real experiment.

The econometric methods that feature most prominently in quasi-experimental studies are instrumental variables, regression discontinuity methods, and differences-in-differences-style policy analysis.
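To make the last of those concrete, here is a minimal difference-in-differences sketch in Python; the data file and variable names are illustrative placeholders, not from any particular study:

```python
import pandas as pd
import statsmodels.formula.api as smf

# State-year panel; "treated" marks states that adopted a policy,
# "post" marks years after adoption. Names are placeholders.
df = pd.read_csv("state_panel.csv")

# The DiD estimate is the coefficient on treated:post -- the change in the
# outcome for treated states relative to the change in untreated states.
did = smf.ols("outcome ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]}
)
print(did.params["treated:post"])
```

The credibility comes from the design: the untreated states stand in for the trend the treated states would have followed anyway.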

Our field still has big problems: the replication crisis looms, and the credibility revolution’s focus on the experimental ideal leads economists to avoid important questions that can’t be answered by natural experiments. But I do think that the average empirical economics paper today is much more credible than one from 1980, and that the 3 Nobelists are part of the reason why, so cheers to them.

Calling Behavioral Economics a Fad

Josh Hendrickson and Brian Albrecht have a Substack called Economic Forces that is a source of economics news and examples. We have linked to EF before at EWED.

Albrecht just published an op-ed titled “Behavioral Economics Is Fine. Just Keep It Away from Our Kids”. I’ll respond to this, just as I responded to that other blog. I think the group of people who are pitting themselves against “behavioral economics” is small. They might even think of themselves as a minority embattled against the mainstream. So, why bother responding? That’s what blogs are good for.

I agree with Albrecht’s main point. The first thing an undergraduate should learn in economics classes is the classic theory of supply and demand. Even in its simplest form, the idea that demand curves slope down and supply curves slope up is powerful and important.*

Albrecht points out that there are some results that have been published in the behavioral economics literature that turned out not to replicate or, in the recent case of Dan Ariely, might be fraudulent. Then he makes a jump from there by calling the behavioral field of inquiry a “fad”. That’s not accurate. (See Scott Alexander on Ariely and related complaints.)

In his op-ed, Albrecht names the asset bubble as a faddish behavioral idea. Vernon Smith (with Suchanek and Williams) published “Bubbles, Crashes and Endogenous Expectations in Experimental Spot Asset Markets” in Econometrica in 1988. Bubbles have been replicated all around the world many times.  There is no doubt in anyone’s mind that the “dot com” bubble had an element of speculation that became irrational at a certain point. This is not a niche topic or a very rare occurrence. Bubbles are observed in the lab and out in the naturally occurring economy.

Should we start undergrads on bubbles before explaining the normal function of capital markets? No. Lots of people think that stock markets generally work well, communicate reliable information, and should be allowed to function with minimal regulation. Behavioral Finance is usually right where it should be in the college curriculum: an upper-division elective for finance and economics majors. I am not going to do a systematic survey, but I looked up courses at Cornell, and there it is: Behavioral Economics is one of many advanced electives offered to economics students. I don’t know how they teach ECON101 at Cornell, but it seems they bin most of the behavioral content into later optional courses.

In a social media exchange, Albrecht pointed me to one of the posts by Hendrickson on how they handle situations where it seems like economic forces are not explaining everything. For example, it seems like the labor market is not clearing right now: firms want to hire but wages are not rising. The quantity supplied seems lower than the quantity demanded at the market wage. Hendrickson claims that this market condition is temporary. He says that firms are cleverly paying bonuses to attract workers so that they won’t have to lower wages in the future when conditions return to normal post-Covid. This would be a perfect time to discuss downward nominal wage rigidity, a pervasive behavioral phenomenon.** It has been studied extensively in lab settings. Nominal wage rigidity has implications for monetary policy. Wage rigidity might be a “temporary” thing, but it helps to explain unemployment. Some of the research done by behavioral economists in this area follows the Akerlof 1982 paper on the gift exchange model. It was published 40 years ago by a Nobel prize winner and is cited extensively.*** The seminal lab study of that theory is Fehr et al. 1993. There have been hundreds of replications of the main result that people will trade out of equilibrium due to positive reciprocity.

Continue reading

Weigh costs, benefits, and evidence quality

Living means making decisions with imperfect information. But Covid provides many examples of how people and institutions are often still bad at this. A few common errors:

  1. Imperfect evidence = perfect evidence. “Studies show aspirin prevents Covid.” OK, were the studies any good? Did any other studies find otherwise?
  2. Imperfect evidence = “no evidence” or “evidence against”. In early 2020, major institutions like the WHO said “masks don’t work” when they meant “there are no large randomized controlled trials on the effectiveness of masks.”
  3. Imperfect evidence = don’t do it until you’re sure. Inaction is a choice, and often a bad one. If the costs of action are low and the potential benefits of action high, you might want to do it anyway. Think masks in 2020 when the evidence for them was mediocre, or perhaps Vitamin D now.
  4. Imperfect evidence = do it, we have to do something. Even in a pandemic, it is possible to over-react if the costs are high enough and/or the evidence of benefits bad enough (possibly lockdowns, definitely taking up smoking).

Any intro microeconomics class will explain the importance of weighing both costs and benefits. But how do we know what the costs and benefits are? For many everyday purchases they are usually obvious, but in other situations like medical treatments and public policies they aren’t, particularly the benefits. We have to estimate the benefits using evidence of varying quality. This creates more dimensions of tradeoffs- do you choose something with good evidence for its benefits, but high cost? Or something with worse evidence but lower costs? Graphing this properly should take at least 3 dimensions, but to keep things simple let’s assume we know what the costs are, and combine benefits and evidence into a single axis called “good evidence of substantial benefit”. This yields a graph like:
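As a rough sketch of that graph (a minimal matplotlib version; the quadrant labels are my own illustrative reading, not estimates of any particular strategy):

```python
import matplotlib.pyplot as plt

# Cost on the vertical axis, evidence-weighted benefit on the horizontal.
fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlabel("Good evidence of substantial benefit")
ax.set_ylabel("Cost")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axhline(0.5, linestyle="--", color="gray")
ax.axvline(0.5, linestyle="--", color="gray")
# Two quadrants are easy calls; the other two require judgment.
ax.text(0.75, 0.2, "do it", ha="center")
ax.text(0.25, 0.8, "don't do it", ha="center")
ax.text(0.25, 0.2, "cheap enough to try anyway?", ha="center")
ax.text(0.75, 0.8, "worth it despite the cost?", ha="center")
plt.show()
```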

Applied to Covid strategies, this yields a graph something like this:

This is not medical advice- I say this not merely as a legal disclaimer, but because my real point is that we should weigh both evidence quality and costs, NOT that my estimates of the evidence quality or costs of particular strategies are better than yours.

Judging the strength of the evidence for various strategies is inherently difficult, and might go beyond simply evaluating the strength of published research. But when evaluating empirical studies on Covid, my general outlook on the evidence is:

Of course, details matter, theory matters, the number of studies and how mixed their results are matters, potential fraud and bias matters, and there’s a lot it makes sense to do without seeing an academic study on it.

Dear reader, perhaps this is all obvious to you, and indeed the idea of adjusting your evidence threshold based on the cost of an intervention goes back at least to the beginnings of modern statistics in deciding how to brew Guinness. But common sense isn’t always so common, and this is my attempt to summarize it in a few pictures.

Clemens and Strain on Large and Small Minimum Wage Changes

In my Labor Economics class, I do a lecture on empirical work and the minimum wage, starting with Card & Krueger (1993). I’m going to quickly tack on the new working paper by Clemens & Strain “The Heterogeneous Effects of Large and Small Minimum Wage Changes: Evidence over the Short and Medium Run Using a Pre-Analysis Plan”.

The results, as summarized in the second half of their abstract, are:

relatively large minimum wage increases reduced employment rates among low-skilled individuals by just over 2.5 percentage points. Our estimates of the effects of relatively small minimum wage increases vary across data sets and specifications but are, on average, both economically and statistically indistinguishable from zero. We estimate that medium-run effects exceed short-run effects and that the elasticity of employment with respect to the minimum wage is substantially more negative for large minimum wage increases than for small increases.

The variation in the data comes from choices by states to raise the minimum wage.

A number of states legislated and began to enact minimum wage changes that varied substantially in their magnitude. … The past decade thus provided a suitable opportunity to study the medium-run effects of both moderate minimum wage changes and historically large minimum wage changes.

We divide states into four groups designed to track several plausibly relevant differences in their minimum wage regimes. The first group consists of states that enacted no minimum wage changes between January 2013 and the later years of our sample. The second group consists of states that enacted minimum wage changes due to prior legislation that calls for indexing the minimum wage for inflation. The third and fourth groups consist of states that have enacted minimum wage changes through relatively recent legislation. We divide the latter set of states into two groups based on the size of their minimum wage changes and based on how early in our sample they passed the underlying legislation.

The “large” increase group includes states that enacted considerable change. New York and California “have legislated pathways to a $15 minimum wage, the full increase to which firms are responding exceed 60 log points in total.” Data comes from the American Community Survey (ACS) and the Current Population Survey (CPS).
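Since log points are not the most intuitive unit, a quick conversion (my arithmetic, not the paper’s):

$$e^{0.60} \approx 1.82,$$

so an increase of more than 60 log points means a minimum wage more than 80 percent higher.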

Continue reading

Behavioral Economist at Work

A blog post titled “The Death of Behavioral Economics” went viral this summer. The clickbait headline was widely shared. After Scott Alexander debunked it point-by-point on Astral Codex Ten, no one corrected their previous tweets. I recommend Scott’s blog for the technical stuff. For example, there is an important distinction between saying that loss aversion does not exist and saying that its underlying cause is the Endowment Effect.

The author of the original death post, Hreha, is angry. Here’s how he describes his experience with behavioral economics.

I’ve run studies looking at its impact in the real world—especially in marketing campaigns.

If you read anything about this body of research, you’ll get the idea that losses are such powerful motivators that they’ll turn otherwise uninterested customers into enthusiastic purchasers.

The truth of the matter is that losses and benefits are equally effective in driving conversion. In fact, in many circumstances, losses are actually *worse* at driving results.

Why?

Because loss-focused messaging often comes across as gimmicky and spammy. It makes you, the advertiser, look desperate. It makes you seem untrustworthy, and trust is the foundation of sales, conversion, and retention.

He’s trying to sell things. I wade through ads every day and, to mix metaphors, beat them off like mosquitoes. Knowing how I feel about sales pitches, I don’t envy Hreha’s position.

I don’t know Hreha. From reading his blog post, I get the impression that he believes he was promised certain big returns by economists. He tried some interventions in a business setting and did not get his desired results or did not make as much money as he was expecting.

According to him, he seeks to turn people into “enthusiastic purchasers” by exploiting loss aversion. What would consumers be losing, if you are trying to sell them something new? I’m not in marketing research, so I won’t try to comment on those specifics. Now, Hreha claims that all behavioral studies are misleading or useless.

The failure to replicate some results is a big deal, for economics and for psychology. I have seen changes within the experimental community and standards have gotten tougher as a result. If scientists knowingly lied about their results or exaggerated their effect sizes, then they have seriously hurt people like Hreha and me. I am angry at a particular pair of researchers who I will not name. I read their paper and designed an extension of it as a graduate student. I put months of my life into this project and risked a good amount of my meager research budget. It didn’t work for me. I thought I knew what was going to happen in the lab, but I was wrong. Those authors should have written a disclaimer into their paper, as follows:

Disclaimer: Remember, most things don’t work.

I didn’t conclude that all of behavioral research is misleading and that all future studies are pointless. I refined my design by getting rid of what those folks had used and eventually I did get a meaningful paper written and published. This process of iteration is a big part of the practice of science.

The fact that you can’t predict what will happen in a controlled setting seems like a bad reason to abandon behavioral economics. It all got started because theories were put to the test and they failed. We can’t just retreat and say that theories shouldn’t get tested anymore.

I remember meeting a professor at a conference who told me that he doesn’t believe in experimental economics. He had tried an experiment once and it hadn’t turned out the way he wanted. He tried once. His failure to predict what happened should have piqued his curiosity!

There is a difference between behavioral economics and experimental economics. I recommend Vernon Smith’s whole book on that topic, which I quoted from yesterday, for those interested.

The reason we run experiments is that we don’t know what will happen until we try. The only good justification for shutting down behavioral studies would be if we got so good at predicting which interventions work that new data ceased to be informative.

Or, what if you think nudges are not working because people are highly sensible and rational? That would also imply that we can predict what they are going to do, at least in simple situations. So, again, the fact that we are not good at predicting what people are going to do is not a reason to stop the studies.

I posted last week about how economists use the word “behavioral” in conversation. Yesterday, I shared a stinging critique of the behavioral scientist community written by the world’s leading experimental researcher long before the clickbait blog.

Today, I will share a behavioral economics success story. There are lots of papers I could point to. I’m going to use one of my own, so that readers can truly ask me anything about it. My paper is called “My reference point, not yours”.

I started with a prediction based on previous behavioral literature. My design depended on the fact that in the first stage of the experiment, people would not maximize expected value. You never know until you run the experiment, but I was pretty confident that the behavioral economics literature was a reliable guide.

Some subjects started the experiment with an endowment of $6. Then they could invest to have an equal chance of either doubling their money (earning $12) or getting $1. To maximize expected value, they should take that gamble. Most people would rather hold on to their endowment of $6 than risk experiencing a loss. It’s just $5. Why should the prospect of losing $5 blind them to the expected value calculation? Because most humans exhibit loss aversion.
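The expected value arithmetic behind that claim:

$$0.5 \times \$12 + 0.5 \times \$1 = \$6.50 > \$6.00,$$

so investing beats keeping the endowment by fifty cents in expectation.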

I was relying on this pattern of behavior in stage 1 of the experiment for the test to be possible in stage 2. The main topic of the paper is whether people can predict what others will do. High-endowment people fail to invest in stage 1, and then they predict that most other participants also failed to invest. The high-endowment people failed to incorporate easily available information about the other participants: starting endowments {1,2,3,4,5,6} were randomly assigned and uniformly distributed. The effect size was large, even when I added a quiz to test their knowledge that starting endowments are uniformly distributed.

Here’s a chart of my main results.

Investing always maximizes expected value, for everyone. The $1 endowment people think that only a quarter of the other participants fail to invest. The $6 endowment people predict that more than half of other participants fail to invest.

Does this help Mr. Hreha, who consults for Walmart, get Americans to buy more stuff? I’m not sure. Sorry.

My results do not directly imply that we need more government interventions or nudge units. One could argue instead that what we need is market competition to help people navigate a complex world. The information contained in prices helps us figure out what strangers want, so we don’t have to try to predict their behavior at all.

Here’s the end of my Conclusion:

One way to interpret the results of this experiment is that putting yourself in someone else’s shoes is costly. We often speak of it as a moral obligation, especially to consider the plight of those who are worse off than ourselves. Not only do people usually decline to do this for moral reasons, they fail to do it for money. Additionally, this experiment shows that, if people are prompted to think about a specific past experience that someone else had, then mutual understanding is easier to establish.

I’m attempting to establish general-purpose laws of behavior. I’ll end with a quote from Scott Alexander’s reply post.

A thoughtful doctor who tailors treatment to a particular patient sounds better (and is better) than one who says “Depression? Take this one all-purpose depression treatment which is the first thing I saw when I typed ‘depression’ into UpToDate”. But you still need medical journals. Having some idea of general-purpose laws is what gives the people making creative solutions something to build upon.

Behavioral Economics Conversation: Cutler and Glaeser

I haven’t written a formal response yet to the “behavioral economics is dead” claim going around Twitter. I’m too busy doing my referee reports on behavioral papers to write in depth about why behavioral is not dead. Incidentally, I’m not loving the most recent paper I was sent, so maybe that’s a point in the column of Team Death. I’ll write a few posts intersecting with the arguments being made.

First, I’ll point out two places in a CWT discussion of health and cities where the phrase “behavioral” was used. This is obviously a current conversation. David Cutler probably wouldn’t say that behavioral economics is his field, but here’s how he describes puzzles in decision making over health issues. (bold emphasis mine)

Everything that we know in healthcare is that people have difficulty choosing on the basis of price and quality. It goes back a little bit to some of the *behavioral* issues that we were talking about, but I think it’s slightly different. If you go to the doctor, and the doctor says you should take medication X, and you go to the pharmacy, and the pharmacy says that’ll be $30, a fair number of people will walk away and say, “I don’t have $30.”

What we would hope they would do is go to their doctor and say, “Doctor, is there any way that there could be a cheaper medicine that might work because $30 is hard for me this month?” In practice, people are extremely uncomfortable doing that. They really don’t like to go to their doctor and say, “Doctor, how do I trade off the money here versus the medicine?”

David Cutler

The previous issues Cutler mentioned had to do with time preference and delayed gratification. The turmoil over dieting alone is evidence that people don’t always make the best decisions.

Here’s the second of two appearances of the word “behavioral”, in response to Tyler’s question about how to make cities healthier.

I certainly join the crowd of economists who have argued that congestion pricing is the best way to deal with urban traffic jams. There’s no reason not to charge people for the social cost of their actions on that. And giving away street space for free is just crazy, especially since we now have technologies that can handle this.

And if we introduce autonomous vehicles without congestion pricing, you have just lowered the cost of sitting in traffic, which means the first-order *behavioral* response is that more people will sit in traffic, and our congestion will get even worse unless we introduce this from the beginning. So I think pricing is really good.

Ed Glaeser

In the second use of the word, it sounds like an individually rational decision to sit in your autonomous vehicle and read blogs until you arrive at your destination. Maybe we can use mechanism design to reduce traffic congestion and improve life for all.
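The textbook logic behind Glaeser’s congestion-pricing point, with purely illustrative numbers of my own: suppose a trip is worth $5 to a driver, costs her $3 in time, and imposes $4 of delay costs on everyone else. Privately, driving makes sense, but socially it doesn’t:

$$5 > 3 \quad \text{(private)}, \qquad 5 < 3 + 4 \quad \text{(social)}.$$

A toll equal to the $4 external cost makes the private calculation match the social one, and she skips the wasteful trip.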

Whether or not you think behavioral economics is dead, economists are going to keep using the word “behavioral” for a long time.

I did a quick Ngram to get a sense of how common the word is, although this does not restrict the search to books about economics. Ngrams are easier to interpret if there is a comparison word. I chose the word “clustering” because it’s also a relatively new technical term. Both words were quite rare before 1930.

If you missed the small discussion about behavioral econ, Mike Munger did a link round-up here. Tomorrow’s post will be Vernon Smith’s view of behavioral economics.

Generous Health Insurance Makes Employees Stay

The idea of “job lock” is well established in the academic literature- employees leave firms that don’t offer health insurance more often than they leave firms that do. But this literature has always measured employer-provided health insurance as a simple binary: either they offer it or they don’t. In fact employers vary widely in the generosity of their plans, both in the quality of the insurance and in how much of the cost is paid by the employer. Some employers pay all of the premiums, some pay none, and most pay part:

Data are from the Current Population Survey, which uses top-coding to protect privacy (values greater than 9997 are reported as 9997)

In an article published last week in Applied Economics Letters, my colleague Michael Mathes and I combine two supplements of the Current Population Survey to test whether employers who contribute more towards health insurance see their employees stay longer. Perhaps not surprisingly, we find that they do. We run lots of regressions to establish this, but this simple fit plot tells the story best:

What we found more surprising was the magnitude of this effect: a thousand dollar increase in employer contributions to health insurance is associated with at least 83 additional days of job tenure, compared to less than 10 additional days for a thousand dollar increase in wages. We conclude that:

For employers trying to increase retention, increasing contributions to health insurance appears to lengthen employee tenure far more than increasing wages by a similar amount.

Why the difference? Probably because employees rationally value $1000 in untaxed contributions to health insurance above $1000 in taxable wages. Why don’t employers shift more compensation away from wages and toward health insurance, given that employees seem to prefer it? Here I’m less sure, and they could simply be making a mistake, but one possibility is that they worry about increasing their costs as couples whose employers both offer insurance choose the more generous one for a family plan. Another is that while generous health insurance plans are better for retention, higher wages could be better for attracting new employees, who tend to be younger and for whom the salary number could be more salient.
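For the curious, here is a minimal sketch of the kind of regression we ran; the file and variable names are illustrative placeholders, not the actual CPS fields:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Person-level CPS extract; names are placeholders.
# tenure_days: job tenure; hi_contrib: employer health-insurance
# contribution in $1000s; wages: annual wages in $1000s.
df = pd.read_csv("cps_extract.csv")

model = smf.ols(
    "tenure_days ~ hi_contrib + wages + age + C(industry) + C(year)",
    data=df,
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

# Headline comparison: the coefficient on hi_contrib (~83 days per $1000
# in our results) versus the coefficient on wages (<10 days per $1000).
print(model.params[["hi_contrib", "wages"]])
```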

Preferences for Equality and Efficiency

Most people would consider both equality and efficiency to be good. They are “goods” in the sense that more of them makes us happier.  However, in some situations, there is a trade-off between having more equality and getting more efficiency. Extreme income redistribution makes people less productive and therefore lowers overall economic output.

Examining the preferences people have for efficiency and equality is hard to do because the world is complicated. For example, a lot of baggage comes along with real-world policy proposals to raise (lower) taxes to do more (less) income redistribution. A voter’s preference for a particular policy could be confounded by their personal feelings toward a particular politician who might have just had a personal scandal.

With Gavin Roberts, I ran an experiment to test whether people would rather get efficiency or equality (paper on SSRN). Something neat that we can do in a controlled lab setting is systematically vary the prices of the goods (see my earlier related post on why it’s neat to do this kind of thing in the lab).

One wants to immediately know, “Which is it? Do people want equality or efficiency?” If forced to give a short answer, I would say that the evidence points to equality. But overly simplifying the answer is not helpful for making policy. The demand curve for equality slopes down. If the price of equality is too high, then people will not choose it. In our experiment, that price could come in terms of either own income or group efficiency. We titled our paper “Other People’s Money” because more equality is purchased when the cost comes in terms of other players’ money.

The main task for subjects in our experiment is to choose between an unequal distribution of income among 3 players and a more equal distribution. Given what I said above about people liking equality, you might expect that everyone will choose the more equal distribution. However, choosing a more equal distribution comes at a cost: subjects either give up some of their own earnings from the experiment or lower the total group earnings. As is true in policy, some schemes to reduce inequality are higher cost than others. When the cost is low, we observe many subjects (about half) paying to get more equality. However, when the cost is high, very few subjects choose to buy equality.
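To make the two kinds of cost concrete, here is a stylized example with made-up numbers (not the actual parameters from our design):

```python
# Each option is (chooser, player 2, player 3); numbers are purely illustrative.

def price_of_equality(unequal, equal):
    """Return what equality costs the chooser and what it costs the group."""
    own_cost = unequal[0] - equal[0]
    efficiency_cost = sum(unequal) - sum(equal)
    return own_cost, efficiency_cost

# "Own income" price: the chooser pays for equality out of pocket.
print(price_of_equality((10, 4, 4), (6, 6, 6)))   # (4, 0)

# "Other people's money": the chooser's payoff is untouched,
# but the group total shrinks.
print(price_of_equality((6, 12, 0), (6, 5, 5)))   # (0, 2)
```

In the first comparison, equality is paid for out of the chooser’s own earnings; in the second, it is paid for with other players’ money, which is where the paper’s title comes from.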

This bar graph from our working paper shows some of the average behavior in the experiment, but it does not show the important results about price-sensitivity.

Continue reading

Results on stability and gift-exchange

Bejarano, Corgnet, and Gómez-Miñambres have a newly published paper on gift-exchange.

Abstract: We extend Akerlof’s (1982) gift-exchange model to the case in which reference wages respond to changes in economic conditions. Our model shows that these changes spur disagreements between workers and employers regarding the reference wage. These disagreements tend to weaken the gift-exchange relationship, thus reducing production levels and wages. We find support for these predictions in a controlled yet realistic workplace environment. Our work also sheds light on several stylized facts regarding employment relationships, such as the increased intensity of labor conflicts when economic conditions are unstable.

Next, I will provide some background on gift-exchange and experiments.

Continue reading

Editing: You Figure It Out

If you want to change how a field works, you have a few options. You can do what you want to see more of, but you are only one person, and perhaps not the one best equipped to make things better. Or you can encourage others to work differently- but why would they listen to you?

Academics often serve as peer reviewers for the work of others. If a reviewer recommends that a paper be rejected, it usually is; if they recommend specific minor changes, those usually get made. But you can’t really tell people that they should work on a totally different topic. Journal editors for the most part simply have a scaled-up version of the powers of peer reviewers to steer the field. But unlike reviewers, their positions are public and fairly long-lasting. This means they can credibly say “this is the sort of work I’d like to see more of- if you do this kind of work, there’s a good chance I’ll publish it”.

This is part of why I’ve been hoping to be a journal editor some day, and why I’m excited to be guest-editing for the first time: a special issue on Health Economics and Insurance for the Journal of Risk and Financial Management. The description notes:

Continue reading