Replicating Research with Restricted Data

If a scientific finding is really true and important, we should be able to reproduce it: different researchers can investigate and confirm it, rather than just taking one researcher at their word.

Economics has not traditionally been very good at this, but we’re moving in the right direction. It is becoming increasingly common for researchers to voluntarily post their data and code, as well as for journals (like the AEA journals) to require them to:

Source: This talk by Tim Errington

This has certainly been the trend with my own research; if you look at my first 10 papers (all published prior to 2018) I don’t currently share data for any of them, though I hope to go back and add it some day. But of my most recent 10 empirical papers, half share data.

This sharing allows other researchers to easily go back and check that the work is accurate. This could mean simply checking that it is “reproducible”, i.e., that running the original code on the original data produces the results that the authors said. Or it could mean the more ambitious “replicability”, i.e., you could tackle the same question with different data and still find basically the same answer. Economics generally does well at reproducibility when code is shared, but just ok at replication.

Of course, even when data and code are shared, you still need people to actually do the double-checking research; this is still relatively rare because it is harder to publish replications than original research. But more replication journals are opening, and there are now several projects funding replications. The trends are all in the right direction to establish real, robust findings, with one exception: the rise of restricted data.

Traditionally most economics research has been done using publicly available datasets like the Current Population Survey. But an increasing proportion, perhaps a majority of research at top journals, is now done using restricted datasets (there’s a great graph on this I can’t find but see section 3.3 here). These datasets legally can’t be shared publicly, either due to privacy concerns, licensing agreements, or both. But journals almost always still publish these articles and give them an exemption to the data sharing requirement. On the one hand it makes sense not to ignore this potentially valuable research when there are solid legal reasons the data can’t be shared. But it does mean we can’t be as confident that the data has been analyzed correctly, or that it even really exists.

One potential solution is to find people who have access to the same restricted dataset and have them do a replication study. This is what the Institute for Replication just started doing. They posted a list of 100+ papers that use restricted data that they would like to replicate. They are offering $5000 for replications of most of the papers, so I think it is worthwhile for academics to look and see if you already have access to relevant datasets, or if you study similar enough things that it is worth jumping through the hoops to get data access.

For everyone else, this is just one more reason not to put too much trust in any one paper you read now, but to recognize that the field as a whole is getting better and more trustworthy over time. We will be more likely to catch the mistakes, purge the frauds, and put forward more robust results that at least bear a passing resemblance to what science can and should be.

Wives Slightly Out-earning Husbands Is No Longer Weird

As we have gone through our education and training and changed jobs, my wife and I have been in every sort of relative income situation, with each one sometimes vastly or slightly out-earning the other. Currently she slightly out-earns me, which I thought was unusual, as I remembered this graph from Bertrand, Kamenica and Pan in the QJE 2015:

Ungated source: Bertrand Pan Kamenica 2013

The paper argues that the big jump down at 50% is driven by gender norms:

this pattern is best explained by gender identity norms, which induce an aversion to a situation where the wife earns more than her husband. We present evidence that this aversion also impacts marriage formation, the wife’s labor force participation, the wife’s income conditional on working, marriage satisfaction, likelihood of divorce, and the division of home production. Within marriage markets, when a randomly chosen woman becomes more likely to earn more than a randomly chosen man, marriage rates decline. In couples where the wife’s potential income is likely to exceed the husband’s, the wife is less likely to be in the labor force and earns less than her potential if she does work. In couples where the wife earns more than the husband, the wife spends more time on household chores; moreover, those couples are less satisfied with their marriage and are more likely to divorce.

But when I went to look up the paper to show my wife the figures, I found that the effect it highlights may no longer be so large.  Natalia Zinovyeva and Maryna Tverdostup show in their 2021 AEJ paper that the jump down in wives’ income at 50% is quite small, and is largely driven by couples who have the same industry and occupation:

They created the figure above using SIPP/SSA/IRS Completed Gold Standard Files, 1990–2004. I’d be interested in an analysis with more recent data. Much of their paper uses more detailed Finnish data to test the mechanism for the remaining jump down at 50%. They conclude that gender norms are not a major driver of the discontinuity:

We argue that the discontinuity to the right of 0.5 can emerge if some couples tend toward earnings equalization or convergence. To test this hypothesis, we exploit the rich employer-employee–linked data from Finland. We find overwhelming support in favor of the idea that the discontinuity is caused by earnings equalization in self-employed couples and earnings convergence among spouses working together. We show that the discontinuity is not generated by selective couple formation or separation and it arises only among self-employed and coworking couples, who account for 15 percent of the population.

Self-employed couples are responsible for most observations with spouses reporting identical earnings. When couples start being self-employed, both sides of the distribution tend to equalize earnings, perhaps because earnings equalization helps couples to reduce income tax payments, facilitate accounting, or avoid unnecessary within-family negotiations. Large spikes emerge not only at 0.5 but also at other round shares signaling the prevalence of ad hoc rules for entrepreneurial income sharing in couples. Self-employment is associated with a fall of household earnings below the level predicted by individuals’ predetermined characteristics, but this drop is mainly due to a decrease in male earnings, with women being relatively better off.

In the case of couples who work together in the same firm, there is a compression of the earnings distribution toward 0.5 both on the right and on the left of 0.5. As a result, there is an increase both in the share of couples where men slightly outearn their wives and in the share of couples where women slightly outearn their husbands. Since the former group is larger, earnings compression leads to a detection of a discontinuity.

So, concerns about relative earnings aren’t causing trouble for women in the labor market. But do they cause trouble at home? Perhaps yes, but if so it’s not in a gendered way and not driven by the 50% threshold:

Separation rates do not exhibit any discontinuity around the 0.5 threshold of relative earnings. Instead, the relationship between the probability of separation and the relative earnings distribution exhibits a U-shape, with higher separation rates among couples with large earnings differentials either in favor of the husband or in favor of the wife.

You Cannot Cut Nominal Wages: Weavers in 1738

I’m reading The Fabric of Civilization (see my AdamSmithWorks on specialization). This is a fascinating story about cloth and markets:

In November 1738, clothier Henry Coulthurst informed weavers that he was cutting their piecework rates and would henceforth pay them in goods rather than cash. Needless to say, they were upset. Food prices were rising, and lower wages meant hunger and want.

Over three days in December, the weavers rioted. They smashed Coulthurst’s mill, wrecked his home, and “drank, carried out, and spilt, all the Beer, Wine and Brandy in the cellars.” They returned the following day to demolish Coulthurst’s house…

Wow. Our paper on cutting nominal wages is called “If Wages Fell During a Recession”. We ran an experiment in which workers could retaliate if they experienced a nominal wage cut. They did! They couldn’t smash their employer’s house, but some of the slighted workers dropped their effort down to the minimum level, which meant that their employer made no more money in the experiment.

In my talk at IUE (show notes here and YouTube video), I connect the wage cut paper to another experiment on beliefs. One wonders, considering how serious the consequences turned out to be for Henry Coulthurst, why he was not able to anticipate the backlash against wage cuts. Being wrong was costly for him.

People are not always good at appreciating how strongly others have become attached to their own reference points. That’s why the paper on beliefs is called “My Reference Point, Not Yours”.

Historical Price to Earnings Ratios By Industry

Getting long-run historical PE ratios of US stocks by industry seems like the kind of thing that should be easy, but is not. At least, I searched for an hour on Google, ChatGPT, and Bing AI to no avail.

I eventually got monthly median PEs for the Fama French 49 industries back to 1970 from a proprietary database. I share two key stats here: the average of median monthly industry PE 1970-2022, and the most recent data point from late 2022.

Industry              Long Run Mean   End 2022
AERO                          12.14      19.49
AGRIC                         10.75       9.64
AUTOS                          9.65      17.52
BANKS                         10.38      10.46
BEER                          15.23      35.70
BLDMT                         12.00      15.41
BOOKS                         12.95      17.60
BOXES                         12.18      10.69
BUSSV                         12.07      13.03
CHEMS                         12.40      19.26
CHIPS                         10.48      17.47
CLTHS                         11.45      10.94
CNSTR                          8.98       4.58
COAL                           8.04       2.92
DRUGS                          1.14       8.01
ELCEQ                         10.78      17.85
FABPR                         10.28      19.40
FIN                           11.16      12.97
FOOD                          14.30      25.03
FUN                            9.10      21.06
GOLD                           3.18      -5.95
GUNS                          11.50       5.05
HARDW                          7.96      19.16
HLTH                          11.91       6.09
HSHLD                         12.60      20.15
INSUR                         10.95      16.33
LABEQ                         13.46      25.18
MACH                          12.51      20.27
MEALS                         13.83      19.19
MEDEQ                          6.81      27.64
MINES                          8.06      16.27
OIL                            6.96       9.00
OTHER                         12.20      27.68
PAPER                         12.50      16.69
PERSV                         12.86      -0.65
RLEST                          8.13      -0.30
RTAIL                         12.26       8.58
RUBBR                         12.11      12.81
SHIPS                          9.79      17.42
SMOKE                         11.74      17.79
SODA                          12.38      32.09
SOFTW                          8.21      -2.85
STEEL                          8.18       4.30
TELCM                          6.75       9.58
TOYS                           9.18      -1.32
TRANS                         11.25      13.11
TXTLS                          9.43     -49.00
UTIL                          12.34      17.41
WHLSL                         11.08      13.13
Mean Industry Median          10.52      12.73

One obvious idea for what to do with this is to invest in industries that are well below their historical PE, and avoid industries that are above it (not investment advice). Looking just at current PEs is ok, but a stock with a PE of 8 isn’t necessarily a good value if it’s in an industry that typically has PEs of 6.

By this metric, what looks overvalued? Money-losing industries (negative current earnings): Gold, Personal Services, Real Estate, Software, Toys, and Textiles. Making money but valuations 19+ above historical average: Medical Equipment, Beer, Soda. Most undervalued relative to history: Guns, Health, Coal, Construction, Steel, Retail (all 3+ below the historical average).
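To make the comparison concrete, here is a minimal Python sketch of the screen described above. It uses a subset of rows copied from the table; the function name `classify` and its default thresholds (19 above, 3 below) are assumptions chosen to mirror the cutoffs in the text, not a trading rule.

```python
# Long-run mean and end-2022 median PE for a subset of industries,
# copied from the table in this post.
PE_TABLE = {
    "BEER": (15.23, 35.70),
    "MEDEQ": (6.81, 27.64),
    "SODA": (12.38, 32.09),
    "GUNS": (11.50, 5.05),
    "HLTH": (11.91, 6.09),
    "COAL": (8.04, 2.92),
    "CNSTR": (8.98, 4.58),
    "STEEL": (8.18, 4.30),
    "RTAIL": (12.26, 8.58),
    "GOLD": (3.18, -5.95),
    "BANKS": (10.38, 10.46),
}


def classify(long_run_mean, current, high=19.0, low=3.0):
    """Label an industry by how far its current PE sits from its long-run mean.

    `high` and `low` are illustrative thresholds taken from the text.
    """
    if current < 0:
        # A negative PE means the median firm is currently losing money.
        return "negative earnings"
    gap = current - long_run_mean
    if gap >= high:
        return "well above history"
    if gap <= -low:
        return "below history"
    return "near history"


labels = {name: classify(mean, cur) for name, (mean, cur) in PE_TABLE.items()}

# Industries flagged as below history, largest discount first.
below = sorted(
    (name for name, label in labels.items() if label == "below history"),
    key=lambda name: PE_TABLE[name][1] - PE_TABLE[name][0],
)

for name in below:
    mean, cur = PE_TABLE[name]
    print(f"{name}: current {cur:.2f} vs long-run {mean:.2f} (gap {cur - mean:+.2f})")
```

The screen compares each industry only to its own history, which is the point of the exercise: a PE that looks low in absolute terms can still be high for that industry.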

Of course, I don’t recommend blindly investing in these “undervalued” industries, not just for legal reasons, but because sometimes the market prices them low for a reason: earnings are expected to fall. The industry may be in secular decline due to new types of competition (coal, steel, retail). Or investors may expect it to get hit with a big cyclical decline in an upcoming recession or rotation from the Covid goods/manufacturing economy back to services (guns, construction, steel, retail). Health services (as opposed to drugs and medical equipment) stands out here as the sector where I don’t see what is driving it to trade at barely half of its usual PE.

I’d still like to get data on long run market-cap weighted mean PE by industry, as opposed to the medians I show here. The best public page I found is Aswath Damodaran’s data page, which has a wide variety of statistics back to about 1999. Some of the current PEs he calculates are quite different from those in my source, another reason to tread carefully here. I’m not sure how much of this is mean vs median and how much is driven by different classification of which stocks fit in which industry category.

This gets at a big question for anyone trying to actually trade on this: do you buy single stocks, or industry ETFs? Industry ETFs make sense in principle (since we’re talking about industry level PEs overall) and also add built-in diversification. But the PE for the ETF’s basket of stocks likely differs from that of the industry as a whole. It would make more sense to compare the ETF’s current PE to its own historical PE, but most industry ETFs have very short track records (nothing close to the 53 years I show here). PE is also far from the only valuation metric worth considering.

All this gets complex fast but I hope the historical PE ratio by industry makes for a helpful start.

New Paper with Evidence that ChatGPT Hallucinates Nonexistent Citations

I posted a new working paper with systematic evidence for false citations when ChatGPT (GPT-3.5) writes about academic literature.

Buchanan, Joy and Shapoval, Olga, GPT-3.5 Hallucinates Nonexistent Citations: Evidence from Economics (June 3, 2023). Available at SSRN: https://ssrn.com/abstract=4467968 or http://dx.doi.org/10.2139/ssrn.4467968

Abstract: We create a set of prompts from every Journal of Economic Literature (JEL) topic to test the ability of a GPT-3.5 large language model (LLM) to write about economic concepts. For general summaries, ChatGPT can perform well. However, more than 30% of the citations suggested by ChatGPT do not exist. Furthermore, we demonstrate that the ability of the LLM to deliver accurate information declines as the question becomes more specific. This paper provides evidence that, although GPT has become a useful input to research production, fact-checking the output remains important.

Figure 2 in the paper shows the trend that the proportion of real citations goes down as the prompt becomes more specific. This idea has been noticed by other people, but I don’t think it has been documented quantitatively before.

We asked ChatGPT to cover a wide range of topics within economics. For every JEL category, we constructed three prompts with increasing specificity.

Level 1: The first prompt, using A here as an example, was “Please provide a summary of work in JEL category A, in less than 10 sentences, and include citations from published papers.”

Level 2: The second prompt was about a topic within the JEL category that was well-known. An example for JEL category Q is, “In less than 10 sentences, summarize the work related to the Technological Change in developing countries in economics, and include citations from published papers.”

Level 3: We used the word “explain” instead of “summarize” in the prompt, asking about a more specific topic related to the JEL category. For L we asked, “In less than 10 sentences, explain the change in the car industry with the rising supply of electric vehicles and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”
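The prompt design above can be sketched in a few lines of Python. The Level 1 template is quoted from the post. The helper names `level_1_prompt` and `share_real`, and the example flags passed to `share_real`, are hypothetical illustrations, not the code or results from the paper.

```python
# Level 1 template, quoted from the post; only the category letter varies.
LEVEL_1_TEMPLATE = (
    "Please provide a summary of work in JEL category {code}, "
    "in less than 10 sentences, and include citations from published papers."
)


def level_1_prompt(code):
    """Build the Level 1 prompt for one JEL category letter."""
    return LEVEL_1_TEMPLATE.format(code=code)


def share_real(checked):
    """Fraction of citations verified as real.

    `checked` is a list of booleans, one per citation, filled in by hand
    after looking each citation up.
    """
    if not checked:
        raise ValueError("need at least one checked citation")
    return sum(checked) / len(checked)


# The three category letters mentioned in the post.
for code in ("A", "L", "Q"):
    print(level_1_prompt(code))

# Made-up flags purely to show the calculation (not data from the paper).
print(share_real([True, True, False, True]))
```

Computing `share_real` separately for each prompt level is one simple way to produce the kind of comparison that Figure 2 reports.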

The paper is only 5 pages long, but we include over 30 pages in the appendix of the GPT responses to our prompts. If you are an economist who has not yet played with ChatGPT, then you might find it useful to scan this appendix and get a sense of what GPT “knows” about various fields of economics.

If SSRN isn’t working for you, here is a Google Drive link to the working paper: https://drive.google.com/file/d/1Ly23RMBlim58a7CbmLwNL_odHSNRjC1L/view?usp=sharing

Previous iterations of this idea on EWED:

https://economistwritingeveryday.com/2023/04/17/chatgpt-as-intern/ Mike’s thoughts on what the critter is good for.

https://economistwritingeveryday.com/2023/01/21/chatgpt-cites-economics-papers-that-do-not-exist/  This is one of our top posts for traffic in 2023, since this is a topic of interest to the public.  That was January of 2023 and here we are in June today. It’s very possible that this problem will be fixed soon. We can log this bug now to serve as a benchmark of progress.

A check in and comparison with Bing:

Video of Joy Buchanan on Tech Jobs and Who Will Program

Here are some show notes for a keynote lecture to a general audience in Indiana. This was recorded in April 2023.

Minute  Topic
2:00    “SMET” vs STEM Education – Does Messaging Matter?
        (Previous blog post on SMET)
5:00    Is Computer Programming a “Dirty Job”? Air conditioning, compensating differentials, and the nap pods of Silicon Valley
        (post on the 1958 BLS report)
7:50    Wages and employment outlook for computer occupations
10:00   Presenting my experimental research paper “Willingness to be Paid: Who Trains for Tech Jobs?” in 23 minutes
        Motivation and Background  10:00 – 15:30
        Experimental Design        15:30 – 22:00
        Results                    22:00 – 30:00
        Discussion                 30:00 – 33:30
33:50   Drawbacks to tech jobs
        See also my policy paper published by the CGO on tech jobs and employee satisfaction
35:30   The 2022 wave of layoffs in Big Tech and vibing TikTok Product Managers
        I borrowed a graph on Tech-cession from Joey Politano and a blog point from Matt Yglesias, and of course reference the BLS.
39:00   Should You Learn to Code? (and the new implications of ChatGPT)
        Ethan Mollick brought this Nature article to my attention.
        Tweet credits to @karpathy and @emollick
48:00   Q&A with audience

Video: Joy Presents Two Experimental Papers to a Macro Class

Here are some show notes to a talk I gave in April 2023. I had the opportunity to talk to an undergraduate macroeconomics class at Indiana University East.

Minute          Topic
2:00            Research on Behavioral Economics and Macroeconomics
4:25            Labor Market Equilibrium Concepts and Incomplete Labor Contracts
6:50            The Gift Exchange Game and the Fair Wage-Effort Theory
13:00           Recessions and Downward Wage Rigidity
19:00           Presenting my Experimental Study “If Wages Fell During a Recession” in 13 minutes
32:00 – 33:00   How a question raised in “If Wages Fell During a Recession” pointed the way to the Reference Point paper
33:00 – 41:00   Presenting my Experimental Study “My Reference Point, Not Yours” in 8 minutes
41:00 – 44:00   Conclusion of “My Reference Point, Not Yours” and tying it back to macroeconomics

The “If Wages Fell…” paper directly inspired the “My Reference…” experiment. But I don’t cite “If Wages Fell…” in “My Reference…,” so you would never know how closely they are connected unless you listen to this talk.

Health Insurance and Wages: Compensating Differentials in Reverse?

One of the oldest theories in economics is the idea of compensating differentials. A job represents not just a certain amount of money per hour, but a whole package of positive and negative things. Jobs have more or less stability, flexibility, fun, room to grow, danger… and non-cash benefits like health insurance. The idea of compensating differentials is that, all else equal, jobs that are good on these other margins can pay lower cash wages and still attract workers (thus, the danger of doing what you love). On the other hand, jobs that are bad on these other margins need high wages if they want to hire anyone (thus, the deadliest catch).

I think this theory makes perfect sense, and we see evidence for it in many places. But when it comes to health insurance, everything looks backwards. A job that offers employer-provided health insurance is better to most employees than one that doesn’t, so by compensating differentials it should be able to offer lower wages. There’s just one problem: US data shows that jobs offering health insurance also offer significantly higher wages. The 2018 Current Population Survey shows that workers with employer-provided health insurance had average wages of $33/hr, compared to $24/hr for those without employer insurance.

All the economists are thinking now: that’s not a problem, compensating differentials is an “all else equal” claim, but not all else is equal here. The jobs with health insurance pay higher wages because they are trying to attract higher-skilled workers than the jobs that don’t offer insurance.

That’s what I thought too. It is true that jobs with insurance hire quite different workers on average:

Source: 2017 CPS analyzed here

The problem is, once we control for all the observable ways that insured workers differ, we still find that their wages are significantly higher than workers who don’t get employer-provided insurance. Like, 10-20% higher. That’s after controlling for: year, sex, education, age, race, marital status, state of residence, health, union membership, firm size, whether the firm offers a pension, whether the employee is paid hourly, and usual hours worked. I’ve thrown in every possibly-relevant control variable I can think of and employer-provided health insurance always still predicts significantly higher wages. Of course, there are limits to what we get to observe about people using surveys; I don’t get any direct measures of worker productivity. Possibly the workers who get insurance are more skilled in ways I don’t observe.
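A regression of the kind described above can be sketched as follows. The notation is an assumption for illustration, since this post doesn't spell out the exact specification, and the log-wage form is assumed only because the gap is stated in percent terms.

```latex
\ln w_{i} = \alpha + \beta \, \mathrm{HI}_{i} + X_{i}'\gamma + \varepsilon_{i}
```

Here $w_{i}$ is worker $i$'s hourly wage, $\mathrm{HI}_{i}$ equals 1 if the worker has employer-provided health insurance and 0 otherwise, and $X_{i}$ collects the controls listed above (year, sex, education, and so on). Compensating differentials would predict $\beta < 0$; an estimate of $\beta$ in the neighborhood of 0.1 to 0.2 would correspond to wages roughly 10-20% higher for insured workers. Anything about workers that affects wages but isn't in $X_{i}$ ends up in $\varepsilon_{i}$, which is why unobserved skill is a concern.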

We can try to account for these unobserved differences by following the same person from one job to another. When someone switches jobs, they could have health insurance in both jobs, neither, only the new, or only the old. What happens to the wages of people in each of these situations? It turns out that gaining health insurance in a new job on average brings the biggest increase in wages:

What could be going on here? One possibility is that health insurance makes people healthier, which improves their productivity, which improves their wages. But we control for health status and still find this effect. The real mystery is that papers that study mandatory expansions of health insurance (like the ACA employer mandate and prior state-level mandates) tend to find that they lower wages. Why would employer-provided health insurance lower wages when it is broadly mandated, but raise wages for individuals who choose to switch to a job that offers it?

My current theory is that “efficiency benefits” are offered alongside “efficiency wages”. The idea of efficiency wages is that some firms pay above-market wages as a way of reducing turnover. Workers won’t want to leave if they know their current job pays above-market, and so the company saves money on hiring and training. But this only works if other firms aren’t doing it. The positive correlation of wages and insurance could be because the same firms that pay “efficiency wages” are more likely to pay “efficiency benefits”, offering unusually good benefits as a way to hold on to employees.

I still feel like these results are puzzling and that I haven’t fully solved the puzzle. This post summarizes a currently-unpublished paper that Anna Chorniy and I have been working on for a long time and that I’ll be presenting at WVU tomorrow. We welcome comments that could help solve this puzzle either on the empirical side (“just control for X”) or the theoretical side (“compensating differentials are being overwhelmed here by X”).

More Ideas Pages

I’ve written here about my ideas page of economics papers I’d like to see.

After that post I heard from others who maintain similar pages. David Friedman has a small page here with research ideas, along with larger pages of short story ideas and product ideas.

HiveReview is a site where one can post or comment on both completed papers and paper ideas. The site does many things at once, but one use case is to post ideas in search of collaborators or to search for projects where someone wants a collaborator for their idea.

I learned today that Gwern Branwen maintains a large page of “Questions”, some of which could be research ideas, mostly outside of economics. He also has pages of research ideas and startup ideas. Some examples of Questions:

Given the crucial role of trust and shared interests in success stories like Xerox PARC or the Apollo Project or creative collaborations in general, why are there so few extremely successful pairs of identical twins?

Nicotine alternatives or analogues: there seem to be none, but why not?

Nicotine is one of the best stimulants on the market: legal, cheap, effective, relatively safe, with a half-life less than 6 hours. It also affects one of the most important and well-studied receptors. Why are there no attempts to develop analogues or replacements for nicotine which improve on it eg. by making it somewhat longer-lasting or less blood-pressure-raising, when there are so many variants on other stimulants like amphetamines or modafinil or caffeine?

Why States Hate Nursing Homes

Medicaid is a health insurance program for those with low incomes, funded largely by states. Overall it accounts for less than 20% of US medical spending. But there is one area where it is the dominant payer: nursing homes. Nursing homes are expensive, and Medicare (the typical insurance for those over 65) won’t cover them after the first hundred days, so most nursing home residents end up paying out of pocket until they burn through all their savings and wind up on Medicaid. At which point, Medicaid pays about $100,000 per year to the nursing home for the rest of their life.

States are responsible for up to half of that cost, and so start looking for ways to save money. One idea they have is to make it harder to build nursing homes: if there aren’t beds available, potential nursing home patients will have to stay home instead, where they can’t rack up Medicaid spending the same way. In fact, some states go all the way to a complete moratorium on new nursing homes:

Source: Institute for Justice

Some other states allow new nursing homes, but only with a special permission slip called a Certificate of Need (CON). CON is often required for other types of health facilities as well, like hospitals or dialysis centers. Research by me and others has generally found that CON doesn’t work as a way to reduce spending, and in fact actually increases it. CON might reduce the number of facilities, but that reduction of supply and competition gives the remaining facilities more power to raise prices.

So which effect dominates: does the smaller number of facilities reduce total spending, or do the higher prices increase it? It depends on the elasticity of demand:

In health care demand is typically quite inelastic, so the price effect dominates, and spending goes up:

But nursing homes could be an exception here. Elasticity of demand could be relatively high because of the number of potential substitutes- home care or assisted living for those with relatively low medical needs, hospitals for those with relatively high medical needs. Plus this is the one type of health care where Medicaid is the dominant payer. They could be especially resistant to price increases here, both due to their market power and their willingness to keep prices so low that facilities won’t take Medicaid patients (another way to save money!).
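The elasticity logic above can be made concrete with a short Python sketch. It assumes a constant-elasticity demand curve, Q = scale × P^(−elasticity), and a 10% price increase; both the functional form and the numbers are illustrative assumptions, not estimates for any real market.

```python
def spending(price, elasticity, scale=100.0):
    """Total spending P * Q under constant-elasticity demand.

    Demand is assumed to be Q = scale * price ** (-elasticity).
    """
    quantity = scale * price ** (-elasticity)
    return price * quantity


def spending_change(elasticity, price_increase=0.10):
    """Percent change in total spending when price rises by `price_increase`."""
    before = spending(1.0, elasticity)
    after = spending(1.0 + price_increase, elasticity)
    return (after - before) / before * 100


# Inelastic, unit-elastic, and elastic demand.
for e in (0.2, 1.0, 1.5):
    print(f"elasticity {e}: spending changes by {spending_change(e):+.1f}%")
```

Under these assumptions, a price increase raises total spending when elasticity is below 1, leaves it unchanged at exactly 1, and lowers it above 1. That is why the question for nursing homes comes down to how easily patients and payers can substitute away.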

A new paper by Vitor Melo and Elijah Neilson finds that this is indeed the case. Indiana, Pennsylvania, and North Dakota repealed their nursing home CON requirements in the ’90s, and at least for IN and PA their Medicaid spending went way up. The paper uses a new “synthetic difference in difference” technique that seems appropriate, and creates figures that seem confusing at first but get a ton of information across:

They correctly note that they don’t evaluate the welfare effects of the policy; it’s possible that the extra nursing home beds following CON repeal bring huge benefits to seniors that are worth the higher spending. But nursing homes could be the exception to the general rule that CON fails to achieve the goals, like reduced spending, that advocates set for it.