IPUMS Data Intensive Workshop & Conference

I just returned from the Full Count IPUMS data workshop at the Data-Intensive Research Conference that was hosted by the Network on Data Intensive Research on Aging and IPUMS. The theme of this conference was “Linking Records”.

It was the best workshop and conference that I’ve ever attended. I’d attended the conference remotely in the past, but attending the workshop in person was exceptional. About 20 other participants and I were flown to the Minneapolis Population Center and put up in a hotel during our stay, which made the conference a low-stress affair. The whole workshop was well organized, the speakers built on one another’s content, and there was a hands-on lab for us to complete. I felt my human capital growing by the hour.

Continue reading

Venezuelans Vote Overwhelmingly Against Maduro

Venezuela held an election this week; President Maduro says he won, while the opposition and independent observers say he lost. Disputed elections like this are fairly common across the world, but where Venezuela really stands out is not how people vote at the ballot box, but how they vote with their feet.

Reuters notes that “A Maduro win could spur more migration from Venezuela, once the continent’s wealthiest country, which in recent years has seen a third of its population leave.”

I don’t think we emphasize enough how crazy the scale of this is. After every US Presidential election, you hear some people who supported the losing side talk about leaving the country, but they almost never do. Leaving your home country behind is a dramatic step, one people only want to take if they think things are much better elsewhere. The US, even with a party you don’t like in power, has generally stayed a good place to live. The total number of Americans who have moved abroad for any reason (I would guess most feel more pulled by the host country rather than pushed by the US) is about 3 million. That is less than 1% of all Americans; by contrast, more than 46 million people have immigrated to the US from other countries, and many more would come if we allowed it.

Even in poor countries, seeing anything like one third of the population leave is dramatic, especially when almost all the migration happens in only 10 years as in Venezuela:

Source. Note this only goes through 2020, and emigration has grown since.

This makes Venezuela the largest refugee crisis in the history of the Americas, and depending on how you count the partition of India, perhaps the largest refugee crisis in human history that was not triggered by an invasion or civil war.

Instead, it has been triggered by the Maduro regime choosing terrible policies that have needlessly and dramatically impoverished the country:

I hope that the Venezuelan government will soon come to represent the will of its people. I’m not sure how that is likely to happen, though I guess positive change is most likely to come from Venezuelans themselves (perhaps with help from Colombia and Brazil); when the US tries to play a bigger role we often make things worse. But what has happened in Venezuela for the past 10 years is clearly much worse than the “normal” bad economic policies and even democratic backsliding that we see elsewhere. People everywhere complain about election results and economic policy, but nowhere else have I seen such a case of people going past simple cheap talk, taking the very expensive step of voting against the regime with their feet.

Fiscal Illusion: It’s Real (People Underestimate How Much They Pay in Taxes)

The concept of “fiscal illusion” has long existed in public finance, but it is difficult to test. The basic theory is that people will underestimate how much they pay in taxes, as well as underestimate government expenditures. A forthcoming paper in Public Choice by Kaetana Numa uses survey data from the United Kingdom to test the theory, and finds support. From the abstract of “Fiscal illusion at the individual level“:

“providing personalized fiscal information reduces support for higher taxes and spending and increases support for lower taxes and spending. These findings indicate that taxpayers underestimate both their tax liabilities and the costs of public services.”

The paper uses a “novel personalized fiscal calculator” to estimate how much tax an individual would actually owe. It then randomizes which taxpayers get this information, and finds that “the treated respondents… were less supportive of raising taxes and more supportive of cutting taxes than the respondents in the control condition.”

And the results are large. For all taxes, in the treated group that saw their personalized fiscal calculator, 61 percent support cutting taxes, versus just 50 percent in the control group. The differences show up across the major taxes that individuals pay in the UK, including the income tax, national insurance contributions (both employer and employee sides), and the VAT. There is no tax category where the treatment group is more likely to want to increase the tax, though the VAT and the smaller Fuel duty and Council tax are about equal on the percent wanting an increase (but the median response for these last two is to decrease the tax — in both the control and treatment groups).
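
To get a feel for how big that 11-point treatment effect is relative to sampling noise, here is a minimal two-proportion z-test sketch. The arm sizes below (1,000 per group) are assumptions for illustration only; the paper reports its own sample sizes and standard errors.

```python
from math import erf, sqrt

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF: Phi(x) = (1 + erf(x/sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 61% of an assumed 1,000 treated vs 50% of an assumed 1,000 controls
# supporting tax cuts: an 11-point gap.
z, p_value = two_prop_ztest(610, 1000, 500, 1000)
# With arms of that size, the gap is far outside sampling noise (z near 5).
```

Even halving the assumed arm sizes leaves the gap highly significant, which is consistent with describing the results as large.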

Do these results from the UK hold up in other developed nations? Possibly. In a 2014 Eurobarometer survey, the percent of EU citizens that could correctly identify their nation’s VAT rate varied widely. The high was 89 percent in Germany correctly identifying the rate, down to 31 percent in Ireland. The average was 65 percent — though the UK was at the low end with only about 47 percent correctly identifying the VAT rate.

Fiscal illusion appears to be a real issue, and probably an important one in the UK.

Tech Stocks Sag as Analysts Question How Much Money Firms Will Actually Make from AI

Tech stocks have been unstoppable for the past fifteen or so years. Here is a chart from Seeking Alpha for total return of the tech-heavy QQQ fund (orange line) over the past five years, compared to a value-oriented stock fund (VTV), a fund focused on dividend-paying stocks (SDY), and the Russell 2000 small-cap fund IWM.

QQQ has left the others in the dust. There has been a reversal, however, in the past month. The tech stocks have sagged nearly 10% since July 11, while the left-for-dead small caps (IWM, green line) rose by 10%:

Some of this is just mean reversion, but there seems to be a deeper narrative shift going on. For the past 18 months, practically anything that could remotely be connected with AI, especially the Large Language Models (LLMs) exemplified by ChatGPT, has been valued as though it would necessarily make ever-growing gobs of money for years to come.

In recent weeks, however, Wall Street analysts have started to question whether all that AI spending will pay off as expected. Here are some headlines and excerpts (some of the linked articles are behind paywalls):

““There are growing concerns that the return on investment from heavy AI spending is further out or not as lucrative as believed, and that is rippling through the whole semiconductor chain and all AI-related stocks,” said James Abate, chief investment officer at Centre Asset Management.”

www.bloomberg.com/…

““The overarching concern is, where is the ROI on all the AI infrastructure spending?” said Alec Young, chief investment strategist at Mapsignals. “There’s a pretty insane amount of money being spent.”
Jim Covello, the head of equity research at Goldman Sachs Group Inc., is among a growing number of market professionals who are arguing that the commercial hopes for AI are overblown and questioning the vast expense required to build out the infrastructure required to run and train large-language models.”

www.bloomberg.com/…

“It really feels like we are moving from a ‘tell me’ story on AI to a ‘show me’ story,” said Ohsung Kwon, equity and quantitative strategist at Bank of America Corp. “We are basically at a point where we’re not seeing much evidence of AI monetization yet.”

https://finance.yahoo.com/news/earnings-derail-stock-rally-over-130001940.html

Goldman’s Top Stock Analyst Is Waiting for AI Bubble to Burst

Covello casts doubt on hype behind a $16 trillion rally

He says costs, limited uses mean it won’t revolutionize world

https://finance.yahoo.com/news/goldman-top-stock-analyst-waiting-111500948.html

Google stock got dinged last week for excessive capital spending, even though earnings were strong. Microsoft reports its Q4 earnings after the market closes today (Tuesday); we will see how investors parse these results.

On field experiments and null effects

The most interesting new paper in months dropped last week: “Does Income Affect Health? Evidence from a Randomized Control Trial of a Guaranteed Income.”

All the broad strokes are there in the abstract: $1,000 per month, 2,000 participants (half treated), for three years. It’s the biggest, highest-salience experimental test of a universal basic income program to date. There’s a lot of detail, but the broad-strokes finding is that nothing happened. That is to say, there were a lot of precise null effects. And that is absolutely fascinating! I’ve gone back and forth on my feelings about a large-scale UBI policy, and this is certainly more evidence that gives me pause, but my biggest takeaway is that policy research really should culminate in a series of field experiments whenever possible. Not because of identification or external validity or any of the other reasons economists fight in intellectual perpetuity, but because it’s easier to accept null results as sufficiently precise. It’s easier to acknowledge and accept that there is no observed effect because the treatment mechanism truly had no net effect within an experimental design.

Conducting research using observational data to produce causally identified conclusions is to fight a battle on multiple fronts. These fronts usually relate to all the possible sources of bias, of endogeneity, within your analysis. You’re observing x causing y to increase, but the reality is that x is correlated with z, and that is what is actually causing y to increase. That’s a tough fight, believe me, as hypothetical sources of varying degrees of plausibility are hurled at your analysis from all directions. But at least there is an argument to be had. There’s something to fight against and over.

Null results face a far more insidious argument: there’s just too much noise in your analysis. Too many sources of variation, too much measurement error, too much something (that I don’t have to bother unpacking), and that’s why your standard errors are too big to identify the true underlying effect. There’s also a simple, and annoying, institutional reality: there’s no t-test for a precise zero. There’s no p-value, no threshold for statistical significance, that says this is a “true zero”. All we can say is that the results fail to reject the null. It’s subjective. And in a world of 2 percent acceptance rates at top journals, good luck getting through a review process where the validity of statistical interpretation is assessed in a purely subjective manner.

Field experiments enjoy far more grace with null results. As randomized control trials, they can argue that their null effects are, in fact, causally estimated. If conducted with sufficient power (i.e., number of observations relative to feasible impact), then the results are simply the results. There are no arguments to be had about instrumental variables, regression discontinuity cutoffs, or synthetic control designs. Measurement error will rarely be a problem given an appropriate design. External validity…well, there’s no getting around external validity gripes, but should those concerns appear, then the opposition has already accepted the statistical validity of your null results. You’ve already won.
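
The “sufficient power” point can be made concrete with a minimum detectable effect (MDE) calculation. Here is a minimal sketch at the conventional 5% size and 80% power, using the guaranteed-income study’s rough arm sizes (1,000 treated, 1,000 controls); standardizing the outcome SD to 1 is my assumption for illustration, not a number from the paper.

```python
from math import sqrt

def mde_two_arm(sd, n_treat, n_control, z_alpha=1.96, z_power=0.84):
    """Minimum detectable effect for a two-arm RCT at 5% size and 80% power.
    z_alpha and z_power are the usual standard-normal critical values."""
    se = sd * sqrt(1 / n_treat + 1 / n_control)
    return (z_alpha + z_power) * se

# Rough arm sizes from the study; outcome SD standardized to 1 (an assumption).
mde = mde_two_arm(sd=1.0, n_treat=1000, n_control=1000)
# mde is about 0.125: a "precise null" with these arms means ruling out
# effects larger than roughly an eighth of a standard deviation.
```

This is why the design can claim its nulls are informative: the study was big enough that any effect it missed would have to be quite small.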

I’m not puffing up my own team here, either. I’ve conducted several lab experiments, but never a field experiment. They are large, lengthy, and costly endeavors. I aspire to run a couple before my career runs its course, but I’ve built nothing on them to date. But they’ve grown in my estimation, even amidst concerns over participant gaming and external validity, precisely because you can run your experiment, observe no measurable impact on anything, and proclaim in earnest that that is precisely what happened. Nothing.

Sources on AI use of Information

1. Consent in Crisis: The Rapid Decline of the AI Data Commons

Abstract: General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how consent preferences to use it are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites’ expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crisis in data consent, foreclosing much of the open web, not only for commercial AI, but non-commercial AI and academic purposes.

AI is taking out of a commons information that was provisioned under a different set of rules and technology. See discussion on Y Combinator 
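
The robots.txt side of the audit can be sketched with Python’s standard library. The directives below are illustrative, not from any real site (GPTBot is OpenAI’s crawler token), but this is essentially the mechanical check the paper runs across 14,000 domains:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt with an AI-specific clause of the kind the audit
# documents: the AI crawler is blocked while ordinary crawlers are welcome.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

ai_allowed = rp.can_fetch("GPTBot", "https://example.com/article")
any_allowed = rp.can_fetch("SomeOtherBot", "https://example.com/article")
# ai_allowed is False while any_allowed is True: the domain is fully
# restricted for AI training crawls even though it stays open to everyone else.
```

Note that robots.txt is purely advisory, which is part of the paper’s point about “ineffective web protocols.”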

2. “ChatGPT-maker braces for fight with New York Times and authors on ‘fair use’ of copyrighted works” (AP, January ’24)

3. Partly handy as a collection of references: “HOW GENERATIVE AI TURNS COPYRIGHT UPSIDE DOWN” by a law professor. “While courts are litigating many copyright issues involving generative AI, from who owns AI-generated works to the fair use of training to infringement by AI outputs, the most fundamental changes generative AI will bring to copyright law don’t fit in any of those categories…” 

4. New gated NBER paper by Josh Gans “examines this issue from an economics perspective”

Joy: AI companies have money. Could we be headed toward a world where OpenAI has some paid writers on staff? Replenishing the commons is relatively cheap if done strategically, in relation to the money being raised for AI companies. Jeff Bezos bought the Washington Post. It cost a fraction of his tech fortune (about $250 million). Elon Musk bought Twitter. Sam Altman is rich enough to help keep the NYT churning out articles. Because there are several competing commercial models, however, the owners of LLM products face a commons problem. If Altman pays the NYT to keep operating, then Anthropic gets the benefit, too. Arguably, good writing is already under-provisioned, even aside from LLMs.

You, Parent, Should have a Robot Vacuum

Do you have a robot vacuum? The first model was introduced in 2002 for $199. I don’t know how good that first model was, but I remember seeing plenty of ads for them by 2010 or so. My family was the cost-cutting kind of family that didn’t buy such things. I wondered how well they actually performed ‘in real life’. Given that they were on the shelves for $400 to $1,200, I had the impression that there was a lot of quality difference among them. I didn’t need one, given that I rented or had a small floor area to clean, and I sure didn’t want to spend money on one that didn’t actually clean the floors. I lacked domain-specific knowledge. So I didn’t bother with them.

Fast forward to 2024: I’ve got four kids, a larger floor area, and less time. My wife and I agreed early in our marriage that we would be a ‘no shoes in the house’ kind of family.  That said, we have different views when it comes to floor cleanliness. Mine is: if the floors are dirty, then let’s wait until the source of crumbs is gone, and then clean them when they will remain clean. In practice, this means sweeping or vacuuming after the kids go to bed, and then steam mopping (we have tile) after parties (not before). My wife, in contrast, feels the crumbs on her feet now and wants it to stop ASAP. Not to mention that it makes her stressed about non-floor clutter or chaos too.

Continue reading

When Beer is Safer than Water

I’ve often heard that before modern water treatment, it was safer to drink beer; but I’ve also heard people call this a historical myth. A new paper in the Journal of Development Economics by Francisca Antman and James Flynn comes down strongly on the side of “beer really was safer”:

This paper provides the first quantitative estimates into another well-known water alternative during the Industrial Revolution in England.

Although beer in the present day is regarded as being worse for health than water, several features of both beer and water available during this historical period suggest the opposite was likely to be true. First, brewing beer requires boiling the water, which kills many dangerous pathogens often found in drinking water. As Bamforth (2004) puts it, “the boiling and the hopping were inadvertently water purification techniques”. Second, alcohol itself has antiseptic qualities. Homan (2004) notes that “because the alcohol killed many detrimental microorganisms, it was safer to drink than water” in the ancient near-east.

They use several identification strategies to establish this, for instance when a tax on malt was increased and mortality went up:

But did this mean people were drunk all the time? Probably not:

beer in this period was generally much weaker than it is today, and thus would have been closer to purified water. Accum (1820) found that common beers in late 18th and early 19th century England averaged just 0.75% alcohol by volume, a fraction of the content of the beers of today. Beer in this period was therefore far less harmful to the liver. Taken together, these facts suggest that beer had many of the benefits of purified water with fewer of the health risks associated with beer consumption today.

In fact, people at the time didn’t necessarily know that beer was healthier:

Thus, even though people did not recognize beer as a safer choice, drinking beer would have been an unintentional improvement over water, and thus may have contributed to improvements in human health and economic development over the period we investigate

Though as usual, Adam Smith was ahead of his time. Here’s what he had to say in his 1776 Wealth of Nations, in a chapter on malt taxes:

Spirituous liquors might remain as dear as ever, while at the same time the wholesome and invigorating liquors of beer and ale might be considerably reduced in their price.

Inflation in the G7 and Russia

Among the former G8 countries, Russia has by far the highest cumulative inflation rate since January 2020, almost double the amount of inflation we’ve seen in the US and in most G7 countries. No doubt the effects of the wartime economy are contributing to this, but even in February 2022, before the invasion of Ukraine, Russia’s inflation had already clearly been worse.

The US is on the high end for this group, but pretty close to the median. Japan looks really good on inflation, but that’s probably not much comfort to them since their economy is still smaller than before the pandemic. By this measure, the US looks pretty good (chart from Joey Politano):

GDP estimates for Russia are a little tricky because of the war, but according to IMF estimates, Russia’s economy in 2023 was about 5.6% larger than 2019 in real terms.

See also: Food Inflation in the G7 and Russia

How to (Almost) Double Your Investing Returns 3. “Stacked” Multi-Asset Funds

Two weeks ago we described a simple way to achieve roughly double investing returns on some asset class like an S&P 500 stock basket, or on some commodity like gold or oil, by buying shares in an exchange-traded fund (ETF) whose price moves up or down each day two times as much as the price of the underlying stocks or commodities. For instance, if the S&P 500 stocks go up (or down) by 2% on a given day, the price of the SSO ETF will move up (or down) by 4%. And last week we noted that buying deep in the money call options can also result in an investment which can move up or down by twice the percentage of the underlying stock. These call options side-step the volatility drag implicit in the 2X funds, but require some housekeeping on the investor’s part to roll them over once or twice a year.
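
The volatility drag in daily-rebalanced 2X funds is easy to see with a two-day toy example (stylized returns, not market data): the index gains 5% and then gives it all back, ending exactly flat, yet the 2X fund ends with a loss.

```python
# Stylized two-day price path: +5%, then the exact return that takes the
# index back to its starting level.
daily_returns = [0.05, -0.05 / 1.05]

index = 1.0
fund_2x = 1.0
for r in daily_returns:
    index *= 1 + r
    fund_2x *= 1 + 2 * r  # daily rebalancing doubles each day's move

# index ends at 1.0, but fund_2x ends near 0.995: roughly a 0.5% loss on a
# flat index, because the doubled down-move applies to a larger base.
```

Compounded over volatile years, this drag is why a 2X fund’s long-run return is generally less than twice the index’s.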

Today we present a third approach for multiplying the return on your investment dollars. This is to buy shares of a fund which holds two different asset classes, in a leveraged form. As an example: if you buy $100 worth of the fund PSLDX, you are buying the equivalent of $100 worth of S&P 500 stocks PLUS about $100 worth of long-dated US Treasury bonds. (PSLDX happens to be an old-fashioned mutual fund, not an ETF, but no matter). It works like this: The fund takes your $100 and buys a bucket of bonds. It then uses those bonds as collateral, and uses futures to get around $100 worth of exposure to the price movements of the S&P 500 stocks. There is not quite a free lunch here, since there is a “carry” cost on the futures, which is about equal to the LIBOR/SOFR short term interest rates (currently ~ 5%).
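
The stacked-fund arithmetic just described can be sketched in a few lines: each dollar invested earns the bond return on the collateral plus the stock return on the futures overlay, minus the carry cost of that overlay. The returns below are assumed round numbers for illustration, not PSLDX’s actual results.

```python
def stacked_return(stock_ret, bond_ret, carry_rate,
                   stock_exposure=1.0, bond_exposure=1.0):
    """One-period return of a stacked fund: bond return on the collateral,
    plus the futures-based stock return net of financing (carry) cost."""
    return bond_exposure * bond_ret + stock_exposure * (stock_ret - carry_rate)

# Assumed (not historical) year: stocks +10%, bonds +3%, carry 5%.
r = stacked_return(stock_ret=0.10, bond_ret=0.03, carry_rate=0.05)
# r works out to 8%, versus 10% for stocks alone: the bond stack added 3%
# but the carry on the futures leg cost 5%.
```

When the carry rate exceeds the bond return, as under an inverted yield curve, the stack subtracts value, which is exactly the problem these funds ran into recently.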

PSLDX does not promise exactly 100/100 stock/bond exposure, but it comes out pretty close much of the time. A similar product is NTSX, which is leveraged 1.5x. It gives 90/60 stocks/mixed-term bonds. NTSX has outperformed PSLDX in recent years, since the price of long-term (10-20 year) bonds has been crushed due to the rise in interest rates. RSSB is a recent entry into this space, offering 100/100 exposure to global stocks/laddered Treasuries.

Another reason these leveraged stock/bond products have done relatively poorly in the past two years is that the cost of leverage is actually higher than the bond coupons, due to the inverted yield curve.  This problem will go away if the Fed lowers short-term rates back down to near zero, as they were prior to 2022, but lingering inflation makes that prospect unlikely.

That said, if I have $200 to invest and want $100 stock and $100 bond coverage, I can put $100 into one of these 100/100 funds, and still have $100 left to collect interest on or to invest in some other, hopefully higher-yielding venue. So, these stock/bond funds have their place.

Where this so-called asset stacking shines even more is combining stocks or bonds with something like managed futures. Managed futures are an excellent diversifier for equities (see here). Moreover, since managed futures are typically held in both long and short positions, there will be less financing (carry) cost associated with them. When both stocks and bonds cratered in 2022, managed futures went up. Thus, funds like BLNDX (50 global stocks/100 managed futures) and MAFIX (stocks plus managed futures) went up in 2022, and then continued to rise as stocks recovered. As a result, the returns for these two funds have been steadier and higher than plain stocks (SP 500) over the past three years:

Total returns for past three years, for BLNDX (50 stocks/100 managed futures), SP500 stocks, BND broad US bonds, and MAFIX stacked multi-asset.

BLNDX and its sister fund REMIX are readily available at most brokerages (I hold some), while MAFIX may have daunting minimum investment requirements. RSST is a recent 100/100 stock/managed futures ETF that is easily invested in, and seems to be performing well.

Disclaimer: As usual, nothing here should be considered advice to buy or sell any investment.