Many Impressive AI Demos Were Fakes

I recently ran across an article on the Seeking Alpha investing site with the provocative title “AI: Fakes, False Promises And Frauds”, published by LRT Capital Management. Obviously, they think the new generative AI is being oversold. They cite a number of examples where demos of artificial general intelligence were apparently staged or faked. I followed up on a few of these examples, and it does seem like this article is accurate. I will quote some excerpts here to give the flavor of their remarks.

In 2023, Google found itself facing significant pressure to develop an impressive innovation in the AI race. In response, they released Google Gemini, their answer to OpenAI’s ChatGPT. The unveiling of Gemini in December 2023 was met with a video showcasing its capabilities, particularly impressive in its ability to handle interactions across multiple modalities. This included listening to people talk, responding to queries, and analyzing and describing images, demonstrating what is known as multimodal AI. This breakthrough was widely celebrated. However, it has since been revealed that the video was, in fact, staged and that it does not represent the real capabilities of Google’s Gemini.

… OpenAI, the company behind the groundbreaking ChatGPT, has a history marked by dubious demos and overhyped promises. Its latest release, Chat GPT-4-o, boasted claims that it could score in the 90th percentile on the Unified Bar Exam. However, when researchers delved into this assertion, they discovered that ChatGPT did not perform as well as advertised.[10] In fact, OpenAI had manipulated the study, and when the results were independently replicated, ChatGPT scored on the 15th percentile of the Unified Bar Exam.

… Amazon has also joined the fray. Some of you might recall Amazon Go, its AI-powered shopping initiative that promised to let you grab items from a store and simply walk out, with cameras, machine learning algorithms, and AI capable of detecting what items you placed in your bag and then charging your Amazon account. Unfortunately, we recently learned that Amazon Go was also a fraud. The so-called AI turned out to be nothing more than thousands of workers in India working remotely, observing what users were doing because the computer AI models were failing.

… Facebook introduced an assistant, M, which was touted as AI-powered. It was later discovered that 70% of the requests were actually fulfilled by remote human workers. The cost of maintaining this program was so high that the company had to discontinue its assistant.

… If the question asked doesn’t conform to a previously known example ChatGPT will still produce and confidently explain its answer – even a wrong one.

For instance, the answer to “how many rocks should I eat” was:

…Proponents of AI and large language models contend that while some of these demos may be fake, the overall quality of AI systems is continually improving. Unfortunately, I must share some disheartening news: the performance of large language models seems to be reaching a plateau. This is in stark contrast to the significant advancements made by OpenAI’s ChatGPT, between its second iteration (GPT-2), and the newer GPT-3 – that was a meaningful improvement. Today, larger, more complex, and more expensive models are being developed, yet the improvements they offer are minimal. Moreover, we are facing a significant challenge: the amount of data available for training these models is diminishing. The most advanced models are already being trained on all available internet data, necessitating an insatiable demand for even more data. There has been a proposal to generate synthetic data with AI models and use this data for training more robust models indefinitely. However, a recent study in Nature has revealed that such models trained on synthetic data often produce inaccurate and nonsensical responses, a phenomenon known as “Model Collapse.”

OK, enough of that. These authors have an interesting point of view, and the truth probably lies somewhere between their extreme skepticism and the breathless hype we have been hearing for the last two years. I would guess that the most practical near-term uses of AI may involve more specific, behind-the-scenes data mining for a business application, rather than exactly imitating the way a human would think.

Interpreting candidate policies

Interpreting policy talking points from people running for office is difficult for a variety of reasons, but it essentially boils down to the fact that voters often do not want the outcomes that would be produced by the policies they will in fact vote for. Candidates, in turn, must find a way to promise policies they will either do their best not to deliver or, if they do deliver them, bundle with other policies that will mitigate their effect.

Interpreting the true intended policy bundle being signaled by a candidate is fraught with traps, not least of which are our personal biases. If I want to like a candidate, for social or identity reasons, I will have a tendency to interpret their policy proposals as part of a broader, unspoken bundle that I like. If I don’t want to like a candidate, perhaps because they are a petty, boorish lout whose principal aptitude appears to be grifting at the margins of legality and leveraging the high transaction costs of our legal system, then I will subconsciously interpret each policy proposed as part of a more insidious unspoken bundle.

How should voters and pundits navigate an environment where information is limited and bias is largely unavoidable? I don’t know, but here’s how I try anyway.

  1. Assume every candidate has basic competency in appealing to their base.
  2. Assume every candidate wants to appeal to the median voter.
  3. Do not assume anyone knows who the median voter is.
  4. Assume both candidates and their advisors have the same capacity to assess how their respective bases will react to a proposal and how it will actually impact them, but do not assume they know how the median voter will react and be affected.

In essence, candidates will always have a deeper familiarity, with greater repeated interactions, with their voter and donor bases. They know how they will react and how they will actually be impacted. Platforms will be designed around navigating contexts where popularity and expected impact are in conflict. What this means is that, in the aggregate,

  1. A candidate stands to do the most damage when advocating for policies that will aid their base at the expense of the median.
  2. A candidate will create the most uncertainty when the desires of their base are at odds with the consequences for their base.

For example, assume both major parties are advocating for trade restrictions. Let’s call them the Plurality party and the Majority party. Trade restrictions will hurt the median voter, full stop. The Plurality party, whose identity constitutes a minority of the total population but the largest share of the population of any subgroup, stands to gain the most through policies that extract from others in a negative sum game. It will be easier to take their candidate’s policies at face value because of uncertainty around the median voter’s preferences, in part due to voter uncertainty about how policies will affect them.

The Majority party, on the other hand, is more fractured in the subgroups that constitute its more numerous whole. They can be thought of as an encompassing group coping with the high costs of intragroup bargaining. Their greater numerical advantage in elections is partly, if not wholly, nullified by difficulty solving collective action problems and their need to solve positive sum games whose benefits are spread too thinly to excite their base. Further, the Majority party is inclusive of the median voter, about which there is greater uncertainty. The Majority party, as such, has greater incentive to rely on a form of subtextual deception. To win elections, they will need to propose the policies that the various elements of their base want while also bundling them with other policy elements that will mitigate their consequences in the aggregate and leave options open downstream as consequences for the median are made manifest. Interpreting proposals of the Majority party demands more Straussian reading, which also means that greater care is needed in monitoring your own bias. Because all complex political economy aside, sometimes parties do in fact just have bad ideas.

Good luck.

Sticky Prices as Coordination Failure Working Paper

“Sticky Prices as Coordination Failure: An Experimental Investigation” is my new paper with David Munro of Middlebury, up at SSRN.

We ask whether coordination failures are a source of nominal rigidities. This was suggested in a recent speech by ECB President Christine Lagarde. She said, “In the recent decades of low inflation, firms that faced relative price increases often feared to raise prices and lose market share. But this changed during the pandemic as firms faced large, common shocks, which acted as an implicit coordination mechanism vis-à-vis their competitors.”

Coordination failure was suggested as a possible cause of price rigidity in a theory paper by Ball and Romer (1991). They demonstrated the possibility for multiple equilibria, and we perform the first laboratory test to observe equilibrium selection in this environment.

We theoretically solve a monopolistically competitive pricing game and show that a range of multiple equilibria emerges when there are price adjustment costs (menu costs). We explore equilibrium selection in laboratory price setting games with two treatments: one without menu costs where price adjustment is always an equilibrium, and one with menu costs where both rigidity and flexibility are possible equilibria.

In plain language, for our general audience, the idea is that the prices you set might depend on what other people are doing. If other people are responding to a shock (for example, Covid driving up labor costs all over town might cause retail prices to rise) then you will, too. If every other store in town is afraid to raise prices, then there is a certain situation where you might resist adjusting your prices, too (price rigidity).

Results: First, when there is only one theoretical equilibrium, subjects usually conform to it. When cost shocks are large, price adjustment is a unique equilibrium regardless of the presence of menu costs, and we see that subjects almost always adjust prices. When cost shocks are small and there are menu costs, rigidity is a unique equilibrium and subjects almost never adjust. Conversely, with small cost shocks, subjects almost always adjust when there are no menu costs.

The more interesting cases are when the parameters allow for either rigidity or flexibility to be selected. We find that groups do not settle at the rigidity equilibrium. Rather, depending on the specific nature of the shock, between half and 80% of subjects adjust in response to a shock. The intermediate levels of adjustment are represented here in this figure as the red circles that fall between the red and green bands where multiple equilibria are possible.

In the figure above, the red circles are higher when the production cost shock gets further from zero in absolute value. We see that the proportion of subjects adjusting prices is proportional to the size of the cost shocks. This is consistent with the interpretation that the large post-COVID cost shocks acted as an implicit coordination mechanism for firms raising prices. Our results provide a number of interesting insights on nominal rigidities. We document more nuance in the paper regarding heterogeneity and asymmetry. Comments and feedback are appreciated! If it’s not clear from the EWED blog how to email me (Joy), find my professional contact info here. 

Services, and Goods, and Software (Oh My!)

When I was in high school I remember talking about video game consumption. Yes, an Xbox was more than two hundred dollars, but one could enjoy the next hour of that video game play at a cost of almost zero. Video games lowered the marginal cost and increased the marginal utility of what is measured as leisure. Similarly, the 20th century was the time of mass production. Labor-saving devices and a deluge of goods pervaded. Remember servants? That’s a pre-20th century technology. Domestic work in another person’s house was very popular in the 1800s. Less so as the 20th century progressed. Now we have devices that save on both labor and physical resources. Software helps us surpass the historical limits of moving physical objects in the real world.


There’s something that I think about a lot and I’ve been thinking about it for 20 years. It’s simple and not comprehensive, but I still think that it makes sense.

  • Labor is highly regulated and costly.
  • Physical capital is less regulated than labor.
  • Software and writing more generally is less regulated than physical capital.


I think that just about anyone would agree with the above. Labor is regulated by health and safety standards, “human resource” concerns, legal compliance and preemption, environmental impact, and transportation infrastructure, etc. It’s expensive to employ someone, and it’s especially expensive to have them employ their physical labor.


Robinhood’s Casino Comps

I just got the new Robinhood Gold credit card after 4 months on their waitlist. It offers 3% cash back on everything except travel, which is an even better 5%. This seems to be a much better deal than the typical credit card (which offers ~0-1% back in cash or equivalents), and even better than the previous best alternative I know of (the Citi Double Cash, which pays 2% back). So, is there a catch?

As far as I can tell, there are two, but one is minor and the other is avoidable.

The minor catch is that while they advertise the Gold Card as having no annual fee, you need to be a Robinhood Gold member to get it. Robinhood Gold has a $50/year fee, though it comes with other benefits, and getting the extra 1%+ back on the credit card will itself pay for the fee assuming you spend at least $5k/yr on the card.
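The break-even spending figure above can be checked with a quick sketch. The function name and the choice of comparison card are assumptions for illustration; the $50 fee, the 3% rate, and the 2% Citi Double Cash rate come from the post.

```python
def break_even_spend(annual_fee, card_rate, baseline_rate):
    """Annual spending at which the extra cash back just covers the fee.

    Rates are decimals (0.03 means 3%). baseline_rate is whatever card
    you would otherwise use; which card that is remains an assumption.
    """
    extra = card_rate - baseline_rate
    if extra <= 0:
        raise ValueError("card_rate must exceed baseline_rate")
    return annual_fee / extra


# Compared with a 2% card, the extra 1% must cover the $50 Gold fee.
vs_two_percent = break_even_spend(50, 0.03, 0.02)
# Compared with a 1% card, the extra 2% covers it at half the spending.
vs_one_percent = break_even_spend(50, 0.03, 0.01)

print(f"Break-even vs a 2% card: ${vs_two_percent:,.0f} per year")
print(f"Break-even vs a 1% card: ${vs_one_percent:,.0f} per year")
```

This ignores the 5% travel category and the other Gold benefits, both of which would lower the break-even point.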

The potentially major catch, and the reason I assume Robinhood is offering such a good deal, is that they want to entice you to open a brokerage account and to make bad decisions with that account that make them money. Much like a casino that offers you free drinks and cheap hotel rooms in the hope that you will choose to gamble and end up losing way more than the cost of the “complimentary” things they gave you. This is a major risk, but if you know what to avoid you can still come out ahead. The last time my friends dragged me to a casino I got handed plenty of free drinks despite the fact that I never gambled. Similarly, Robinhood might nudge its users to lose money in ways large (options) and small (overtrading with market orders).

But while Robinhood’s interface might suggest these bad choices, it absolutely does not require them. You can simply choose not to enable options trading, not to over-trade (and to turn off price alerts that nudge you to do so), and to use limit orders instead of the default market orders when buying stocks. In fact, you could avoid using Robinhood to buy stocks altogether, and simply use their brokerage account as a way to earn 5% interest while using it to pay off your credit card (though on the other hand, Robinhood could benefit people if it nudges them to do stock investing at all instead of keeping everything in a checking account).

The fact that Robinhood Gold brokerage accounts pay 5% interest on uninvested cash is its other big advantage. You can find savings accounts elsewhere paying 5% or a bit more, but many won’t maintain that rate, and they have transaction limits. Robinhood also pays a 1% bonus on cash transferred in if you keep it there.

Someone moving to the Robinhood ecosystem from a bad setup (paying with cash, or debit cards, or credit cards with no rewards that are paid off from a checking account that earns 0%) could in theory increase their real spending power by 8%+. Even someone in a more common situation (has a 1% rewards card but most of their spending is on things like mortgages that aren’t credit-card-eligible, pays the credit card from a 0% interest checking account but sweeps excess cash to a high-yield savings account paying 4%) could still increase their total spending power 1-3%. Not huge, but a big deal for something that can be set up for less than a day’s work.

This is now the best single-account setup I know of, assuming you can stay out of their casino. Churning through different accounts can get you a better return, but it is also a lot more work and has its own risks. If you want to up your returns some without the fees or risk of the Robinhood ecosystem, then something like the Citi Double Cash paid from a high-yield (4%+) savings account is probably the way to go.

Disclaimer: I might be wrong about this, but if so I am honestly wrong; this post is not sponsored and I’m not even using referral links when I easily could. Still, do your own research and let me know if I’ve missed anything.

Update: Robinhood CEO Vlad Tenev did an interview on Invest Like the Best this week where, reading between the lines, he confirms both the positive and negative things I say here. They make most of their money overall on options and active traders; 3% cash back exceeds the interchange fees they get from merchants, but they expect the card to be profitable because some users will carry a balance (and pay interest) and because it will push people to sign up for Gold (so pay fees and perhaps trade more). He notes that there is another card that offers 3% cash back, but it is only available to those with at least $2 million managed by Fidelity.

On Average, American Wage Earners are Better Off Than They Were Four Years Ago

As I wrote last November, the question “are you better off than you were four years ago?” is a common benchmark for evaluating Presidential reelection prospects. And even though Biden is no longer running for reelection, voters will no doubt be considering the economic performance of his first term when thinking about their vote in November.

The good news for American wage earners (and possibly Harris’ election prospects) is that average wages have now outpaced average price inflation since January 2021. Despite some of that time period containing the worst price inflation in a generation, wages have continued to grow even as price growth has moderated. Key chart:

For most of Biden’s term, it was true that prices had outpaced wages. But no longer.

The real growth in wages, admittedly, is not very robust, despite being slightly positive. How does this compare to past performance under recent Presidents? Surprisingly, pretty well! (Lots of caveats here, but this is what the raw data shows.)

Behind Last Week’s Stock Minicrash: Unwind of the Yen Carry Trade

Last Monday, August 5, the S&P 500 crashed by 3.5% from the previous close. That is a huge daily move, which seems to have been a surprise to most market watchers. The VIX index, a measure of the cost of options and widely seen as a measure of fear in the markets, went off the charts that day. What happened?

The previous week, there was an employment report that showed higher than expected jobless claims. Although that led to angst over a recession, a genuine serious dent in employment would bring the Fed roaring in with interest rate cuts, and the stock market loves rate cuts. In addition, as we have highlighted in recent posts (here and here), there is increasing skepticism that the monster spends on AI will produce the profits that Big Tech hopes. However, the AI skepticism and the employment worries seemed already baked into stock prices by the Friday close.

What apparently happened over the weekend was the unwinding of a big part of the yen carry trade.

What is that, you ask? To frame this, imagine you have $100 to invest in something very safe, like short term Treasury securities. In the simplest case, you go buy a 1-year T-bill which yields 4.5%. You will make $4.50 in a year from this transaction. If you had $100 million to invest, you would make $4.5 million.

Now suppose that you could use that $100 as collateral to borrow $1000 at 0.05%. You then take that $1000 and buy $1000 worth of 4.5% T-bills. Voila, instead of making a measly $4.50, you can now make $1000 * (4.5% - 0.05%) = $44.50. This is nearly ten times as much, a 44.5% return on your $100. Financial alchemy at its finest!
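The same arithmetic as a minimal Python sketch. The function and variable names are made up for illustration; the numbers are the ones in the example above.

```python
def carry_profit(borrowed, invest_yield, borrow_rate):
    """One-year profit from investing borrowed money at invest_yield
    while paying borrow_rate on the loan (both rates as decimals)."""
    return borrowed * (invest_yield - borrow_rate)


collateral = 100        # your own money, posted as collateral
borrowed = 1000         # amount borrowed against it
t_bill_yield = 0.045    # 4.5% on 1-year T-bills
borrow_rate = 0.0005    # 0.05% cost of borrowing

unlevered = collateral * t_bill_yield
levered = carry_profit(borrowed, t_bill_yield, borrow_rate)

print(f"Unlevered profit: ${unlevered:.2f}")
print(f"Levered profit:   ${levered:.2f}")
print(f"Return on the $100 of collateral: {levered / collateral:.1%}")
```

The sketch leaves out any return earned on the collateral itself and assumes both rates hold for the full year.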

Now, if instead of investing in boring 4.5% T-bills, you had been buying Microsoft and Apple shares (up 25% and 21%, respectively, in the past twelve months), just imagine the profits from this 10X leveraged trade. Especially if you started with a $100 million hedge fund instead of $100.

Where, you may ask, could you borrow money at 0.05%? The answer is Japan. The central bank there has kept rates essentially zero for many years, for reasons we will not canvass here. This scheme of borrowing in yen, and investing (mainly in the US) in dollars is termed the yen carry trade. Besides this borrowing/investing, simply betting that the Japanese yen would decline against the dollar has been profitable for the past 18 months.

What could possibly go wrong with such a scheme? Well, you have to do this borrowing in Japanese yen. So, if you borrow in yen and then convert it to dollars and invest in the dollar world, you can be in a world of hurt if the value of yen in dollars goes up by the time you need to close out this whole trade (i.e. cash in your T-bills for dollars, convert back to yen, and pay off your yen borrowings).
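To see how quickly a stronger yen can swamp the interest-rate spread, here is a rough sketch. The exchange rates (150 and 140 yen per dollar) are hypothetical numbers chosen for illustration, not figures from the actual episode, and the function name is made up; the yields are the ones from the example above.

```python
def carry_trade_pnl(usd_invested, invest_yield, borrow_rate, fx_start, fx_end):
    """Dollar profit or loss on a yen-funded trade held for one year.

    fx_start and fx_end are yen per dollar when the trade is opened and
    closed. A lower fx_end means the yen has strengthened.
    """
    yen_borrowed = usd_invested * fx_start         # yen loan converted to dollars
    yen_owed = yen_borrowed * (1 + borrow_rate)    # yen principal plus interest
    usd_proceeds = usd_invested * (1 + invest_yield)
    usd_to_repay = yen_owed / fx_end               # dollars needed to buy back the yen
    return usd_proceeds - usd_to_repay


# Exchange rate unchanged: the trade earns the interest-rate spread.
flat = carry_trade_pnl(1000, 0.045, 0.0005, fx_start=150, fx_end=150)
# Yen strengthens from 150 to 140 per dollar (hypothetical move).
stronger_yen = carry_trade_pnl(1000, 0.045, 0.0005, fx_start=150, fx_end=140)

print(f"Yen unchanged:  {flat:+.2f} dollars")
print(f"Yen stronger:   {stronger_yen:+.2f} dollars")
```

Under these assumptions, a yen appreciation of a little over 4% is enough to erase a full year of carry, which is why leveraged participants have reason to exit quickly when the yen starts to rise.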

What happened on Wednesday, July 31 was that the Bank of Japan unexpectedly raised its key interest rate target from 0-0.1% to around 0.25%, and announced it would scale back its QE bond-buying, in an effort to address inflation. As may be expected, that raised the value of the yen on Thursday and Friday, though not by much. But the yen made a surge up at the end of Friday’s trading.

Apparently, that caused enough angst in the carry trade community that participants in the carry trade started running for the exits, selling dollar-denominated assets (including stocks) and scrambling to buy yen. Naturally, that shot the price of yen up even more, so on Monday, Aug 5, we had a disorderly market rout.

Bad news sells, and so all the finance headlines on Monday were blaring about the stock price collapse and start of an awful bear market. However, nothing substantive had really changed. By Friday, the S&P 500 had recovered from this big head-fake.

As usual, investors sold stocks (at a low price) on Monday, and presumably bought them back at a higher price later in the week. This is why the average investor’s returns fall well below a simple buy and hold. But that is another subject for another time.

Insuring the suspension of disapproval

My wife and I were watching Guy Ritchie’s “Sherlock Holmes” (2009) last night. There is a chase/fight scene where a large commercial ship being repaired along a dock is destroyed as collateral damage. She asked me “What are the consequences of that ship being destroyed?” I had to admit that I didn’t fully know the state of the insurance market in Victorian England, but suffice it to say a few businesses and/or families were likely ruined. Which led to a conversation about collateral damage outside of the main narrative in movies and insurance. Sorry, that’s just what happens when you marry an economist. Things to consider the next time you’re filtering your prospects.

Which got me thinking: how much does the suspension of our disapproval of the protagonist’s actions (similar to the suspension of disbelief) depend on our undeclared faith in the insurance market of a fictional world? We don’t worry about destroyed livelihoods because we assume everything is simply absorbed as a tail event against which everything is insured. Car through front window? Automotive insurance tail event. Plane crashing onto the Vegas strip? Aeronautical tail event. Godzilla’s tail sweeping through a city? Giant lizard tail tail event.

How about the rise of the antihero? How many heists include a character shouting exposition to a crowd of cowering bank customers that they are there to steal money from the insurance company rather than the customers? The filmmaker needs the audience to suspend disbelief that banks have multiple customers inside in the age of smartphones and suspend disapproval of the morality of the thieves’ actions as they steal from what they can only hope the audience will deem a soulless corporation that can absorb the loss without broader consequence.

There are two intellectual rabbit holes you can go down when you start thinking about insurance. You can dive in vertically, asking how much of our daily lives, including the consumption of narratives, is dependent on the presumption of insurance. You can also start thinking horizontally: how many dimensions of our lives boil down to creating formal and informal sources of insurance? We acquire formal health, home, pet, and automotive insurance. We also join groups, like churches, synagogues, mosques, and (yes) cults for social insurance. One motive to have children is to insure against the limitations and isolation of old age. Anything and everything we invest in, both individually and as a society, that softens the tail events at the expense of the expected outcome is a form of insurance.

It can go on and on. If anything, it takes care at some point to stop seeing everything as a form of insurance. Why did they feature an actor in the poster and the trailer despite their only appearing for 14 minutes in the film? Why were they paid more than double the lead actors? Oh, right. They’re an insurance policy against a catastrophic opening weekend. If the movie is good, but needs word of mouth to spread for people to start coming out, you need to survive to a second weekend to start making money. Better to eat a chunk of your expected profits on a big name than risk getting dumped from theaters before the audience can find you.

What’s that you say? No one goes to theaters anymore? Oh. Well, there’s some risk you can’t insure against.

Publish or Perish: A Hilarious Card Game Based on Academia

I had the opportunity to play an advanced copy of “Publish or Perish,” a new card game that satirizes the world of academia. Created by Max Bai, this game offers a funny take on the often cutthroat world of academic publishing.

Official website for the game: here

My group of eight friends divided into teams to accommodate the game’s six-player limit, which I’d recommend not exceeding. From the moment we started reading the instructions aloud, we were laughing.

The gameplay is engaging. One unexpectedly hilarious rule involves clapping for each other’s achievements. The game’s core revolves around publishing manuscripts, accumulating citations, and navigating the waters of peer review and academic politics.

I was impressed by the calibration of the trivia questions. They struck a great balance – challenging enough that we often couldn’t answer them, yet not so obscure that they felt unreasonable. This aspect added an educational twist to the fun, sparking interesting discussions.

The humor in “Publish or Perish” is spot-on, especially in the details. The manuscript cards had us in stitches, with journal names like “Chronicle of Higher Walls” (a clever play on the real “Chronicle of Higher Education”) and absurd paper titles.

My favorite paper title was “The Great Avocado Toast Crisis: Socioeconomic Impacts of Millennial Breakfast Choices”
Esteemed friend and economist Vincent Geloso liked “The Economics of Building a Death Star”

The two other full-time academics in our group were so impressed that they pre-ordered copies on the spot. While the game is probably most enjoyable with at least one academic in the group, our mixed party – including a government statistician and several non-academics – found it entertaining. One of my non-academic friends summed it up as follows: “This game brought several people from different backgrounds and areas of expertise together for a thoroughly enjoyable evening.”

“Publish or Perish” manages to be both easy to learn and refreshingly original. I predict it will carve out its own niche with its unique theme and mechanics. Players can engage in academic shenanigans like plagiarism, P-value hacking, and even sabotaging opponents’ work – all in good fun.


Recession Prospecting & Fed Tea Leaves

Will a recession happen? It’s famously hard/impossible to predict. Personally, I have a relatively monetarist take. I consider the goals of the Federal Reserve, what tools they have, and how they make their decisions. I also think about the very recent trend in the macroeconomy and how it’s situated relative to history. Right now, the yield curve has been inverted for quite some time and the Sahm rule has been satisfied; both are historical indicators of recession.

Recessions are determined by the NBER’s Business Cycle Dating Committee. They always make their determination in hindsight and almost never in real time. They look at a variety of indicators and judge whether each declines, for how long, how deeply, and the breadth of decline across the economy. So plenty of ‘bad’ things can happen without triggering a recession designation.

In my expert opinion, recessions can largely be prevented by maintaining expected and steady growth in NGDP. This won’t solve real sectoral problems, but it will help to prevent contagion and spirals. The Fed can control NGDP to a great degree. In doing so, they can affect unemployment and growth in the short run, and inflation in the medium to long run.

One drawback of the NGDP series is that it’s infrequent, published only quarterly. It’s hard to know whether a dip is momentary, a false signal that will later be updated, or whether there is a recession coming. So, what should one examine? One could examine leading indicators or the various high-frequency indicators of economic activity. But those are a little too much like tarot cards and fortune telling for my taste.
