It’s not AGI, it’s barely even regular AI, when an LLM is this heavily directed. This appears to be very real over on Twitter. What’s most telling is the thinness of the prompts that yield very specific responses that, suffice it to say, Grok would not have provided even 3 months ago.
Musk has adjusted Grok’s algorithm so it’s now a neo-Nazi. Pretty cool that almost every progressive commentator, elected official and organization still uses Musk’s X algorithm to communicate with the public! Good job guys.
I’ll simply say this: no one has declined more in my estimation in my entire life than Elon Musk. I thought he was an engineering genius not even 5 years ago, perhaps awkward in some ways, but earnest. Now he is (or is working very diligently to project an identity of) a white supremacist desperate to play off of traditional racist and antisemitic fears to maintain his own status and influence. His ambition and resources have been combined with a monstrous agenda, and the world is much worse for it. It’s tragic in every way.
With regard to AI, there needs to be more discussion of the market for AIs, plural. I think a lot of people are operating off the assumption that AI will be like Google or VHS. A natural monopoly; one AI to rule them all and bind them. I’m not so sure. I think there is a very real chance that AIs will find niches. That different algorithms will create different families of bespoke AIs. It feels like the world is already siloed into echo chambers of entertainment- and identity-based news feeds. If AI allows us each to get bespoke answers, serving our own personal confirmation biases, to each and every question, is that better or worse? In a counterintuitive way, it could actually be better. You can’t get communities and cults of one. It might be better for the world if the news became something you couldn’t create effective propaganda out of.
that paper only tested 4o (which arguably is a bad enough model that i almost never use it). I also don't know how they used web-search, because it's not native iirc and you have to implement it yourself which means that your implementation could be bad
Since the scope and frequency of hallucinations came as a surprise to many LLM users, they have often been used as a ‘gotcha’ to criticize AI optimists. People, myself included, have sounded the alarm that hallucinations could infiltrate articles, emails, and medical diagnoses.
The feedback I got from power users on Twitter this week made me think that there might be a cultural shift in the medium term. (Yes, we are always looking for someone to blame.) Hallucinations will be considered the fault of the human user who should have:
Used a better model (learn your tools)
Written a better prompt (learn how to use your tools)
Not assigned the wrong task to LLMs (it’s been known for over 2 years that general-purpose LLMs hallucinate citations). What did you expect from “generative” AI? LLMs are telling you what literature ought to exist as opposed to what does exist.
A couple years ago, my co-blogger Mike described his productive but novice intern. The helper could summarize expert opinion, but they had no real understanding of their own. To boot, they were fast and tireless. Of course, he was talking about ChatGPT. Joy has also written in multiple places about the errors made by ChatGPT, including fake citations.
I use ChatGPT Pro, which has Web access, and my experience is that it is not so tireless. Much like Mike, I have used ChatGPT to help me write Python code. I know the basics of Python, and how to read a lot of it. However, the multitude of methods and possible arguments are not nestled firmly in my skull. I’m much faster at reading Python code than at writing it. Therefore, ChatGPT has been amazing… Mostly.
I have found that ChatGPT is more like an intern than many suppose:
A black swan is a crisis that comes out of nowhere. A gray rhino, by contrast, is a problem we have known about for a long time, but can’t or won’t stop, that will at some point crash into a full-blown crisis.
The US national debt is a classic gray rhino. The problem has slowly been getting worse for 25 years, but the crisis still seems far enough off that almost no one wants to incur real costs today to solve the problem. During the 2007-2009 financial crisis and the 2020-2021 Covid pandemic we had good reasons to run deficits. But we’ve ignored the Keynesian solution of paying back the deficits incurred in bad times with surpluses in good times.
We are currently in reasonably good economic times, but about to pass a mega-spending bill that blows the deficit up from its already-too-high levels. At a time when we should be running a surplus, we are instead running a deficit around 6% of GDP:
Source: Congressional Budget Office
Our ‘primary deficit’ (the deficit excluding interest payments) is lower, a more manageable 3% of GDP. But if interest rates go higher, either for structural reasons or because of a loss of confidence in the US government’s willingness to pay its debts, the total deficit could spiral higher rapidly. The CBO optimistically assumes that the interest rate on 10-year Treasuries will fall below 4% in the 2030s, from 4.3% today:
Source: Congressional Budget Office
But their scoring of H.R. 1 (“One Big Beautiful Bill Act”) shows it adding $3 trillion to the debt over the next 10 years, increasing the deficit by ~1% of GDP per year.
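To see why the gap between the primary deficit and the total deficit matters, here is a back-of-the-envelope sketch in Python. The 3% primary deficit and the $3 trillion ten-year cost come from the text above; the debt-to-GDP ratio of 100% and nominal GDP of $30 trillion are my own round-number assumptions for illustration, not CBO figures.

```python
def total_deficit(primary_deficit, debt_to_gdp, avg_interest_rate):
    """Total deficit as a share of GDP = primary deficit + net interest."""
    return primary_deficit + debt_to_gdp * avg_interest_rate

PRIMARY = 0.03       # primary deficit, share of GDP (from the text)
DEBT_TO_GDP = 1.00   # ASSUMPTION: debt roughly equal to one year of GDP

# How the total deficit responds to the average interest rate on the debt
for rate in (0.03, 0.04, 0.05):
    share = total_deficit(PRIMARY, DEBT_TO_GDP, rate)
    print(f"avg rate {rate:.0%} -> total deficit {share:.1%} of GDP")

# H.R. 1: $3 trillion over 10 years, relative to an assumed GDP
BILL_COST = 3.0e12   # dollars over ten years (from the text)
GDP = 30.0e12        # ASSUMPTION: nominal GDP of about $30 trillion
annual_share = BILL_COST / 10 / GDP
print(f"H.R. 1 adds about {annual_share:.1%} of GDP per year")
```

Under these assumptions, each percentage-point rise in the average interest rate adds roughly a point of GDP to the deficit. In practice the average rate on the debt moves more slowly than market rates, because existing bonds only roll over gradually.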
I already suspected this gray rhino would eventually cause a crisis, but this bill and the milieu that produced it turn that into a near guarantee: nothing stops the deficit train until we hit a full-blown crisis. That crisis is no longer just a long-term issue for your kids and grandkids to worry about; you will see it in 7 years or so. Unfortunately, that is still far enough away that current politicians have no incentive to take costly steps to avoid it. In fact, deficits will probably make the economy stronger for a year or two before they start making things worse, which is convenient for all the Congresspeople up for election in less than 2 years.
Here are the ways I see this playing out, from most to least likely:
1. By around 2032, either the slowly aging population or a sudden spike in interest rates forces the government to touch at least one of the third rails of American politics: cut Social Security, cut Medicare, or substantially raise taxes on the middle class (explicitly or through inflation).
2. We get bailed out again by God’s Special Providence for fools, drunks, and the United States of America. AI brings productivity miracles bigger than those of computers and the internet, letting GDP grow faster than our debts.
3. We default on the national debt (but this is a risky option because we will still want to run big deficits, and lenders will only lend if they expect to get paid back).
4. We do all the smart policy reforms that economists recommend in time to head off the crisis and stop the rhino. Medical spending falls without important services being cut thanks to supply-side reforms or cheap miracle drugs (GLP-1s going off patent?).
I’m hoping of course for numbers 2 and 4, but after this bill I’m expecting the rhino.
I recently spent a week in Norway with my family. Highly recommended overall. While we were mostly able to get around the country by train, we needed to rent a car to get to the small, remote village where my great-grandfather came from, and where I still have relatives. Prior to the drilling of several massive car tunnels in the 1980s and 1990s, Fjaerland was only accessible by boat.
And if you are renting a car in Norway today, it’s highly likely you will be renting an electric car (unless you specifically ask for a gas-powered car, as the older German couple in front of me at the rental counter did). The vast majority of new cars sold in Norway (over 90%) are electric, and since most rentals are new cars, that’s what they have.
Norway has made the biggest push in the world through public policy to encourage EV adoption, both for buying cars and for building up a charging infrastructure. In this post I will primarily focus on the consumer experience of renting an EV, though the public policy surrounding it is worth a discussion too.
Anyone who reads financial headlines knows that gold prices have soared in the past year. Why?
Gold has historically been a relatively stable store of value, and that role seems to be returning after decades of relative neglect. Official numbers show sharply increased buying by the world’s central banks, led by China, Poland, and Azerbaijan in early 2025. Russia, India and Turkey have also been major buyers. There is widespread conviction that actual gold purchases are appreciably higher than the officially reported numbers, with buyers under-reporting to side-step President Trump’s threatened extra tariffs on nations seen as de-dollarizing.
I think the most proximate cause for the sharp run-up in gold prices in the past twelve months has been the profligate U.S. federal budget deficit, under both administrations. This is convincing key world actors that the dollar will become increasingly devalued over time, no matter which party is in power. Thus, it is prudent to get out of dollars and dollar-denominated assets like U.S. T-bonds.
Trump’s erratic and offensive policies and statements in 2025 have added to the desire to diversify away from U.S. assets. This is in addition to the alarm in non-Western countries over the impoundment of Russian dollar-related assets in connection with the ongoing Russian invasion of Ukraine. Also, there is something of a self-fulfilling momentum aspect to any asset: the more it goes up, the more it is expected to go up.
This informative chart of central bank gold net purchasing is courtesy of Weekend Investing:
Interestingly, central banks were net sellers in the 1990s and early 2000s; it was an era of robust economic growth, gold prices were stagnant or declining, and it seemed pointless to hold shiny metal bars when one could invest in financial assets with higher rates of return. The Global Financial Crisis of 2008-2009 apparently sobered up the world as to the fragility of financial assets, making solid metal bars look pretty good. Then, as noted, the Western reaction to the Russian attack on Ukraine spurred central bank gold buying, as this blog predicted back in March 2022.
Private investors are also buying gold, for reasons similar to those of the central banks. Gold offers portfolio diversification as a clear alternative to all paper assets. In theory it should offer something of an inflation hedge, but its price does not always track with inflation or interest rates.
Here is how gold (using the GLD fund as a proxy) has fared versus stocks (S&P 500 index) and intermediate-term U.S. T-bonds (IEF fund) in the past year:
Gold is up by 40%, compared to 12.6% for stocks. That is huge outperformance. This was driven largely by the fact that gold rose strongly in the Feb-April timeframe, while stocks were collapsing.
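One way to quantify that gap, sketched in Python: the difference in percentage points, and the relative return, meaning how much more a dollar in gold ended up worth than a dollar in stocks. The two return figures come from the text; the function name is just illustrative.

```python
def relative_outperformance(return_a, return_b):
    """How much more a dollar in A is worth than a dollar in B at period end."""
    return (1 + return_a) / (1 + return_b) - 1

gold, stocks = 0.40, 0.126   # one-year returns quoted in the text

gap_points = (gold - stocks) * 100
rel = relative_outperformance(gold, stocks)

print(f"Gap: {gap_points:.1f} percentage points")
print(f"Relative outperformance: {rel:.1%}")
```

The relative figure is smaller than the simple gap because it is measured against the ending value of the stock position rather than the starting value.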
Below we zoom out to look at the past ten years, and include the intermediate-term T-bond fund IEF:
Gold prices more than doubled from 2008 to 2011, then suffered a long, painful decline over the next two years. Prices were then fairly stagnant for the mid-2010s, rose significantly 2019-2020, then stagnated again until taking off in 2023. Stocks have been much more erratic. Most of the time stock returns were above gold, but the 2020 and 2024 plunges brought stocks down to rough parity with gold. Since about 2019, T-bonds have been pathetic; pity the poor investor who has been (according to traditional advice) 40% invested in investment-grade bonds.
How to invest in gold? Hard-core gold bugs want the actual coins (no one can afford a full bullion bar) to rub between their fingers and keep in their own physical custody. You can buy coins from online dealers or local dealers. Coins are available from the U.S. Mint, but reportedly their mark-ups are often higher than on the secondary market.
An easier route for most folks is to buy into a gold-backed stock fund. The biggest is GLD, which has over $100 billion in assets. There has long been an undercurrent of suspicion among gold bugs that GLD’s gold is not reliably audited or that it is loaned out; they refer derisively to GLD as “paper gold” or gold derivatives. The fund itself claims that it never lends out its gold, and that its bars are held in the vaults of the custodian banks JPMorgan Chase Bank, N.A. and HSBC Bank plc, and are independently audited. The suspicious crowd favors funds like Sprott Physical Gold Trust, PHYS. PHYS is claimed to have a stronger legal claim on its physical gold than GLD. However, PHYS is a closed-end fund, which means it does not have a continuous creation process like GLD, an open-end ETF. This can lead to discrepancies between the fund’s share price and the value of its gold holdings. It does seem like PHYS loses about 1% per year relative to GLD.
Disclaimer: Nothing here should be taken as advice to buy or sell any security.
It goes almost without saying that there is absolutely no economic rationale for such a tax. It’s almost a perfect inversion of a classic optimal Pigouvian tax. There is no way to frame this other than as rent-seeking run amok.
In 2023, we gathered the data for what became “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In the Spring of 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.
What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like, “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence”
What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. This means it can answer questions about current events, verify facts, and provide live data—something traditional models, which are limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement that with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.
At least one third of the references provided by GPT-4o WS were not real! Performance has not improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower-quality sources like Investopedia even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.
The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.
In macroeconomics we have basic tools to help us talk about economic growth, which is simply the percent change in RGDP per capita. What causes growth? Lots of things. All else constant, if more people are employed, then more will be produced. But the productivity of those workers matters too. That’s why we calculate average labor productivity (ALP), which is GDP per worker. This tells us how much each worker produces. All else constant, more ALP means more GDP.*
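The identity behind these definitions is that RGDP per capita equals ALP times the share of the population that is employed. Here is a minimal Python sketch of that decomposition. The state figures are made up for illustration and are not FRED data.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100

def decompose(rgdp, employment, population):
    """Return (ALP, employment-population ratio, RGDP per capita)."""
    alp = rgdp / employment
    emp_pop = employment / population
    return alp, emp_pop, alp * emp_pop

# HYPOTHETICAL state: RGDP in dollars, employment and population in persons
start = decompose(rgdp=400e9, employment=4.0e6, population=8.0e6)
end = decompose(rgdp=550e9, employment=5.0e6, population=9.0e6)

labels = ("ALP", "Employment-population ratio", "RGDP per capita")
for name, a, b in zip(labels, start, end):
    print(f"{name}: {pct_change(a, b):.1f}% change")
```

In this example both output per worker and the share of people working rise, and growth in RGDP per capita reflects the two compounded together.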
What affects ALP? Nearly everything: Technology, demographics, health, culture, and public policy. Most of these have long-term effects. So, it’s better to think in terms of regimes. After all, incurring debt now can result in a lot of investment and production, but there’s no guarantee that it can be sustained year after year. This is why I don’t get terribly excited about individual good or bad policies at any moment. There’s a lot of ruin in a nation. I care more about the long-run policy regime that is fostered over time.
Given the variety of inputs to economic growth, there’s always plenty of room for complaint about policy – even if the economy is doing well. In this post, I’m inspired by a YouTube video that a student shared with me. The OP laments poor policy in Massachusetts. But compared to some other nearby states, MA is doing just fine economically. This is not the same as saying that the OP is wrong about poor policies. Rather, a regime of policy, technology, interests, etc. is built over time and there can be a lot wrong in growing economies.
In the interest of being comprehensive, this post includes basic growth stats for all states from 2005 through 2024 (the years of FRED-state GDP).** First, let’s start with the basic building blocks of population, employment, and RGDP. Institutions matter. Policy affects whether people migrate to/from the state, fertility, how many people are employed, and what they can produce.
People like to talk about migration and the flocking to Texas & Florida. But that fails to catch the people who choose to stay in their state. Utah is 43% more populous than it was 20 years ago. But you don’t hear much clamoring for their state policies. Idaho and Nevada also beat Florida in terms of percent change. Where are the calls to be like Idaho? Employment largely tracks population, though not perfectly. The RGDP numbers can change quickly with commodity prices, reflected in the performance of North Dakota. But remember, these numbers cover a 20-year span. So, any one blockbuster or dour year won’t move the rankings much.
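Because these are cumulative changes over roughly two decades, it can help to convert them to average annual rates. Here is a small Python sketch, assuming a 20-year span; the data run 2005 through 2024, so 19 annual intervals would also be defensible. The 43% figure is Utah's population change from the text.

```python
def annualized_rate(cumulative_pct, years):
    """Constant yearly growth rate (in %) that compounds to the cumulative change."""
    return ((1 + cumulative_pct / 100) ** (1 / years) - 1) * 100

YEARS = 20  # ASSUMPTION: treat the span as 20 years
utah = annualized_rate(43, YEARS)
print(f"Utah population growth: about {utah:.2f}% per year")
```

A cumulative change that sounds dramatic corresponds to a fairly modest yearly rate, which is why a single unusual year does little to the 20-year rankings.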
Of course, these figures just set the stage. What about the employment-population ratio, ALP, and RGDP per capita? Read on.
Intro microeconomics classes teach that some goods are “non-excludable”, meaning that people who don’t pay for them can’t be stopped from using them. This can lead to a “tragedy of the commons”, where the good gets overused because people don’t personally bear the cost of using it and don’t care about the costs they impose on others. Overgrazing land and overfishing the seas are classic examples.
Source: Microeconomics, by Michael Parkin
Students sometimes get the impression that “excludability” is an inherent property of a good. But in fact, which goods are excludable is a function of laws, customs, and technologies, and these can change over time. Land might be legally non-excludable (and so over-grazed) when it is held in common, but become excludable when the land is privatized or when barbed wire makes enclosing it cheap. Over time, such changes have turned over-grazing into a relatively minor issue.
Overfishing remains a major problem, but this could be starting to change. Legal and technological changes have allowed for enclosed, private aquaculture on some coasts, which provides a large and growing share of all fish eaten by humans. Permitting systems put limits on catches in many countries’ waters, though the high seas remain a true tragedy of the commons for now.
While countries have tried to enforce limits on catches in their national waters, monitoring how many fish every boat is taking has been challenging, so illegal overfishing has remained widespread. But technology is in the process of changing this. For instance, ThayerMahan is developing hydrophone arrays that use sound to track boats:
Technologies like hydrophones and satellites, if used well, will increasingly make public waters more “excludable” and reduce “tragedy of the commons” overfishing.