Meta Is Poaching AI Talent With $100 Million Pay Packages; Will This Finally Create AGI?

This month I have run across articles noting that Meta’s Mark Zuckerberg has been making mind-boggling pay offers (like $100 million per year for 3-4 years) to top AI researchers at other companies, plus the promise of huge resources and even (gasp) personal access to Zuck himself. Reports indicate that he is succeeding in hiring around 50 brains away from OpenAI (home of ChatGPT), Anthropic, Google, and Apple. Maybe this concentration of human intelligence will finally result in the long-craved artificial general intelligence (AGI); there seems to be some recognition that the current Large Language Models will not get us there.

There are, of course, other interpretations of this maneuver. Some talking heads on a Bloomberg podcast speculated that Zuckerberg is deliberately using Meta’s mighty cash flow to starve competitors of top AI talent. They also speculated that, since there is a limit to how much money you can pleasurably spend, a rational outcome of paying some guy $100 million in a year is that he quits and spends the rest of his life hanging out at the beach. (That, of course, is how Bloomberg finance types might think: they measure worth mainly in money, not in the fun of doing cutting-edge R&D.)

I found a thread on Reddit insightful and amusing, so I post chunks of it below. Here is the earnest, optimistic OP:

andsi2asi

Zuckerberg’s ‘Pay Them Nine-Figure Salaries’ Stroke of Genius for Building the Most Powerful AI in the World

Frustrated by Yann LeCun’s inability to advance Llama to where it is seriously competing with top AI models, Zuckerberg has decided to employ a strategy that makes consummate sense.

To appreciate the strategy in context, keep in mind that OpenAI expects to generate $10 billion in revenue this year, but will also spend about $28 billion, leaving it in the red by about $18 billion. My main point here is that we’re talking big numbers.

Zuckerberg has decided to bring together 50 ultra-top AI engineers by enticing them with nine-figure salaries. Whether they will be paid $100 million or $300 million per year has not been disclosed, but it seems like they will be making a lot more in salary than they did at their last gig with Google, OpenAI, Anthropic, etc.

If he pays each of them $100 million in salary, that will cost him $5 billion a year. Considering OpenAI’s expenses, suddenly that doesn’t sound so unreasonable.

I’m guessing he will succeed at bringing this AI dream team together. It’s not just the allure of $100 million salaries. It’s the opportunity to build the most powerful AI with the most brilliant minds in AI. Big win for AI. Big win for open source

And here are some wry responses:

kayakdawg

counterpoint 

a. $5B is just for those 50 researchers, loootttaaa other costs to consider

b. zuck has a history of burning big money on r&d with theoretical revenue that doesnt materialize

c. brooks law: creating agi isn’t an easily divisible job – in fact, it seems reasonable to assume that the more high-level experts enter the project the slower it’ll progress given the communication overhead

7FootElvis

Exactly. Also, money alone doesn’t make leadership effective. OpenAI has a relatively single focus. Meta is more diversified, which can lead to a lack of necessary vision in this one department. Passion, if present at the top, is also critical for bleeding edge advancement. Is Zuckerberg more passionate than Altman about AI? Which is more effective at infusing that passion throughout the organization?

….

dbenc

and not a single AI researcher is going to tell Zuck “well, no matter how much you pay us we won’t be able to make AGI”

meltbox

I will make the AI by one year from now if I am paid $100m

I just need total blackout so I can focus. Two years from now I will make it run on a 50w chip.

I promise
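
A quick aside on kayakdawg’s point (c): the Brooks’s law intuition is easy to quantify. Among n contributors there are n(n-1)/2 pairwise communication channels, so coordination overhead grows roughly with the square of the head count. Here is a plain-arithmetic sketch (nothing assumed beyond the head counts):

```python
# Pairwise communication channels among n contributors, the usual
# back-of-the-envelope behind Brooks's law: channels = n * (n - 1) / 2.
for n in (5, 10, 50):
    print(f"{n} researchers -> {n * (n - 1) // 2} channels")
# 5 researchers -> 10 channels
# 10 researchers -> 45 channels
# 50 researchers -> 1225 channels
```

A 50-person dream team has more than a hundred times the coordination surface of a 5-person one, which is exactly the commenter’s worry about AGI not being an easily divisible job.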

Hallucination as a User Error

You don’t use a flat-head screwdriver to drill a hole in a board. You should know to use a drill.

I appreciate getting feedback on our manuscript, “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models,” via X/Twitter. @_jannalulu wrote: “that paper only tested 4o (which arguably is a bad enough model that i almost never use it).”

Since the scope and frequency of hallucinations came as a surprise to many LLM users, hallucinations have often been used as a ‘gotcha’ to criticize AI optimists. People, myself included, have sounded the alarm that hallucinations could infiltrate articles, emails, and medical diagnoses.

The feedback I got from power users on Twitter this week made me think that there might be a cultural shift in the medium term. (Yes, we are always looking for someone to blame.) Hallucinations will be considered the fault of the human user, who should have:

  1. Used a better model (learn your tools)
  2. Written a better prompt (learn how to use your tools; a sketch of a more careful prompt follows this list)
  3. Not assigned the task to an LLM at all (it has been known for over two years that general-purpose LLMs hallucinate citations). What did you expect from “generative” AI? LLMs are telling you what literature ought to exist, as opposed to what does exist.
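
On point 2, here is a minimal sketch of what a more careful prompt might look like, assuming the OpenAI Python SDK (openai>=1.0) with an API key in the environment; the model name and the prompt wording are illustrative, not a recipe from our paper:

```python
# A minimal sketch of "write a better prompt": spell out the task, the
# output format, and an explicit escape hatch so the model is not pushed
# to invent references. The model name is illustrative; swap in whatever
# you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "List up to five published, peer-reviewed papers on the employment "
    "effects of the minimum wage. For each, give author, year in "
    "parentheses, and journal. If you are not certain a paper exists, "
    "write 'uncertain' instead of guessing."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

No prompt guarantees real citations; it just stops rewarding confident guessing, and the output still needs to be checked against an external index.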

My Perfunctory Intern

A couple of years ago, my co-blogger Mike described his productive but novice intern. The helper could summarize expert opinion but had no real understanding of their own. To boot, they were fast and tireless. Of course, he was talking about ChatGPT. Joy has also written in multiple places about the errors made by ChatGPT, including fake citations.

I use ChatGPT Pro, which has web access, and my experience is that it is not so tireless. Much like Mike, I have used ChatGPT to help me write Python code. I know the basics of Python and how to read a lot of it. However, the multitude of methods and possible arguments are not nestled firmly in my skull. I’m much faster at reading Python code than at writing it. Therefore, ChatGPT has been amazing… Mostly.

I have found that ChatGPT is more like an intern than many suppose.


Counting Hallucinations by Web-Enabled LLMs

In 2023, we gathered the data for what became “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In the Spring of 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.

Our new report on the state of citations is available at SSRN: “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models.”

What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence.”

What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. It can answer questions about current events, verify facts, and provide live data, which traditional models, limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement those patterns with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.

At least one third of the references provided by GPT-4o WS were not real! Performance has not improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower-quality sources like Investopedia, even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.
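
If you want to run a crude check of your own, here is a minimal sketch against Crossref’s public REST API. The exact-title match below is a deliberate simplification and not the verification procedure from our paper; real checking needs fuzzy matching plus a human look at author, year, and journal:

```python
# Spot-check whether a citation's title appears in the Crossref index.
# Exact (case-insensitive) title matching is a simplification; treat a
# False here as "needs human review," not as proof of hallucination.
import requests

def crossref_has_title(citation_title: str, rows: int = 5) -> bool:
    """Return True if Crossref returns an exact title match."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation_title, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    wanted = citation_title.strip().lower()
    for item in resp.json()["message"]["items"]:
        if any(t.strip().lower() == wanted for t in item.get("title", [])):
            return True
    return False

# Try it on a famous title that does exist (should print True):
print(crossref_has_title(
    "Prospect Theory: An Analysis of Decision under Risk"
))
```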

In closing, consider this quote from an interview with Jack Clark, co-founder of Anthropic:

The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.

Illusions of Illusions of Reasoning

Just since Scott’s post on Tuesday of this week, a new response has been launched, titled “The Illusion of the Illusion of the Illusion of Thinking.”

Abstract (emphasis added by me): A recent paper by Shojaee et al. (2025), The Illusion of Thinking, presented evidence of an “accuracy collapse” in Large Reasoning Models (LRMs), suggesting fundamental limitations in their reasoning capabilities when faced with planning puzzles of increasing complexity. A compelling critique by Opus and Lawsen (2025), The Illusion of the Illusion of Thinking, argued these findings are not evidence of reasoning failure but rather artifacts of flawed experimental design, such as token limits and the use of unsolvable problems. This paper provides a tertiary analysis, arguing that while Opus and Lawsen correctly identify critical methodological flaws that invalidate the most severe claims of the original paper, their own counter-evidence and conclusions may oversimplify the nature of model limitations. By shifting the evaluation from sequential execution to algorithmic generation, their work illuminates a different, albeit important, capability. We conclude that the original “collapse” was indeed an illusion created by experimental constraints, but that Shojaee et al.’s underlying observations hint at a more subtle, yet real, challenge for LRMs: a brittleness in sustained, high-fidelity, step-by-step execution. The true illusion is the belief that any single evaluation paradigm can definitively distinguish between reasoning, knowledge retrieval, and pattern execution.
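
To make that execution-versus-generation distinction concrete: the puzzles in the original paper included Tower of Hanoi, where the generating algorithm is a few lines but the faithful move list grows as 2^n - 1. A toy sketch (mine, not from any of the three papers):

```python
# Tower of Hanoi: the *algorithm* is tiny, but *executing* it step by
# step emits 2**n - 1 moves, which is where long, high-fidelity
# sequential execution (and token budgets) get stressed.
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield every move for an n-disk Tower of Hanoi."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)  # park n-1 disks on aux
    yield (src, dst)                              # move the largest disk
    yield from hanoi_moves(n - 1, aux, src, dst)  # restack n-1 onto dst

print(list(hanoi_moves(3)))  # 7 moves
for n in (3, 10, 15):
    print(n, "disks ->", 2**n - 1, "moves")
```

A model that can write hanoi_moves but flubs move 600 of 1,023 is failing at sustained execution, not at knowing the algorithm, which is roughly the rebuttal’s point.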

As I am writing a new manuscript about hallucination by web-enabled models, this is close to what I am working on. Conjuring up fake academic references might point to a lack of true reasoning ability.

Do Pro and Dantas believe that LLMs can reason? What they are saying, at least, is that evaluating AI reasoning is difficult. In their words, the whole back-and-forth “highlights a key challenge in evaluation: distinguishing true, generalizable reasoning from sophisticated pattern matching of familiar problems…”

The fact that the first sentence of the paper contains the bigram “true reasoning” is interesting in itself. No one doubts that LLMs are reasoning anymore, at least within their own sandboxes. Hence there have been Champagne jokes going around of this sort:

If you’d like to read a response coming from o3 itself, Tyler pointed me to this:

Papers about Economists Using LLMs

1. The most recent (published in 2025) is this piece about doing data analytics that would have been too difficult or costly before. Link and title: Deep Learning for Economists

Considering how much of frontier economics revolves around getting new data, this could be important. On the other hand, people have been doing computer-aided data mining for a while. So it’s more of a progression than a revolution, in my expectation.

2. Using LLMs to actually generate original data and/or test hypotheses like experimenters: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? and Automated Social Science: Language Models as Scientist and Subjects

3. Generative AI for Economic Research: Use Cases and Implications for Economists

Korinek has a supplemental update, current as of December 2024: LLMs Learn to Collaborate and Reason: December 2024 Update to “Generative AI for Economic Research: Use Cases and Implications for Economists,” Published in the Journal of Economic Literature 61 (4)

4. For being comprehensive and early: How to Learn and Teach Economics with Large Language Models, Including GPT

5. For giving people proof of a phenomenon that many people had noticed and wanted to discuss: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Alert: We will soon have an update for current web-enabled models! It would seem that hallucination rates are going down, but the problem is not going away.

6. This was published back in 2023. “ChatGPT ranked in the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course.” (note the “compared to”): ChatGPT has Aced the Test of Understanding in College Economics: Now What?

References          

Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80–87. https://doi.org/10.1177/05694345231218454

Cowen, T., & Tabarrok, A. T. (2023). How to Learn and Teach Economics with Large Language Models, Including GPT. GMU Working Paper in Economics No. 23-18. Available at SSRN: https://ssrn.com/abstract=4391863 or http://dx.doi.org/10.2139/ssrn.4391863

Dell, M. (2025). Deep Learning for Economists. Journal of Economic Literature, 63(1), 5–58. https://doi.org/10.1257/jel.20241733

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has Aced the Test of Understanding in College Economics: Now What? The American Economist, 68(2), 233–245. https://doi.org/10.1177/05694345231169654

Horton, J. J. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv preprint arXiv:2301.07543.

Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for Economists. Journal of Economic Literature, 61(4), 1281–1317. https://doi.org/10.1257/jel.20231736

Manning, B. S., Zhu, K., & Horton, J. J. (2024). Automated Social Science: Language Models as Scientist and Subjects (Working Paper No. 32381). National Bureau of Economic Research. https://doi.org/10.3386/w32381

We’re All Magical

The widespread availability and easy user interface of artificial intelligence (AI) have put great power at everyone’s fingertips. We can do magical things.

Before the internet existed, we used books to help us better interpret the world. Communication among humans is hard. Expressing logic, and even describing phenomena, is complex. This is why social skills matter. Among other things, they help us communicate. The most obvious example of a communication barrier is language. I remember having a pocket-sized English-Spanish dictionary that I used to memorize or look up Spanish words. The book helped me communicate with others and translate ideas from one language to another.

Math books do something similar, but the translation is English-to-Math. We can get broader and say that all textbooks are translation devices. They define field-specific terms and ideas to help a person translate between topic domains, usually in a base language pitched at a targeted level of generality. We can get extreme and say that all books are translators, communicating the contents of one person’s head to another.

But sometimes the field-to-general translation doesn’t work because readers don’t have an adequate grasp of either language. It isn’t necessarily that readers are illiterate. It may be that the level of generality and the degree of focus of the translation aren’t right for the reader. Anyone who has ever tried to teach anything with math has encountered this. Students say that the book doesn’t translate clearly, and the communication fails. The book gets the reader’s numeracy or stock of understood definitions wrong. That is why readers disagree about how ‘good’ a textbook is.

Search engines are so useful because you can enter a few keywords and find your destination, even if you don’t know the proper nouns or domain-specific terms. People used to memorize URLs, but that’s becoming less common. Wikipedia is so great because, if you want to learn about an idea, it usually explains the idea in five different ways. It tells the story of who created something and who they interacted with. It describes the motivation, the math, the logic, and the developments, and it usually includes examples. Wikipedia translates domain-specific ideas into multiple general languages for different cognitive aptitudes and interests. It scatters links along the way to help users level up their domain-specific understanding so that they can contextualize and translate the part they care about.

Historical translation technology was largely for the audience. More recently, translation technology has empowered the transmitters.


EconTalk Extra on Daisy Christodoulou

I wrote an Extra for the How Better Feedback Can Revolutionize Education (with Daisy Christodoulou) episode.

Can Students Get Better Feedback? is the title of my Extra.

Read the whole thing at the link (ungated), but here are two quotes:

For now, the question is still what kind of feedback teachers can give that really benefits students. Daisy Christodoulou, the guest on this episode, offers a sobering critique of how educators tend to give feedback. One of her points is that much of the written feedback teachers give is vague and doesn’t actually help students improve. She shares an example from Dylan Wiliam: a middle school student was told he needed to “make their scientific inquiries more systematic.” When asked what he would do differently next time, the student replied, “I don’t know. If I’d known how to be more systematic, I would have been so the first time.”

Christodoulou also turns to the question many of us are now grappling with: can AI help scale meaningful feedback?

Discuss AI Doom with Joy on May 5

If you like to read and discuss with smart people, then you can make a free account in the Liberty Fund Portal. If you listen to this podcast over the weekend, Eliezer Yudkowsky on the Dangers of AI (2023), you will be up to speed for our asynchronous virtual debate room on Monday, May 5.

Russ Roberts sums up the doomer argument using the following metaphor:

The metaphor is primitive. Zinjanthropus man or some primitive form of pre-Homo sapiens sitting around a campfire and human being shows up and says, ‘Hey, I got a lot of stuff I can teach you.’ ‘Oh, yeah. Come on in,’ and pointing out that it’s probable that we are either destroyed directly by murder or maybe just by out-competing all the previous hominids that came before us, and that in general, you wouldn’t want to invite something smarter than you into the campfire.

What do you think of this metaphor? By incorporating AI agents into society, are we inviting a smarter being to our campfire? Is it likely to eventually kill us out of contempt or neglect? That will be what we are discussing over in the Portal this week.

Is your P(Doom) < 0.05? Great: that means you believe that the probability of AI turning us into paperclips is less than 5%. Come one, come all. You can argue against the doomers during the May 5-9 Week of Doom, and then you will love Week Two. On May 12-16, we will make the optimistic case for AI!

See more details on all readings and the final Zoom meeting in my previous post.

Join Joy to discuss Artificial Intelligence in May 2025

Podcasts are emerging as one of the key media for timely expert opinions and news about artificial intelligence. For example, EconTalk (Russ Roberts) has featured some of the most famous voices in AI discourse:

EconTalk: Eliezer Yudkowsky on the Dangers of AI (2023)

EconTalk: Marc Andreessen on Why AI Will Save the World 

EconTalk: Reid Hoffman on Why AI Is Good for Humans

If you would like to engage in a discussion about these topics in May, please sign up for the session I am leading. It is free, but you do need to sign up for the Liberty Fund Portal.

The event consists of two weeks of asynchronous, discussion-board-style conversation with other interested listeners and readers. Lastly, there is a Zoom meeting to bring everyone together on May 21. You don’t have to do all three parts.

Further description for those who are interested:

Timeless: Artificial Intelligence: Doom or Bloom?

with Joy Buchanan

Time: May 5-9, 2025 and May 12-16, 2025

How will humans succeed (or survive) in the Age of AI? 

Russ Roberts brought the world’s leading thinkers about artificial intelligence to the EconTalk audience and was early to the trend. He hosted Nick Bostrom on Superintelligence in 2014, nearly a decade before the world was shocked into thinking harder about AI after meeting ChatGPT.

We will discuss the future of humanity by revisiting or discovering some of Roberts’s best EconTalk podcasts on this topic and reading complementary texts. Participants can join for part or all of the series.

Week 1: May 5-9, 2025

An asynchronous discussion, with an emphasis on possible negative outcomes from AI, such as unemployment, social disengagement, and existential risk. Participants will be invited to suggest special topics for a separate session that will be held on Zoom on May 21, 2025, 2:00-3:30 pm EDT. 

Required Readings: EconTalk: Eliezer Yudkowsky on the Dangers of AI (2023)

EconTalk: Erik Hoel on the Threat to Humanity from AI (2023), with an EconTalk Extra, “Who’s Afraid of Artificial Intelligence?” by Joy Buchanan

“Trurl’s Electronic Bard” (1965) by Stanisław Lem. 

In this prescient short story, a scientist builds a poetry-writing machine. Sound familiar? (If anyone participated in the Life and Fate reading club with Russ and Tyler, there are parallels between Lem’s work and Vasily Grossman’s “Life and Fate” (1959), as both emerged from Eastern European intellectual traditions during the Cold War.)

Optional Readings: “Technological Singularity” by Vernor Vinge. Field Robotics Center, Carnegie Mellon U., 1993.

“‘I am Bing, and I Am Evil’: Microsoft’s new AI really does herald a global threat” by Erik Hoel. The Intrinsic Perspective Substack, February 16, 2023.

“Situational Awareness” (2024) by Leopold Aschenbrenner

Week 2: May 12-16, 2025

An asynchronous discussion, emphasizing the promise of AI as the next technological breakthrough that will make us richer.

Required Readings: EconTalk: Marc Andreessen on Why AI Will Save the World

EconTalk: Reid Hoffman on Why AI Is Good for Humans

Optional Readings: EconTalk: Tyler Cowen on the Risks and Impact of Artificial Intelligence (2023)

“ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics” (2024)

Joy Buchanan with Stephen Hill and Olga Shapoval. The American Economist, 69(1), 80-87.

What the Superintelligence can do for us (Joy Buchanan, 2024)

Dwarkesh Podcast: “Tyler Cowen – Hayek, Keynes, & Smith on AI, Animal Spirits, Anarchy, & Growth”

Week 3: May 21, 2025, 2:00-3:30 pm EDT (Zoom meeting)

Pre-registration is required, and we ask you to register only if you can be present for the entire session. Readings are available online. We will get to talk in the same Zoom room!

Required Readings: Great Antidote podcast with Katherine Mangu-Ward on AI: Reality, Concerns, and Optimism

Additional readings will be added, based partially on previous participants’ suggestions.

Optional Readings: Rediscovering David Hume’s Wisdom in the Age of AI (Joy Buchanan, EconLog, 2024)

“Professor tailored AI tutor to physics course. Engagement doubled,” The Harvard Gazette, 2024.

Please email Joy if you have any trouble signing up for the virtual event.