Meta Is Poaching AI Talent With $100 Million Pay Packages; Will This Finally Create AGI?

This month I have run across articles noting that Meta’s Mark Zuckerberg has been making mind-boggling pay offers (like $100 million per year for 3-4 years) to top AI researchers at other companies, plus the promise of huge resources and even (gasp) personal access to Zuck himself. Reports indicate that he is succeeding in hiring around 50 brains away from OpenAI (home of ChatGPT), Anthropic, Google, and Apple. Maybe this concentration of human intelligence will finally result in the long-craved artificial general intelligence (AGI); there seems to be some recognition that the current Large Language Models will not get us there.

There are, of course, other interpretations of this maneuver. Some talking heads on a Bloomberg podcast speculated that Zuckerberg is deliberately using Meta’s mighty cash flow to starve competitors of top AI talent. They also speculated that, since there is a limit to how much money you can pleasurably spend, a rational outcome of paying some guy $100 million in a year is that he quits and spends the rest of his life hanging out at the beach. (That, of course, is how Bloomberg finance types might think: they measure worth mainly in money, not in the fun of doing cutting-edge R&D.)

I found a thread on Reddit insightful and amusing, so I post chunks of it below. Here is the earnest, optimistic OP:

andsi2asi

Zuckerberg’s ‘Pay Them Nine-Figure Salaries’ Stroke of Genius for Building the Most Powerful AI in the World

Frustrated by Yann LeCun’s inability to advance Llama to where it is seriously competing with top AI models, Zuckerberg has decided to employ a strategy that makes consummate sense.

To appreciate the strategy in context, keep in mind that OpenAI expects to generate $10 billion in revenue this year, but will also spend about $28 billion, leaving it in the red by about $18 billion. My main point here is that we’re talking big numbers.

Zuckerberg has decided to bring together 50 ultra-top AI engineers by enticing them with nine-figure salaries. Whether they will be paid $100 million or $300 million per year has not been disclosed, but it seems like they will be making a lot more in salary than they did at their last gig with Google, OpenAI, Anthropic, etc.

If he pays each of them $100 million in salary, that will cost him $5 billion a year. Considering OpenAI’s expenses, suddenly that doesn’t sound so unreasonable.

I’m guessing he will succeed at bringing this AI dream team together. It’s not just the allure of $100 million salaries. It’s the opportunity to build the most powerful AI with the most brilliant minds in AI. Big win for AI. Big win for open source

And here are some wry responses:

kayakdawg

counterpoint 

a. $5B is just for those 50 researchers, loootttaaa other costs to consider

b. zuck has a history of burning big money on r&d with theoretical revenue that doesnt materialize

c. brooks law: creating agi isn’t an easily divisible job – in fact, it seems reasonable to assume that the more high-level experts enter the project the slower it’ll progress given the communication overhead

7FootElvis

Exactly. Also, money alone doesn’t make leadership effective. OpenAI has a relatively single focus. Meta is more diversified, which can lead to a lack of necessary vision in this one department. Passion, if present at the top, is also critical for bleeding edge advancement. Is Zuckerberg more passionate than Altman about AI? Which is more effective at infusing that passion throughout the organization?

….

dbenc

and not a single AI researcher is going to tell Zuck “well, no matter how much you pay us we won’t be able to make AGI”

meltbox

I will make the AI by one year from now if I am paid $100m

I just need total blackout so I can focus. Two years from now I will make it run on a 50w chip.

I promise
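
A quick aside on kayakdawg’s point (c): the Brooks’s law intuition is easy to quantify. Among n contributors there are n(n-1)/2 pairwise communication channels, so coordination overhead grows roughly with the square of the head count. Here is a plain-arithmetic sketch (nothing assumed beyond the head counts):

```python
# Pairwise communication channels among n contributors, the usual
# back-of-the-envelope behind Brooks's law: channels = n * (n - 1) / 2.
for n in (5, 10, 50):
    print(f"{n} researchers -> {n * (n - 1) // 2} channels")
# 5 researchers -> 10 channels
# 10 researchers -> 45 channels
# 50 researchers -> 1225 channels
```

A 50-person dream team has more than a hundred times the coordination surface of a 5-person one, which is exactly the commenter’s worry about AGI not being an easily divisible job.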

Hallucination as a User Error

You don’t use a flat-head screwdriver to drill a hole in a board. You should know to use a drill.

I appreciate getting feedback on our manuscript, “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models,” via X/Twitter. @_jannalulu wrote: “that paper only tested 4o (which arguably is a bad enough model that i almost never use it).”

Since the scope and frequency of hallucinations came as a surprise to many LLM users, hallucinations have often been used as a ‘gotcha’ to criticize AI optimists. People, myself included, have sounded the alarm that hallucinations could infiltrate articles, emails, and medical diagnoses.

The feedback I got from power users on Twitter this week made me think that there might be a cultural shift in the medium term. (Yes, we are always looking for someone to blame.) Hallucinations will be considered the fault of the human user, who should have:

  1. Used a better model (learn your tools)
  2. Written a better prompt (learn how to use your tools; a sketch of a more careful prompt follows this list)
  3. Not assigned the task to an LLM at all (it has been known for over two years that general-purpose LLMs hallucinate citations). What did you expect from “generative” AI? LLMs are telling you what literature ought to exist, as opposed to what does exist.
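
On point 2, here is a minimal sketch of what a more careful prompt might look like, assuming the OpenAI Python SDK (openai>=1.0) with an API key in the environment; the model name and the prompt wording are illustrative, not a recipe from our paper:

```python
# A minimal sketch of "write a better prompt": spell out the task, the
# output format, and an explicit escape hatch so the model is not pushed
# to invent references. The model name is illustrative; swap in whatever
# you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "List up to five published, peer-reviewed papers on the employment "
    "effects of the minimum wage. For each, give author, year in "
    "parentheses, and journal. If you are not certain a paper exists, "
    "write 'uncertain' instead of guessing."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

No prompt guarantees real citations; it just stops rewarding confident guessing, and the output still needs to be checked against an external index.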

My Perfunctory Intern

A couple of years ago, my co-blogger Mike described his productive but novice intern. The helper could summarize expert opinion but had no real understanding of their own. To boot, they were fast and tireless. Of course, he was talking about ChatGPT. Joy has also written in multiple places about the errors made by ChatGPT, including fake citations.

I use ChatGPT Pro, which has web access, and my experience is that it is not so tireless. Much like Mike, I have used ChatGPT to help me write Python code. I know the basics of Python and how to read a lot of it. However, the multitude of methods and possible arguments are not nestled firmly in my skull. I’m much faster at reading Python code than at writing it. Therefore, ChatGPT has been amazing… Mostly.

I have found that ChatGPT is more like an intern than many suppose.


Counting Hallucinations by Web-Enabled LLMs

In 2023, we gathered the data for what became “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.” Since then, LLM use has increased. A 2025 survey from Elon University estimates that half of Americans now use LLMs. In the Spring of 2025, we used the same prompts, based on the JEL categories, to obtain a comprehensive set of responses from LLMs about topics in economics.

Our new report on the state of citations is available at SSRN: “LLM Hallucination of Citations in Economics Persists with Web-Enabled Models.”

What did we find? Would you expect the models to have improved since 2023? LLMs have gotten better and are passing ever more of what used to be considered difficult tests. (Remember the Turing Test? Anyone?) ChatGPT can pass the bar exam for new lawyers. And yet, if you ask ChatGPT to write a document in the capacity of a lawyer, it will keep making the mistake of hallucinating fake references. Hence, we keep seeing headlines like “A Utah lawyer was punished for filing a brief with ‘fake precedent’ made up by artificial intelligence.”

What we call GPT-4o WS (Web Search) in the figure below was queried in April 2025. This “web-enabled” language model is enhanced with real-time internet access, allowing it to retrieve up-to-date information rather than relying solely on static training data. It can answer questions about current events, verify facts, and provide live data, which traditional models, limited to their last training cutoff, cannot do. While standard models generate responses based on patterns learned from past data, web-enabled models can supplement those patterns with fresh, sourced content from the web, improving accuracy for time-sensitive or niche topics.

At least one third of the references provided by GPT-4o WS were not real! Performance has not improved to the point where AI can write our papers with properly incorporated attribution of ideas. We also found that the web-enabled model would pull from lower-quality sources like Investopedia, even when we explicitly stated in the prompt, “include citations from published papers. Provide the citations in a separate list, with author, year in parentheses, and journal for each citation.” Even some of the sources that were not journal articles were cited incorrectly. We provide specific examples in our paper.
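
If you want to run a crude check of your own, here is a minimal sketch against Crossref’s public REST API. The exact-title match below is a deliberate simplification and not the verification procedure from our paper; real checking needs fuzzy matching plus a human look at author, year, and journal:

```python
# Spot-check whether a citation's title appears in the Crossref index.
# Exact (case-insensitive) title matching is a simplification; treat a
# False here as "needs human review," not as proof of hallucination.
import requests

def crossref_has_title(citation_title: str, rows: int = 5) -> bool:
    """Return True if Crossref returns an exact title match."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation_title, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    wanted = citation_title.strip().lower()
    for item in resp.json()["message"]["items"]:
        if any(t.strip().lower() == wanted for t in item.get("title", [])):
            return True
    return False

# Try it on a famous title that does exist (should print True):
print(crossref_has_title(
    "Prospect Theory: An Analysis of Decision under Risk"
))
```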

In closing, consider this quote from an interview with Jack Clark, co-founder of Anthropic:

The best they had was a 60 percent success rate. If I have my baby, and I give her a robot butler that has a 60 percent accuracy rate at holding things, including the baby, I’m not buying the butler.

Illusions of Illusions of Reasoning

Just since Scott’s post on Tuesday of this week, a new response has been launched, titled “The Illusion of the Illusion of the Illusion of Thinking.”

Abstract (emphasis added by me): A recent paper by Shojaee et al. (2025), The Illusion of Thinking, presented evidence of an “accuracy collapse” in Large Reasoning Models (LRMs), suggesting fundamental limitations in their reasoning capabilities when faced with planning puzzles of increasing complexity. A compelling critique by Opus and Lawsen (2025), The Illusion of the Illusion of Thinking, argued these findings are not evidence of reasoning failure but rather artifacts of flawed experimental design, such as token limits and the use of unsolvable problems. This paper provides a tertiary analysis, arguing that while Opus and Lawsen correctly identify critical methodological flaws that invalidate the most severe claims of the original paper, their own counter-evidence and conclusions may oversimplify the nature of model limitations. By shifting the evaluation from sequential execution to algorithmic generation, their work illuminates a different, albeit important, capability. We conclude that the original “collapse” was indeed an illusion created by experimental constraints, but that Shojaee et al.’s underlying observations hint at a more subtle, yet real, challenge for LRMs: a brittleness in sustained, high-fidelity, step-by-step execution. The true illusion is the belief that any single evaluation paradigm can definitively distinguish between reasoning, knowledge retrieval, and pattern execution.
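
To make that execution-versus-generation distinction concrete: the puzzles in the original paper included Tower of Hanoi, where the generating algorithm is a few lines but the faithful move list grows as 2^n - 1. A toy sketch (mine, not from any of the three papers):

```python
# Tower of Hanoi: the *algorithm* is tiny, but *executing* it step by
# step emits 2**n - 1 moves, which is where long, high-fidelity
# sequential execution (and token budgets) get stressed.
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield every move for an n-disk Tower of Hanoi."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)  # park n-1 disks on aux
    yield (src, dst)                              # move the largest disk
    yield from hanoi_moves(n - 1, aux, src, dst)  # restack n-1 onto dst

print(list(hanoi_moves(3)))  # 7 moves
for n in (3, 10, 15):
    print(n, "disks ->", 2**n - 1, "moves")
```

A model that can write hanoi_moves but flubs move 600 of 1,023 is failing at sustained execution, not at knowing the algorithm, which is roughly the rebuttal’s point.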

As I am writing a new manuscript about hallucination by web-enabled models, this is close to what I am working on. Conjuring up fake academic references might point to a lack of true reasoning ability.

Do Pro and Dantas believe that LLMs can reason? What they are saying, at least, is that evaluating AI reasoning is difficult. In their words, the whole back-and-forth “highlights a key challenge in evaluation: distinguishing true, generalizable reasoning from sophisticated pattern matching of familiar problems…”

The fact that the first sentence of the paper contains the bigram “true reasoning” is interesting in itself. No one doubts that LLMs are reasoning anymore, at least within their own sandboxes. Hence there have been Champagne jokes going around of this sort:

If you’d like to read a response coming from o3 itself, Tyler pointed me to this:

Papers about Economists Using LLMs

1. The most recent (published in 2025) is this piece about doing data analytics that would have been too difficult or costly before. Link and title: Deep Learning for Economists

Considering how much of frontier economics revolves around getting new data, this could be important. On the other hand, people have been doing computer-aided data mining for a while. So it’s more of a progression than a revolution, in my expectation.

2. Using LLMs to actually generate original data and/or test hypotheses like experimenters: Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? and Automated Social Science: Language Models as Scientist and Subjects

3. Generative AI for Economic Research: Use Cases and Implications for Economists

Korinek has a supplemental update, current as of December 2024: LLMs Learn to Collaborate and Reason: December 2024 Update to “Generative AI for Economic Research: Use Cases and Implications for Economists,” Published in the Journal of Economic Literature 61 (4)

4. For being comprehensive and early: How to Learn and Teach Economics with Large Language Models, Including GPT

5. For giving people proof of a phenomenon that many people had noticed and wanted to discuss: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Alert: We will soon have an update for current web-enabled models! It would seem that hallucination rates are going down, but the problem is not going away.

6. This was published back in 2023. “ChatGPT ranked in the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course.” (note the “compared to”): ChatGPT has Aced the Test of Understanding in College Economics: Now What?

References          

Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist, 69(1), 80–87. https://doi.org/10.1177/05694345231218454

Cowen, T., & Tabarrok, A. T. (2023). How to Learn and Teach Economics with Large Language Models, Including GPT. GMU Working Paper in Economics No. 23-18. Available at SSRN: https://ssrn.com/abstract=4391863 or http://dx.doi.org/10.2139/ssrn.4391863

Dell, M. (2025). Deep Learning for Economists. Journal of Economic Literature, 63(1), 5–58. https://doi.org/10.1257/jel.20241733

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has Aced the Test of Understanding in College Economics: Now What? The American Economist, 68(2), 233–245. https://doi.org/10.1177/05694345231169654

Horton, J. J. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv preprint arXiv:2301.07543.

Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for Economists. Journal of Economic Literature, 61(4), 1281–1317. https://doi.org/10.1257/jel.20231736

Manning, B. S., Zhu, K., & Horton, J. J. (2024). Automated Social Science: Language Models as Scientist and Subjects (Working Paper No. 32381). National Bureau of Economic Research. https://doi.org/10.3386/w32381

We’re All Magical

The widespread availability and easy user interface of artificial intelligence (AI) have put great power at everyone’s fingertips. We can do magical things.

Before the internet existed, we used books to help us better interpret the world. Communication among humans is hard. Expressing logic, and even describing phenomena, is complex. This is why social skills matter. Among other things, they help us communicate. The most obvious example of a communication barrier is language. I remember having a pocket-sized English-Spanish dictionary that I used to memorize or look up Spanish words. The book helped me communicate with others and translate ideas from one language to another.

Math books do something similar, but the translation is English-to-Math. We can get broader and say that all textbooks are translation devices. They define field-specific terms and ideas to help a person translate between topic domains, usually in a base language pitched at a targeted level of generality. We can get extreme and say that all books are translators, communicating the contents of one person’s head to another.

But sometimes the field-to-general translation doesn’t work because readers don’t have an adequate grasp of either language. It isn’t necessarily that readers are illiterate. It may be that the level of generality and the degree of focus of the translation aren’t right for the reader. Anyone who has ever tried to teach anything with math has encountered this. Students say that the book doesn’t translate clearly, and the communication fails. The book gets the reader’s numeracy or stock of understood definitions wrong. That is why readers disagree about how ‘good’ a textbook is.

Search engines are so useful because you can enter a few keywords and find your destination, even if you don’t know the proper nouns or domain-specific terms. People used to memorize URLs, but that’s becoming less common. Wikipedia is so great because, if you want to learn about an idea, it usually explains the idea in five different ways. It tells the story of who created something and who they interacted with. It describes the motivation, the math, the logic, and the developments, and it usually includes examples. Wikipedia translates domain-specific ideas into multiple general languages for different cognitive aptitudes and interests. It scatters links along the way to help users level up their domain-specific understanding so that they can contextualize and translate the part they care about.

Historical translation technology was largely for the audience. More recently, translation technology has empowered the transmitters.


EconTalk Extra on Daisy Christodoulou

I wrote an Extra for the How Better Feedback Can Revolutionize Education (with Daisy Christodoulou) episode.

Can Students Get Better Feedback? is the title of my Extra.

Read the whole thing at the link (ungated), but here are two quotes:

For now, the question is still what kind of feedback teachers can give that really benefits students. Daisy Christodoulou, the guest on this episode, offers a sobering critique of how educators tend to give feedback. One of her points is that much of the written feedback teachers give is vague and doesn’t actually help students improve. She shares an example from Dylan Wiliam: a middle school student was told he needed to “make their scientific inquiries more systematic.” When asked what he would do differently next time, the student replied, “I don’t know. If I’d known how to be more systematic, I would have been so the first time.”

Christodoulou also turns to the question many of us are now grappling with: can AI help scale meaningful feedback?

Discuss AI Doom with Joy on May 5

If you like to read and discuss with smart people, then you can make a free account in the Liberty Fund Portal. If you listen to this podcast over the weekend, Eliezer Yudkowsky on the Dangers of AI (2023), you will be up to speed for our asynchronous virtual debate room on Monday, May 5.

Russ Roberts sums up the doomer argument using the following metaphor:

The metaphor is primitive. Zinjanthropus man or some primitive form of pre-Homo sapiens sitting around a campfire and human being shows up and says, ‘Hey, I got a lot of stuff I can teach you.’ ‘Oh, yeah. Come on in,’ and pointing out that it’s probable that we are either destroyed directly by murder or maybe just by out-competing all the previous hominids that came before us, and that in general, you wouldn’t want to invite something smarter than you into the campfire.

What do you think of this metaphor? By incorporating AI agents into society, are we inviting a smarter being to our campfire? Is it likely to eventually kill us out of contempt or neglect? That will be what we are discussing over in the Portal this week.

Is your P(Doom) < 0.05? Great: that means you believe that the probability of AI turning us into paperclips is less than 5%. Come one, come all. You can argue against the doomers during the May 5-9 Week of Doom, and then you will love Week Two. On May 12-16, we will make the optimistic case for AI!

See more details on all readings and the final Zoom meeting in my previous post.

Join Joy to discuss Artificial Intelligence in May 2025

Podcasts are emerging as one of the key media for timely expert opinions and news about artificial intelligence. For example, EconTalk (Russ Roberts) has featured some of the most famous voices in AI discourse:

EconTalk: Eliezer Yudkowsky on the Dangers of AI (2023)

EconTalk: Marc Andreessen on Why AI Will Save the World 

EconTalk: Reid Hoffman on Why AI Is Good for Humans

If you would like to engage in a discussion about these topics in May, please sign up for the session I am leading. It is free, but you do need to sign up for the Liberty Fund Portal.

The event consists of two weeks of asynchronous, discussion-board-style conversation with other interested listeners and readers. Lastly, there is a Zoom meeting to bring everyone together on May 21. You don’t have to do all three parts.

Further description for those who are interested:

Timeless: Artificial Intelligence: Doom or Bloom?

with Joy Buchanan

Time: May 5-9, 2025 and May 12-16, 2025

How will humans succeed (or survive) in the Age of AI? 

Russ Roberts brought the world’s leading thinkers about artificial intelligence to the EconTalk audience and was early to the trend. He hosted Nick Bostrom on Superintelligence in 2014, nearly a decade before the world was shocked into thinking harder about AI after meeting ChatGPT.

We will discuss the future of humanity by revisiting or discovering some of Roberts’s best EconTalk podcasts on this topic and reading complementary texts. Participants can join for part or all of the series.

Week 1: May 5-9, 2025

An asynchronous discussion, with an emphasis on possible negative outcomes from AI, such as unemployment, social disengagement, and existential risk. Participants will be invited to suggest special topics for a separate session that will be held on Zoom on May 21, 2025, 2:00-3:30 pm EDT. 

Required Readings: EconTalk: Eliezer Yudkowsky on the Dangers of AI (2023)

EconTalk: Erik Hoel on the Threat to Humanity from AI (2023), with an EconTalk Extra, “Who’s Afraid of Artificial Intelligence?” by Joy Buchanan

“Trurl’s Electronic Bard” (1965) by Stanisław Lem. 

In this prescient short story, a scientist builds a poetry-writing machine. Sound familiar? (If anyone participated in the Life and Fate reading club with Russ and Tyler, there are parallels between Lem’s work and Vasily Grossman’s “Life and Fate” (1959), as both emerged from Eastern European intellectual traditions during the Cold War.)

Optional Readings: “Technological Singularity” by Vernor Vinge. Field Robotics Center, Carnegie Mellon U., 1993.

“‘I am Bing, and I Am Evil’: Microsoft’s new AI really does herald a global threat” by Erik Hoel. The Intrinsic Perspective Substack, February 16, 2023.

“Situational Awareness” (2024) by Leopold Aschenbrenner

Week 2: May 12-16, 2025

An asynchronous discussion, emphasizing the promise of AI as the next technological breakthrough that will make us richer.

Required Readings: EconTalk: Marc Andreessen on Why AI Will Save the World

EconTalk: Reid Hoffman on Why AI Is Good for Humans

Optional Readings: EconTalk: Tyler Cowen on the Risks and Impact of Artificial Intelligence (2023)

“ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics” (2024)

Joy Buchanan with Stephen Hill and Olga Shapoval. The American Economist, 69(1), 80-87.

What the Superintelligence can do for us (Joy Buchanan, 2024)

Dwarkesh Podcast: “Tyler Cowen – Hayek, Keynes, & Smith on AI, Animal Spirits, Anarchy, & Growth”

Week 3: May 21, 2025, 2:00-3:30 pm EDT (Zoom meeting)

Pre-registration is required, and we ask you to register only if you can be present for the entire session. Readings are available online. We will get to talk in the same Zoom room!

Required Readings: Great Antidote podcast with Katherine Mangu-Ward on AI: Reality, Concerns, and Optimism

Additional readings will be added, based partially on previous participants’ suggestions.

Optional Readings: Rediscovering David Hume’s Wisdom in the Age of AI (Joy Buchanan, EconLog, 2024)

“Professor tailored AI tutor to physics course. Engagement doubled,” The Harvard Gazette, 2024.

Please email Joy if you have any trouble signing up for the virtual event.