Literature Review is a Difficult Intellectual Task

As I was reading through What is Real?, it occurred to me that I’d like a literature review on an issue. I thought, “Experimental physics is like experimental economics. You can sometimes predict what groups or ‘markets’ will do. However, it’s hard to predict exactly what an individual human will do.” I would like to know who has written a short article on this topic.

I decided to feed the following prompt into several LLMs: “What economist has written about the following issue: Economics is like physics in the sense that predictions about large groups are easier to make than predictions about the smallest, atomic if you will, components of the whole.”

First, ChatGPT (free version) (I think I’m at “GPT-4o mini (July 18, 2024)”):

I get the sense from my experience that ChatGPT often references Keynes. Based on my research, I think that’s because there are a lot of mentions of Keynes’s books in the model training data. (See “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.”)

Next, I asked ChatGPT, “What is the best article for me to read to learn more?” It gave me 5 items. Item 2 was “Foundations of Economic Analysis” by Paul Samuelson, which likely would be helpful but it’s from 1947. I’d like something more recent to address the rise of empirical and experimental economics.

Item 5 was: “‘Physics Envy in Economics’ (various authors): You can search for articles or papers on this topic, which often discuss the parallels between economic modeling and physics.” Interestingly, ChatGPT is telling me to Google my question. That’s not bad advice, but I find it funny given the new competition between LLMs and “classic” search engines.

When I pressed it further for a current article, ChatGPT gave me a link to an NBER paper that was not very relevant. I could have tried harder to refine my prompts, but I was not immediately impressed. It seems like ChatGPT had a heavy bias toward starting with famous books and papers as opposed to finding something for me to read that would answer my specific question.

I gave Claude (paid) a try. Claude recommended, “If you’re interested in exploring this idea further, you might want to look into Hayek’s works, particularly ‘The Use of Knowledge in Society’ (1945) and ‘The Pretense of Knowledge’ (1974), his Nobel Prize lecture.” Again, I might have been able to get a better response if I had kept refining my prompt, but Claude also seemed to initially respond by tossing out famous old books.

The Open Internet Is Dead; Long Live The Open Internet

Information on the internet was born free, but now lives everywhere in walled gardens. Blogging sometimes feels like a throwback to an earlier era. So many newer platforms have eclipsed blogs in popularity, almost all of which are harder to search and discover. Facebook was walled off from the beginning, Twitter is becoming more so. Podcasts and video tend to be open in theory, but hard to search as most lack transcripts. Longer-form writing is increasingly hidden behind paywalls on news sites and Substack. People have complained for years that Google search is getting worse; there are many reasons for this, like a complacent company culture and the cat-and-mouse game with SEO companies, but one is this rising tide of content that is harder to search and link.

To me, part of the value of blogging is precisely that it remains open in an increasingly closed world. Its influence relative to the rest of the internet has waned since its heyday in ~2009, but most of this is due to how the rest of the internet has grown explosively at the expense of the real world; in absolute terms the influence of blogging remains high, and is perhaps rising.

The closing internet of late 2023 will not last forever. Like so much else, it is being transformed by AI, for better and worse. AI is making it cheap and easy to produce transcripts of podcasts and videos, making them more searchable. Because AI needs large amounts of text to train models, text becomes more valuable. Open blogs become more influential because they become part of the training data for AI; because of what we have written here, AI will think and sound a little bit more like us. I think this is great, but others have the opposite reaction. The New York Times is suing to exclude its data from training AIs, and to delete any models trained with it. Twitter is becoming more closed partly in an attempt to limit scraping by AIs.

So AI makes some human material easier for search engines to index, and some harder; it also means there will be a flood of AI-produced material, mostly low-quality, clogging up search results. The perpetual challenge for search engines of putting relevant, high-quality results first will become much harder, a challenge which AI will of course be set to solve. Search engines already have surprisingly big problems with not indexing writing at all; when I searched for a post on my old blog using exact quotes and could not find it, I realized Google was missing some posts there, and Bing and DuckDuckGo were missing all of them. While we’re waiting for AI to solve and/or worsen this problem, Gwern has a great page of tips on searching for hard-to-find documents and information, both the kind that is buried deep down in Google and the kind that is not there at all.