Comparing ChatGPT and Bing for a research literature review in April 2023

We wrote “ChatGPT Cites Economics Papers That Do Not Exist.”

I expect that problem to go away any day, so I gave it another try this week. For the record, they are currently calling it “ChatGPT Mar 23 Version” on the OpenAI website.

First, I asked ChatGPT for help with the following prompt:

ChatGPT is at it again. There is no such paper, as I will verify by showing John Duffy’s publications from that year: 

ChatGPT makes up lies (“hallucinations”). It is also great for some tasks, and smart people are already using it to become more productive. My post last week was on how impressive ChatGPT seemed in the Jonathan Swift impersonation. I didn’t take any time to do fact checking, and I would bet money that at least something in there was made up.

I posed the same question to the Bing plug-in for the Edge browser (Microsoft). Yup, I have opened Edge for the first time in forever to use Bing.

Bing handles the prompt by linking to a useful relevant paper – so if you click the link you will get to a helpful and not misleading answer. Just being a smart search engine instead of hallucinating randomly is better, for my purposes.

The actual paper I wanted returned was this one, by the way:

Duffy, John. “Experimental macroeconomics.” Behavioural and Experimental Economics (2010): 113-119.

There is no reason that ChatGPT should be better than an expert in a subfield of a field of economics. But that’s the genius of a good search engine. You ask it “Can I repair a broken fiddlewhat?” The search engine does not claim to know but rather directs you to the blog of the world expert in fiddlewhats.

I can’t find the link to it, but I’m going to toss in one more thing here. Tyler Cowen did an interview this spring on AI. There was a newspaper reporter who had a “creepy” interaction with an AI that became the topic of a viral internet article. Tyler made a very contrarian point by saying that he interprets this as a case of AI alignment. The reporter wanted something sensational and he got what he wanted.

So, it will probably be true for a long time that if you want to find a failure of AI, you can get what you want. Still, I’m putting this on the record here because I wonder if this particular problem will get solved quickly.

GPT showing off on Conversations with Tyler

Since posting it in January of 2023, our post on a failure of ChatGPT has been one of the most popular articles of the year in terms of web traffic. Read ChatGPT Cites Economics Papers That Do Not Exist

LLMs will make up nonsense, but it might sound sophisticated and serious. As human editors, we are not used to detecting that kind of problem, because if something sounds serious, that usually means it was written by a serious, thoughtful person.

That’s a problem, but ChatGPT is really good at some tasks.

Instead of pointing out the fact that ChatGPT is not perfect, Tyler Cowen wants people to see the potential. He has done an entire podcast interview with ChatGPT, asking it to take on the voice and role of deceased writer Jonathan Swift.

I’ll share part of the transcript:

COWEN: In 1709, you, Jonathan Swift, still were calling yourself a Whig. What exactly induced you to switch to the Tory side of the ledger?

SWIFT: My shift from the Whigs to the Tories was a gradual and complex process, influenced by several factors, including personal connections, political developments, and ideological disagreements.

During my early years in politics, I was indeed more aligned with the Whigs, who championed principles such as constitutional government, religious toleration, and the supremacy of Parliament over the monarchy. However, over time, I became increasingly disillusioned with the Whigs, particularly as their internal factions and power struggles intensified.

The Whigs’ support for the war with France, which began in 1702, was a point of contention for me. I believed that the war was costly…

The replies seem nuanced and insightful. Nothing made me suspect that they were factually inaccurate, although I do not know enough about Irish history to judge.

Is there any human who could have produced this script? I think so, although it would have required a lot of work. If one of these replies is better than anything a human Swift scholar would produce, how would we know?

GPT4 can write good summaries for the work of a prolific author like Swift, because the model can train on lots of examples.

GPT4 could probably write a good biography of a modern figure by pulling together all of the writing by them and about them. Maybe GPT4 could efficiently scrape up all mentions of this figure online and synthesize them faster than a human scholar. However, we observed GPT3 completely making up citations when we tried to get it to do economics summaries.

I’m concerned that humans will use GPT4 to write but not do the requisite fact-checking. That could introduce a new corpus of work that the next LLMs will train on, which might be full of lies. Humans might not admit to using GPT, and therefore we wouldn’t have a mechanism for using extra scrutiny on AI-generated writing from 2023. Humans can make mistakes too… so the ultimate solution could be an all-powerful AI that somehow does begin with a fairly accurate map of the world and goes around fact-checking everything faster than human editors ever could.

Adam Smith in Taylor Swift

See my latest post for Adam Smith Works.

TAYLOR SWIFT’S ANTI-HERO AS A SMITHIAN ANTHEM

The song “Anti-Hero” by Taylor Swift was the number-one song on charts in the United States and globally when it was released in October of 2022. Based on the record-breaking and continued popularity of the song, Swift’s struggles with self-loathing resonate with us.

 It’s me, hi, I’m the problem, it’s me
 At tea time, everybody agrees

The theme of the song is that Swift feels like a moral monster who is exposed to the watching eyes of society. She imagines proper people gossiping about her flaws at teatime. This reference to British tea culture makes a perfect segue to the moral philosophy of Adam Smith. Those who only think of Smith as an early observer of modern economies might be surprised, but regular readers of AdamSmithWorks won’t be. 

The impartial spectator is a key concept in Smith’s theory…

At the end I even quote the song “Shake it Off.”

Discrepancy in Views about Music Pirating

It’s unusual for the expert opinions on an issue to range all the way from zero to 100%.

Economists using an instrumental variable approach found that digital piracy did not hurt record sales in the 2000s. Hammond (2014) found, incredibly, that file-sharing increased record sales. The picture above is of an article critiquing the Oberholzer-Gee and Strumpf (2007) conclusion, which was published by a top journal.

Liebowitz reports that music industry professionals believed that digital piracy was the primary or complete cause of the decline of record sales. One would think that industry insiders have accurate data on the problem and a decent mental model relating the variables together.

The estimated effect of music file-sharing ranged from helping music sales to completely eliminating them. Where else can we find so much disagreement on the answer to a narrow empirical question?

Tyler vs Matt on SVB Bailouts

Having nothing original to say about the topic du jour, I will highlight two different takes for your consideration.

Tyler in Can the SVB crisis be solved in the longer run?

An unwillingness to guarantee all the deposits would satisfy the desire to penalize businesses and banks for their mistakes, limit moral hazard, and limit the fiscal liabilities of the public sector. Those are common goals in these debates. Nonetheless unintended secondary consequences kick in, and the final results of that policy may not be as intended.

Once depositors are allowed to take losses, both individuals and institutions will adjust their deposit behavior, and they probably would do so relatively quickly. Smaller banks would receive many fewer deposits, and the giant “too big to fail” banks, such as JP Morgan, would receive many more deposits. Many people know that if depositors at an institution such as JP Morgan were allowed to take losses above 250k, the economy would come crashing down. The federal government would in some manner intervene – whether we like it or not – and depositors at the biggest banks would be protected.

In essence, we would end up centralizing much of our American and foreign capital in our “too big to fail” banks. That would make them all the more too big to fail. It also might boost financial sector concentration in undesirable ways.

To see the perversity of the actual result, we started off wanting to punish banks and depositors for their mistakes. We end up in a world where it is much harder to punish banks and depositors for their mistakes.

Matt Y in America needs more giant banks

The problem is, what happens if PNC fails? PNC is the sixth largest bank in the country with over $500 billion in assets. That makes it dramatically smaller than the Big Four banks that are informally labeled “too big to fail” and formally classified as Global Systemically Important Banks (GSIBs).

Tyler wants to see more banks, and not just “Too big to fail” banks. In as many industries as possible, we prefer less concentration. More competition tends to be good for customers and leads to more innovation. Tyler is more comfortable in the messiness that midsize banks cause, or at least he presents that as a necessary evil.

Matt is arguing against more banks, because Silicon Valley Bank wasn’t pre-designated as too big to fail, and yet we are in crisis mode now.

Matt might say that I’m mischaracterizing his argument. Specifically, Matt said that tiny banks are fine because they are small enough for a private company to buy in times of distress. Matt does not explicitly call for fewer banks. However, I think the demise of the mid-size bank would almost certainly result in fewer banks total.

To give a full picture of the arguments being made this week, here’s someone arguing against bailing out SVB.

And here is the EWED SVB material to date:

Jeremy: It’s Never Good News When Deposit Insurance is in the News

Mike: Estimating the effects of a slow news cycle

For real-time updates, follow Jeremy on Twitter:

Be Posting Always

James wrote about our posting philosophy in “Always Be Posting”. The regularity is the point. This strategy is not our original idea, but this specific manifestation of blogging is a kind of experiment that we are running in front of everyone. I’ll add a few comments on this practice.

  1. I blogged more than once a week at first. Although I believe in the benefits of writing, once a week is the right amount for me.
  2. Tyler recently asked Brad DeLong about Substack. DeLong says, “Substacking is blogging, except that Substacking is blogging where you have explicit permission to send things to people’s email inboxes, and also to have a rather large tip jar.” Tyler mentioned that Substack posts tend to be longer. DeLong admits, “I thought blogging was more fun.” DeLong thinks longer posts are better because they fight the trend of short posts that I earlier called Poastmodernism. I would say that if you are going to blog regularly for free like we do, it should be fun. That is also what Tyler said when I asked him if young people should blog regularly in “The New Econ Bloggers.”
  3. If you are going to blog, you might wonder when you should start. Society seems obsessed with young geniuses today. I started blogging before tenure but not when I was very young. I should not have started any earlier. Think about the research that shows your brain is still forming until you are about 25. If Leonardo DiCaprio would date you, then be careful about what you say on the internet. What I would hope for teens or undergraduates is that they would have smart safe people to bounce ideas off of. You certainly need to practice writing and questioning. Even though it nearly kills me at the end of every semester, I assign papers in my classes, because I believe that college students should be writing. I was and am lucky to have teachers and friends who I talk to one-on-one when I want to try out ideas. You should be “posting” in the most abstract sense when you are young, but a private paper journal is not a bad place to start.
  4. When the internet first started, I don’t think anyone would have guessed how much content people would create for free. People are posting so much. Despite worries that media pirating would lead to too little content creation, we have more content than ever.

Something fun about regular short posts is that you can put a stake down and then revisit it years later. Here are two of my posts that have turned out well.

  1. In 2022, I went to Disney World for the first time. Ross Douthat criticized Disney World in his book The Decadent Society. I like the book, but I thought that he clearly hadn’t been there. I wrote a whole blog post about Disney being the opposite of infrastructure stagnation. Here is Ross now with his New York Times column saying “Wow, I had never actually been there, and the physical infrastructure is amazing.”
  2. In 2021, I wrote “I encourage parents to read fantasy with children. I see a lot of children’s books that promote science or STEM-readiness… Those games that try to trick 5-year-olds into “programming” are less valuable than reading and discussing fantasy stories… What your child will need to be able to do when they are 20 is read and comprehend a textbook that explains a totally new technology that no one alive today understands. Then they will need to think of creative ways to apply that technology to real world problems.” The developments in ChatGPT are making this look pretty good, even earlier than I expected.

We are a posting kind of species.

Complacency and American Girl Dolls 2

It’s time to revisit American Girl Dolls and the Saturn V rocket. The trending topic among millennials is the new “historical” American Girl doll who lives in the year 1999.

Previously, I blogged about the historical Courtney doll from 1986 in “Complacency and American Girl Dolls.” I used Courtney’s accessories to illustrate stagnation in the physical environment (within rich countries) of recent decades. Courtney has a Walkman for playing cassette tapes and she has an arcade-style Pac-Man game to entertain herself. I pointed out that ’80s Courtney had to be given the World War II doll Molly just to keep life interesting.

What do Isabel and Nicki have a decade later in 1999?  

They have a personal CD player and floppy disks. It’s cute and the toys will sell. However, it does not seem like innovation has introduced many new capabilities. Isabel can listen to music through her headphones and be entertained on screens, just like Courtney could.

Isabel eats Pizza Hut and has dial-up internet access. There is no sense of sacrifice or expanding the frontier. The world was settled, and history had ended.  

What counts for adventure in 1999? Shopping vintage clothing. Just like Courtney, Isabel revisits the past to get a sense of purpose or excitement.

This is Isabel’s diary. Having nothing to do besides look at clothes from past decades, she obsesses over status. Presumably “Kat” complimented her hat in person. Facebook didn’t start until 2004, so Isabel is not worried about “Likes” in social media.

So, what did I do with my kids for their school break on Presidents’ Day?  We went to the U.S. Space and Rocket Center to see the Saturn V rocket.


Yes, it was SMET

Last week I posted about the transition from SMET to STEM at the National Science Foundation. I was repeating a story that can be found on several websites including an entry in Britannica.

Andrew Ruapp reached out to me about a possible error in my post. He presented some evidence that the term STEM had been used prior to 2001. Casually Googling the topic did not bring me to a reputable source for the claim I had made last week. “SMET” is comically bad. So, I did start to wonder if it had never been officially used at the NSF and was just a funny story getting repeated online.

To solve this problem, I reached out directly to the person who was credited with making the transition. Dr. Judith Ramaley is currently President Emerita and Distinguished Professor of Public Service at Portland State University.

With her permission to share, here is our email correspondence:

Encouraged by her reply, I looked online and found a public NSF document from 1998 that clearly uses SMET.

Lastly, I asked her several questions, in a mini email interview:

1. Are you surprised by how widespread the STEM term has become?

Ramaley: I wasn’t surprised because once NSF adopted the new acronym, I expected it would catch on.

2. Do you feel that the “STEM” brand has been successful?

Ramaley: STEM isn’t really a brand. It is simply an acronym. It works better than SMET I think because engineering and technology are framed by science and mathematics rather than trailing along behind as if less important. I am fascinated by the growing pressure to add other elements to STEM, making it STEAM, for instance. 

3. My son in 2nd grade goes to a STEM activity class once a week. (They just call it “STEM.”) This week he tells me they are working on a pollination project. Would you recommend anything different than the current system for encouraging American students to pursue technology fields?

Ramaley: Your third question is a sweeping one. It would help to know what a STEM activity means each week in your son’s second grade class.  I am drawn to ways of learning STEM that encourage students to approach these issues in an inquiry-based way that lets them explore what it means to ask interesting questions and work out ways to try to answer them. Young people are very curious about how the world works. I doubt that I need to tell you that since I bet your son sometimes drives you nuts with WHY and HOW questions. Questions like that are beautiful questions. 

SMET

What would you guess SMET is?

Would you like to do SMET?

What if you got caught looking at SMET?

SMET was the first acronym used by the National Science Foundation to stand for “science, mathematics, engineering, and technology”. There was a re-branding of the name that we owe to the American biologist Judith Ramaley. The STEM acronym sounds much better!

Does a cosmetic change matter? Will more students study STEM than SMET? The US government funds initiatives aimed at encouraging students to study STEM fields, so answering this question is important.

Some of these initiatives date back several decades, such as the National Science Foundation’s (NSF) Advanced Technological Education program, which was started in 1992 to provide funding for two-year colleges to develop programs that promote STEM education and prepare students for technical careers. The National Math and Science Initiative (NMSI) was established in 2007 and offers training and support for teachers to improve STEM instruction in K-12 schools. In 2009, the White House launched the “Educate to Innovate” campaign, which aimed to improve STEM education in American schools and increase the number of students pursuing STEM careers. Additionally, several federal agencies, including NASA and the Department of Energy, have launched initiatives over the years to promote STEM education and provide opportunities for students to engage in STEM-related research and projects. These efforts reflect a recognition of the importance of STEM fields to the country’s future economic competitiveness and national security, and a commitment to ensuring that all students have access to the skills and knowledge needed to succeed in these fields.

There is something to be said for branding and marketing in relation to science education. However, I see this as an open question: How much does branding matter, as opposed to the fundamentals of the pay and quality of available jobs that students can get in STEM fields?

I’m preparing a public lecture on my “Willingness to Be Paid” paper. Using an experiment, I examined what factors affect a student’s decision to do a computer programming job. I tried out an encouraging message which turned out to not work in the sense that it did not increase participation. I’m planning to open my talk with the SMET affair as an example of what is being tried with messaging and the tech labor supply.  

Iterations of the Survivorship Bias Meme

If you are Online at all, you have probably seen the survivorship bias plane:

It has inspired new memes. They are funniest when posted without any explanation. Two recent examples are:

and

Sometimes the picture of the plane is used as a whole argument, without any words.
