Comparing ChatGPT and Bing for a research literature review in April 2023

We wrote “ChatGPT Cites Economics Papers That Do Not Exist.”

I expect that problem to go away any day, so I gave it another try this week. For the record, they are currently calling it “ChatGPT Mar 23 Version” on the OpenAI website.

First, I asked ChatGPT for help with the following prompt:

ChatGPT is at it again. There is no such paper, as I will verify by showing John Duffy’s publications from that year: 

ChatGPT makes up lies (“hallucinations”). It is also great for some tasks, and smart people are already using it to become more productive. My post last week was on how impressive ChatGPT seemed in the Jonathan Swift impersonation. I didn’t take the time to fact-check that transcript, and I would bet money that at least some of the facts in it were made up.

I posed the same question to the Bing plug-in for the Edge browser (Microsoft). Yup, I have opened Edge for the first time in forever to use Bing.

Bing handles the prompt by linking to a useful relevant paper – so if you click the link you will get to a helpful and not misleading answer. Just being a smart search engine instead of hallucinating randomly is better, for my purposes.

The actual paper I wanted returned was this one, by the way:

Duffy, John. “Experimental macroeconomics.” Behavioural and Experimental Economics (2010): 113-119.

There is no reason to expect ChatGPT to know a subfield of economics better than an expert does. But that’s the genius of a good search engine. You ask it, “Can I repair a broken fiddlewhat?” The search engine does not claim to know; it directs you to the blog of the world expert in fiddlewhats.

I can’t find the link to it, but I’m going to toss in one more thing here. Tyler Cowen gave an interview this spring on AI. A newspaper reporter had a “creepy” interaction with an AI chatbot that became the topic of a viral internet article. Tyler made a very contrarian point: he interprets this as a case of successful AI alignment. The reporter wanted something sensational, and he got what he wanted.

So, it will probably be true for a long time that if you want to find a failure of AI, you can get what you want. Still, I’m putting this on the record here because I wonder if this particular problem will get solved quickly.

GPT showing off on Conversations with Tyler

Since we posted it in January 2023, our post on a failure of ChatGPT has been one of the most popular articles of the year in terms of web traffic. Read “ChatGPT Cites Economics Papers That Do Not Exist.”

LLMs will make up nonsense, but the nonsense can sound sophisticated and serious. As human editors, we are not used to detecting that kind of problem, because if something sounds serious, it has usually been written by a serious, thoughtful person.

That’s a problem, but ChatGPT is really good at some tasks.

Instead of pointing out the fact that ChatGPT is not perfect, Tyler Cowen wants people to see the potential. He has done an entire podcast interview with ChatGPT, asking it to take on the voice and role of deceased writer Jonathan Swift.

I’ll share part of the transcript:

COWEN: In 1709, you, Jonathan Swift, still were calling yourself a Whig. What exactly induced you to switch to the Tory side of the ledger?

SWIFT: My shift from the Whigs to the Tories was a gradual and complex process, influenced by several factors, including personal connections, political developments, and ideological disagreements.

During my early years in politics, I was indeed more aligned with the Whigs, who championed principles such as constitutional government, religious toleration, and the supremacy of Parliament over the monarchy. However, over time, I became increasingly disillusioned with the Whigs, particularly as their internal factions and power struggles intensified.

The Whigs’ support for the war with France, which began in 1702, was a point of contention for me. I believed that the war was costly…

The replies seem nuanced and insightful. Nothing made me suspect that they were factually inaccurate, although I do not know enough about Irish history to judge.

Is there any human who could have produced this script? I think so, although it would have required a lot of work. If one of these replies is better than anything a human Swift scholar would produce, how would we know?

GPT4 can write good summaries of the work of a prolific author like Swift, because the model can train on lots of examples.

GPT4 could probably write a good biography of a modern figure by pulling together all of the writing by them and about them. Maybe GPT4 could efficiently scrape up all mentions of this figure online and synthesize them faster than a human scholar. However, we observed GPT3 completely making up citations when we tried to get it to do economics summaries.

I’m concerned that humans will use GPT4 to write but not do the requisite fact-checking. That could introduce a new corpus of work, possibly full of lies, for the next LLMs to train on. Humans might not admit to using GPT, so we would have no mechanism for applying extra scrutiny to AI-generated writing from 2023. Humans can make mistakes too… so the ultimate solution could be an all-powerful AI that somehow starts with a fairly accurate map of the world and goes around fact-checking everything faster than human editors ever could.

Self-Replicating Machines: A Practical Human Response

Currently, we have software that can write software. What about physical machines that can produce physical machines? Indeed, what about machines that can produce other machines without human direction?

First of all, machine-building machines (MBMs) still require resources: energy, transportation, time, and other inputs. A well-programmed machine that self-replicates quickly can grow in number exponentially. But where would the machines get the resources that enable self-replication? They’d have to purchase them (or conquer the world, sci-fi style). And where would a machine get the means to purchase those necessary inputs? The same place that everyone else gets them.
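The interaction between exponential replication and a finite resource budget is easy to see in a toy simulation. This is a sketch with invented numbers, not a model of any real machine: every period each machine tries to build one copy, and copies only get built while the budget holds out.

```python
# Toy model of machine self-replication under a resource constraint.
# All numbers below are illustrative assumptions, not estimates.

def replicate(machines: int, resources: float, cost_per_copy: float, periods: int) -> list[int]:
    """Each period, every machine attempts to build one copy of itself;
    copies are only built while resources remain to pay for them."""
    history = [machines]
    for _ in range(periods):
        affordable = int(resources // cost_per_copy)
        new_copies = min(machines, affordable)  # limited by builders AND by budget
        machines += new_copies
        resources -= new_copies * cost_per_copy
        history.append(machines)
    return history

# Doubling each period until the resource budget runs out:
print(replicate(machines=1, resources=100.0, cost_per_copy=1.0, periods=8))
# → [1, 2, 4, 8, 16, 32, 64, 101, 101]
```

The population doubles for six periods and then stalls the moment the budget is exhausted, which is the economic point: self-replication is gated by access to inputs, not by the cleverness of the replicator.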


Yes, it was SMET

Last week I posted about the transition from SMET to STEM at the National Science Foundation. I was repeating a story that can be found on several websites including an entry in Britannica.

Andrew Ruapp reached out to me about a possible error in my post. He presented some evidence that the term STEM had been used prior to 2001. Casually Googling the topic did not lead me to a reputable source for the claim I had made last week. “SMET” is comically bad, so I started to wonder whether it had ever been officially used at the NSF or was just a funny story getting repeated online.

To solve this problem, I reached out directly to the person who was credited with making the transition. Dr. Judith Ramaley is currently President Emerita and Distinguished Professor of Public Service at Portland State University.

With her permission to share, here is our email correspondence:

Encouraged by her reply, I looked online and found a public NSF document from 1998 that clearly uses SMET.

Lastly, I asked her several questions, in a mini email interview:

1. Are you surprised by how widespread the STEM term has become?

Ramaley: I wasn’t surprised because once NSF adopted the new acronym, I expected it would catch on.

2. Do you feel that the “STEM” brand has been successful?

Ramaley: STEM isn’t really a brand. It is simply an acronym. It works better than SMET I think because engineering and technology are framed by science and mathematics rather than trailing along behind as if less important. I am fascinated by the growing pressure to add other elements to STEM, making it STEAM, for instance. 

3. My son in 2nd grade goes to a STEM activity class once a week. (They just call it “STEM.”) This week he tells me they are working on a pollination project. Would you recommend anything different than the current system for encouraging American students to pursue technology fields?

Ramaley: Your third question is a sweeping one. It would help to know what a STEM activity means each week in your son’s second grade class.  I am drawn to ways of learning STEM that encourage students to approach these issues in an inquiry-based way that lets them explore what it means to ask interesting questions and work out ways to try to answer them. Young people are very curious about how the world works. I doubt that I need to tell you that since I bet your son sometimes drives you nuts with WHY and HOW questions. Questions like that are beautiful questions. 

Online Reading on Paper

We have six weekly contributors here at EWED and I try to read every single post. I don’t always read them the same day that they are published. Being subscribed is convenient because I can let my count of unread emails accumulate as a reminder of what I’ve yet to read.

Shortly after my fourth child was born over the summer, I understandably got quite behind in my reading. I think that I had as many as twelve unread posts. I would try to catch up on the days that I stayed home with the children. After all, they don’t require constant monitoring and often go do their own thing. Then, without fail, every time I pulled out my phone to catch up on some choice econ content, the kids would get needy. They’d start whining, fighting, or otherwise suddenly accosting me for one thing or another – even if they were fine just moments before. It’s as if my phone were the signal that I clearly had nothing to do and should be interacting with them. Don’t get me wrong, I like interacting with my kids. But don’t they know that I’m a professional living in the 21st century? Don’t they know that there is a lot of good educational and intellectually stimulating content on my phone and that I am not merely zoning out and wasting my time?

No. They do not.

I began to realize that it didn’t matter what I was doing on my phone, the kids were not happy about it.

I have fond childhood memories of my dad smoking a pipe and reading the newspaper. I remember how he’d cross his legs and I remember how he’d lift me up and down with them. I less well remember my dad playing his Game Boy. That was entertaining for a while, but I remember feeling more socially disconnected from him at those times. Maybe my kids feel the same way. It doesn’t matter to them that I try to read news articles on my phone (the same content as a newspaper). They see me on a 1-player device.

So, one day I printed out about a dozen accumulated EWED blog posts as double-sided and stapled articles on real-life paper.

The kids were copacetic, going about their business. They were fed, watered, changed, and had toys and drawing accoutrements. I sat down with my stack of papers in a prominent rocking chair and started reading. You know what my kids did in response? Not a darn thing! I had found the secret. I couldn’t comment on the posts or share them digitally. But that’s a small price to pay for getting some peaceful reading time. My kids didn’t care that I wasn’t giving them attention. Reading is something they know about. They read or are read to every day. ‘Dad’s reading’ is a totally understandable and sympathetic activity. ‘Dad’s on his phone’ is not a sympathetic activity. After all, they don’t have phones.

They even had a role to play. As I’d finish reading the blog posts, I’d toss the stapled pages across the room. It was their job to throw those away in the garbage can. It became a game: there were these sheets of paper that I cared about, then examined, and then discarded… like yesterday’s news. They’d even argue some over who got to run the next consumed story across the house to the garbage can (sorry, fellow bloggers).

If you’re waiting for the other shoe to drop, then I’ve got nothing for you. It turns out that this works for us. My working hypothesis is that kids often don’t want parents to give them attention in particular. Rather, they want to feel a sense of connection by being involved, or sharing experiences. Even if it’s not at the same time. Our kids want to do the things that we do. They love to mimic. My kids are almost never allowed to play games or do nearly anything on our phones. So, me being on my phone in their presence serves to create distance between us. Reading a book or some paper in their presence? That puts us on the same page.

ChatGPT Cites Economics Papers That Do Not Exist

EDIT: See my new published paper on this topic, “ChatGPT Hallucinates Non-existent Citations: Evidence from Economics.”

This blog post is co-authored with graduate student Will Hickman.

EDIT: Will and I now have a paper on trusting ChatGPT, “Do People Trust Humans More Than ChatGPT?”

Although many academic researchers don’t enjoy writing literature reviews and would like to have an AI system do the heavy lifting for them, we have found a glaring issue with using ChatGPT in this role. ChatGPT will cite papers that don’t exist. This isn’t an isolated phenomenon – we’ve asked ChatGPT different research questions, and it continually provides false and misleading references. To make matters worse, it will often provide correct references to papers that do exist and mix these in with incorrect references and references to nonexistent papers. In short, beware when using ChatGPT for research.
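One practical habit this suggests: never accept a generated reference without checking it against a bibliography you trust. As a rough illustration (not the workflow we used for this post), here is a minimal Python sketch that flags any cited title with no close match in a verified list. The threshold is an arbitrary assumption, and the second “cited” title below is our own invention for the example, not one ChatGPT produced.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_suspect_citations(cited, verified, threshold=0.8):
    """Return the cited titles that have no close match in the verified list."""
    return [t for t in cited if max(similarity(t, v) for v in verified) < threshold]

# Verified titles of real papers:
verified = [
    "An Experimental Study of Competitive Market Behavior",
    "Risk Aversion in the Small and in the Large",
]
# One slightly misquoted real title, and one invented title (for illustration):
cited = [
    "An Experimental Investigation of Competitive Market Behavior",
    "Asset Perishability and Price Bubbles in Laboratory Markets",
]
print(flag_suspect_citations(cited, verified))
```

The near-match survives the fuzzy comparison while the invented title is flagged. A real pipeline would query a bibliographic database rather than a hand-built list, but even this crude check catches a fabricated title that a skim of polished prose would miss.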

Below, we’ve shown some examples of the issues we’ve seen with ChatGPT. In the first example, we asked ChatGPT to explain the research in experimental economics on how to elicit attitudes towards risk. While the response itself sounds like a decent answer to our question, the references are nonsense. Kahneman, Knetsch, and Thaler (1990) is not about eliciting risk. “Risk Aversion in the Small and in the Large” was written by John Pratt and was published in 1964. “An Experimental Investigation of Competitive Market Behavior” presumably refers to Vernon Smith’s “An Experimental Study of Competitive Market Behavior”, which had nothing to do with eliciting attitudes towards risk and was not written by Charlie Plott. The reference to Busemeyer and Townsend (1993) appears to be relevant.

Although ChatGPT often cites non-existent and/or irrelevant work, it sometimes gets everything correct. For instance, as shown below, when we asked it to summarize the research in behavioral economics, it gave correct citations for Kahneman and Tversky’s “Prospect Theory” and Thaler and Sunstein’s “Nudge.” ChatGPT doesn’t always just make stuff up. The question is, when does it give good answers and when does it give garbage answers?

Strangely, when confronted, ChatGPT will admit that it cites non-existent papers but will not give a clear answer as to why it cites non-existent papers. Also, as shown below, it will admit that it previously cited non-existent papers, promise to cite real papers, and then cite more non-existent papers. 

We show the results from asking ChatGPT to summarize the research in experimental economics on the relationship between asset perishability and the occurrence of price bubbles. Although the answer it gives sounds coherent, a closer inspection reveals that the conclusions ChatGPT reaches do not align with theoretical predictions. More to our point, neither of the “papers” cited actually exist.  

Immediately after getting this nonsensical answer, we told ChatGPT that neither of the papers it cited exist and asked why it didn’t limit itself to discussing papers that exist. As shown below, it apologized, promised to provide a new summary of the research on asset perishability and price bubbles that only used existing papers, then proceeded to cite two more non-existent papers. 

Tyler has called these errors ChatGPT “hallucinations.” Hallucination might be whimsical in a more artistic pursuit, but we find this form of error concerning. Although there will always be room for improving language models, one thing is very clear: researchers should be careful. This is something to keep in mind, also, when serving as a referee or grading student work.

New Survey on Bootcamp Graduates

I have been investigating how to get more talent in the tech industry for a while. There is not a lot of data on precisely how people select into tech and what might cause more people to train for in-demand jobs. Gordon Macrae, in his substack The View, has a recent relevant post Issue #9: Tracking 100 bootcamp graduates from 2015.

Gordon ran his own survey of 100 graduates of coding bootcamps. Coding bootcamps are a fascinating part of how the skills gap gets filled. They are not well understood, and we don’t have much publicly available data of the sort that helps researchers measure the outcomes of a traditional college education.

Here are some of his results from this preliminary survey:

Of this total, 68% of the graduates surveyed in 2022 were doing roles where the bootcamp was necessary for them to work in that role. What I found fascinating, though, was that this figure varied wildly depending on the bootcamp they attended. 

On the lowest end, just 50% of graduates from Bootcamp A were doing jobs in 2022 that required having gone to a bootcamp. Conversely, 90% of Bootcamp D graduates were working in technical roles seven years after graduating.

What is more, the percentage of bootcamp graduates in technical roles seven years after graduation has gone down by about 15%. The average immediately after graduation was 82% working in a technical role.

Other resources:

There is more work to be done in this area.

AI Can’t Cure a Flaccid Mind

Many of my classes consist of a large writing component. I’ve designed the courses so that most students write the best paper that they’ll ever write in their life. Recently, I had reason to believe that a student was using AI or a paid service to write their paper. I couldn’t find conclusive evidence that they didn’t write it, but it ended up not mattering much in the end.


Reckless Management Led to BlockFi Crypto Bankruptcy

Since my nontrivial deposits at the cryptocurrency lending firm BlockFi have been blocked (maybe forever) from withdrawal, I keep an eye on news from that front. My main source of information has been missives from BlockFi itself, in which management portrays itself as being very careful with customer funds; it was only the shocking, unforeseeable collapse of the FTX exchange that forced the otherwise sober and responsible BlockFi into its recent bankruptcy. I have believed that view of things, since that is all I knew.

However, Emily Mason at Forbes has poked around behind the scenes, including finding insiders willing to talk (off the record) about less-savory doings within BlockFi. The title of her recent article, BlockFi Employees Warned Of Credit Risks, But Say Executives Dismissed Them, pretty much says it all. The article starts out:

In its bankruptcy filing last week, New Jersey-based BlockFi attempted to paint itself as a responsible lender hit by plummeting crypto prices and the collapse of crypto brokerage FTX and its affiliated trading firm, Alameda.

That is the view I have held up till now. However, Mason then goes on to note:

 But a closer look at the company’s history reveals that its vulnerabilities likely began much earlier with missteps in risk management, including loosened lending standards, a highly concentrated pool of borrowers and unsustainable trading activity.

To keep this blog post short, I will just paste in a few excerpts where she fleshes out her case:

While the company regularly touted a sophisticated risk management team, current and former employees indicate in interviews that risk professionals were dismissed by executives preoccupied with delivering growth to investors. As early as 2020, employees were discouraged from describing risks in written internal communications to avoid liability, a former employee states.

Ouch. Not a good sign.

Until August 2021, BlockFi advertised that loans were typically over-collateralized. But large potential borrowers were often unwilling to meet those requirements, a cease and desist order brought by the Securities and Exchange Commission against BlockFi in February states. The availability of uncollateralized capital from competing companies like Voyager created stiff competition in the lending field.

Under pressure to continue growing and delivering yields, BlockFi began lending to these parties with less collateral than publicly stated without informing customers on the amount of risk involved with interest accounts, according to the SEC order which resulted in a $100 million fine for the company. As a result, BlockFi paused access to its interest accounts in the U.S.

Wait, that is MY money they were messing with. Now I am really annoyed.

In addition to lowering its collateral requirements, BlockFi’s due diligence process had flaws, former borrowers say. Available credit for borrowers was decided based on their assets, but BlockFi and other lenders failed to investigate both the size and quality of potential borrowers’ holdings. Like Voyager and other crypto lenders, BlockFi accepted unaudited balance sheets from hedge funds and proprietary trading firms former borrowers say, leaving room for manipulation on the borrower side.

In the due diligence process, lenders like BlockFi and Voyager did not examine whether borrowers’ balance sheet assets were denominated in dollars or less liquid tokens like FTX-issued FTT.

The revelation that Alameda’s balance sheet was mostly FTT tokens was the news that set off the unraveling of both Alameda and FTX and triggered contagion effects across the industry. In early November, Alameda defaulted on $680 million in loan obligations to BlockFi, according to the bankruptcy filing.

Some BlockFi employees reportedly warned of the shakiness of the parties to whom clients’ funds were being loaned. Management dismissed these concerns because the loans were “collateralized,” but as noted above, the extent of that collateral was *not* what we clients were told:

An internal team at BlockFi also raised concerns that the borrower pool was too concentrated among a pool of crypto whales, including mega hedge funds Three Arrows Capital and Alameda, another former employee states. Management responded that the loans were collateralized, according to the employee.

This is a very common scenario in finance: in search of profits, management cuts corners and takes more risks with client funds than it tells the clients about. Maybe Sam Bankman-Fried will end up with cellmates from BlockFi.

Because BlockFi survived the Luna/Terra collapse some months ago and because I believed the steady stream of reassuring pronouncements from BlockFi management, I only withdrew a third of my funds back in the summer. But as it turns out, that withdrawal was apparently bankrolled by a big loan to BlockFi from Bankman-Fried’s FTX; and FTX is now kaput. So the odds of my ever seeing the rest of my funds are slim indeed:

In BlockFi’s bankruptcy filing and in public statements made by its CEO, Zac Prince, the company points to its survival through the collapse of the Terra/Luna ecosystem and subsequent shuttering of Three Arrows Capital as evidence of strong management. But that endurance four months ago was made possible through a $400 million credit line from now-defunct FTX, which allowed the firm to meet panicked withdrawal requests from depositors. When FTX folded in early November, BlockFi lost its lending back stop and could no longer meet fresh waves of withdrawal requests.

One lesson learned: If there is a reasonable chance of a panic, it can pay to be the first to panic, not the last.

Slow Adjustment in Tech Labor for CGO Research

The CGO published a policy paper I wrote with Henry Kronk.

The Slow Adjustment in Tech Labor: Why Do High-Paying Tech Jobs Go Unfilled?

Executive Summary

The United States technology industry continues to struggle to recruit new talent. According to the US Bureau of Labor Statistics, the number of people employed in technology is not increasing quickly. 

Tech jobs pay well and don’t have the drawbacks of some other in-demand jobs, such as the travel schedule of a truck driver or the physically taxing labor required in oil fields.

Tech jobs are sometimes touted as a guarantee of having a comfortable and rewarding career, but the reality is not that simple.

Economics suggests that high wages would eliminate labor shortages, but that’s not the case in tech work. Why?

In this paper, authors Joy Buchanan and Henry Kronk propose a set of factors that have been overlooked and apply broadly to the tech sector. 

Individuals with high-status tech jobs report burnout, anxiety, depression, and other mental health issues at higher rates than the general population. They also have to deal with the constant threat of becoming obsolete. Because technology changes so quickly, they must constantly work to update their skills in order to remain competitive.  

The authors offer several recommendations for tech companies, educators, and policymakers:

  • Political and community leaders can provide more accurate messaging such as communicating clearer expectations about the difficulties of entering the tech workforce. 
  • The tech industry could benefit from improvements in computer education. The authors cite a need for more pre-college exposure to computer occupations as well as a need to add communication skills to computer science curriculums.
  • Teachers, parents, and tech companies can all find ways to inform young people at an age-appropriate level about opportunities. Computer science is abstract and hard to understand. Young people who have some exposure to computer science through a class or camp are more likely to become CS majors in college. 
  • Company leaders can improve their recruitment and development strategies to reflect the labor market realities including paying enough to compensate employees for the mental challenges of demanding technical work and alleviating their own talent shortages by investing in training and education. 
  • Tech companies may be able to attract more women and minorities by improving their scheduling and management practices.

Henry and I examined public data and the existing literature to get a better understanding of the current state of knowledge on this issue. I hope our paper is helpful; however, we partly just highlight how many open questions remain about tech and talent.

My recent paper in Labour Economics, “Willingness to be Paid: Who Trains for Tech Jobs?”, was designed to add new data to address these questions.