Circular AI Deals Reminiscent of Disastrous Dot.Com Vendor Financing of the 1990s

Hey look, I just found a way to get infinite free electric power:

This sort of extension-cord-plugged-into-itself meme has shown up recently on the web to characterize a spate of circular financing deals in the AI space, largely involving OpenAI (parent of ChatGPT). Here is a graphic from Bloomberg which summarizes some of these activities:

Nvidia, which makes LOTS of money selling near-monopoly, in-demand GPU chips, has made investment commitments in its customers, or in customers of its customers. Notably, Nvidia will invest up to $100 billion in OpenAI to help OpenAI increase its compute capacity. OpenAI in turn inked a $300 billion deal with Oracle to build more data centers filled with Nvidia chips. Such deals will certainly boost sales of Nvidia’s chips (and make Nvidia even more money), but they also raise a number of concerns.

First, they make it seem like there is more demand for AI than there actually is. Short seller Jim Chanos recently asked, “[Don’t] you think it’s a bit odd that when the narrative is ‘demand for compute is infinite’, the sellers keep subsidizing the buyers?” To some extent, all this churn is just Nvidia recycling its own money, as opposed to new value being created.

Second, analysts point to the destabilizing effect of these sorts of “vendor financing” arrangements. Towards the end of the great dot.com boom in the late 1990s, hardware vendors like Cisco were making gobs of money selling networking equipment to internet service providers (ISPs). In order to help the ISPs build out even faster (and purchase even more Cisco hardware), Cisco loaned money to the ISPs. But when that boom went bust, and the huge overbuild in internet capacity became (to everyone’s horror) apparent, the ISPs could not pay back those loans. QQQ lost 70% of its value. Twenty-five years later, Cisco’s stock price has still not recovered its 2000 high.

Besides taking in cash investments, OpenAI is borrowing heavily to buy its compute capacity. Since OpenAI makes no money now (in fact it loses billions a year), will likely not make any money for several more years (like other AI ventures), and is locked in competition with other deep-pocketed rivals, there is the possibility that it could pull down the whole house of cards, as happened in 2000. Bernstein analyst Stacy Rasgon recently wrote, “[OpenAI CEO Sam Altman] has the power to crash the global economy for a decade or take us all to the promised land, and right now we don’t know which is in the cards.”

For the moment, nothing seems set to stop the tidal wave of spending on AI capabilities. Big tech is flush with cash, and is plowing it into data centers and program development. Everyone is starry-eyed with the enormous potential of AI to change, well, EVERYTHING (shades of 1999).

The financial incentives are gigantic. Big tech got big by establishing quasi-monopolies on services that consumers and businesses consider must-haves. (It is the quasi-monopoly aspect that enables the high profit margins).  And it is essential to establish dominance early on. Anyone can develop a word processor or spreadsheet that does what Word or Excel do, or a search engine that does what Google does, but Microsoft and Google got there first, and preferences are sticky. So, the big guys are spending wildly, as they salivate at the prospect of having the One AI to Rule Them All.

Even apart from achieving some new monopoly, the trillions of dollars spent on data center buildout are hoped to pay out one way or the other: “The data-center boom would become the foundation of the next tech cycle, letting Amazon, Microsoft, Google, and others rent out intelligence the way they rent cloud storage now. AI agents and custom models could form the basis of steady, high-margin subscription products.”

However, if in 2-3 years it turns out that actual monetization of AI continues to be elusive, as seems quite possible, there could be a Wile E. Coyote moment in the markets:

My Perfunctory Intern

A couple of years ago, my co-blogger Mike described his productive but novice intern. The helper could summarize expert opinion but had no real understanding of its own, and it was fast and tireless to boot. Of course, he was talking about ChatGPT. Joy has also written in multiple places about the errors made by ChatGPT, including fake citations.

I use ChatGPT Pro, which has Web access, and my experience is that it is not so tireless. Much like Mike, I have used ChatGPT to help me write Python code. I know the basics of Python and can read most of it. However, the multitude of methods and possible arguments is not nestled firmly in my skull; I am much faster at reading Python code than at writing it. Therefore, ChatGPT has been amazing… mostly.
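For instance, here is the kind of snippet I might ask ChatGPT to draft (the library, data, and column names are hypothetical, chosen just for illustration). I can read it instantly, but I would not have remembered the exact method names and arguments on my own.

    # Hypothetical example: summarize a small table with pandas.
    # Easy to read; slow for me to write from memory.
    import pandas as pd

    df = pd.DataFrame({"state": ["MD", "MD", "VA", "VA"],
                       "year": [2023, 2024, 2023, 2024],
                       "sales": [10.0, 12.5, 9.0, 11.0]})

    summary = (df.groupby("state", as_index=False)
                 .agg(total_sales=("sales", "sum"), latest_year=("year", "max"))
                 .sort_values("total_sales", ascending=False))
    print(summary)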

I have found that ChatGPT is more like an intern than many suppose:

Continue reading

DeepSeek vs. ChatGPT: Has China Suddenly Caught Up With or Surpassed the U.S. in AI?

The biggest single-day loss of market value in stock market history occurred yesterday, as Nvidia plunged 17%, shaving $589 billion off the AI chipmaker’s market cap. The cause of the panic was the surprisingly good performance of DeepSeek, a new Chinese AI application similar to ChatGPT.

Those who have tested DeepSeek find that it performs about as well as the best American AI models while consuming fewer computing resources. It is also much cheaper to use. What really stunned the tech world is that the developers claimed to have trained the model for only about six million dollars, which is way, way less than the billions that a large U.S. firm like OpenAI, Google, or Meta would spend on a leading AI model. All this despite the attempts by the U.S. to deny China the most advanced Nvidia chips: the developers of DeepSeek claim they worked with a modest number of chips whose capabilities had been deliberately curtailed to comply with U.S. export rules.

One conclusion, drawn by the Nvidia bears, is that this shows you *don’t* need ever more of the most powerful and expensive chips to get good development done. The U.S. AI development model has been to build more, huge, power-hungry data centers and fill them up with the latest Nvidia chips. That has allowed Nvidia to charge huge profit premiums, as Google and other big tech companies slurp up all the chips that Nvidia can produce. If that supply/demand paradigm breaks, Nvidia’s profits could easily drop in half, e.g., from 60+% gross margins to a more normal (but still great) 30% margin.

The Nvidia bulls, on the other hand, claim that more efficient models will lead to even more usage of AI, and thus increase the demand for computing hardware – – a cyber instance of Jevons’ Paradox (where the increase in the efficiency of steam engines in burning coal led to more, not less, coal consumption, because it made steam engines more ubiquitous).

I read a bunch of articles to try to sort out hype from fact here. Folks who have tested DeepSeek find it to be as good as ChatGPT, and occasionally better. It can explain its reasoning explicitly, which can be helpful. It is open source, which I think means the code, or at least the model “weights,” have been published. It does seem to be unusually efficient. Westerners have downloaded it onto (powerful) PCs and have run it there successfully, if a bit slowly. This means you can embed it in your own specialized code, or do your AI work away from the prying eyes of OpenAI or other U.S. AI providers. In contrast, ChatGPT, as far as I know, can only be run on OpenAI’s remote servers.
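As a rough illustration of what “running it on your own PC” looks like, here is a minimal sketch using the Hugging Face transformers library. The model identifier is an assumption (one of the smaller distilled DeepSeek releases), and a machine with a decent GPU and plenty of memory is assumed.

    # A minimal sketch of running an open-weight model locally with Hugging Face
    # transformers. The model name below is an assumption; substitute whatever
    # open-weight model you actually use.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed identifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 torch_dtype="auto",
                                                 device_map="auto")

    prompt = "Explain Jevons' paradox in two sentences."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Nothing leaves your machine in that setup, which is the point the privacy-minded are making.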

Unsurprisingly, in the past two weeks DeepSeek has been the most-downloaded free app, surpassing ChatGPT.

It turns out that being starved of computing power led the Chinese team to think their way to several important innovations that make much better use of the hardware. See here and here for gentle technical discussions of how they did that. Some of it involved hardware-ish things like improved memory management. Another key factor is a “mixture-of-experts” design, which activates only the parts of the model that are relevant to a given query, instead of running the entire network on every pass.
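To make that idea concrete, here is a toy sketch of mixture-of-experts routing in Python. This is not DeepSeek’s actual code; the sizes, the router, and the “experts” are all invented for illustration. The point is simply that only a couple of experts do any work for a given input, while the rest sit idle.

    # Toy mixture-of-experts routing: only the top_k most relevant experts run.
    # Everything here (sizes, router, experts) is made up for illustration.
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def moe_forward(token_vec, experts, router_weights, top_k=2):
        logits = router_weights @ token_vec          # one score per expert
        probs = softmax(logits)
        chosen = np.argsort(probs)[-top_k:]          # keep only the best experts
        gate = probs[chosen] / probs[chosen].sum()   # renormalize their weights
        # Only the chosen experts compute anything; the others are skipped.
        return sum(g * experts[i](token_vec) for g, i in zip(gate, chosen))

    dim, n_experts = 8, 16
    experts = [(lambda W: (lambda x: np.tanh(W @ x)))(rng.normal(size=(dim, dim)))
               for _ in range(n_experts)]
    router_weights = rng.normal(size=(n_experts, dim))
    print(moe_forward(rng.normal(size=dim), experts, router_weights))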

A number of experts scoff at the claimed six million dollar figure for training, noting that if you include all the costs that were surely involved in the development cycle, the true total must run to hundreds of millions of dollars. That said, it was still appreciably cheaper than the usual American way. Furthermore, it seems quite likely that making use of answers generated by ChatGPT helped DeepSeek to rapidly emulate ChatGPT’s performance. It is one thing to catch up to ChatGPT; it may be tougher to surpass it. Also, presumably the compute-efficient tricks devised by the DeepSeek team will now be applied in the West as well. And there is speculation that DeepSeek actually has use of thousands of advanced Nvidia chips but hides that fact, since acquiring them involved end-running U.S. export restrictions. If so, their accomplishment would be less amazing.

What happens now? I wish I knew. (I sold some Nvidia stock today, only to buy it back when it started to recover in after-hours trading). DeepSeek has Chinese censorship built into it. If you use DeepSeek, your information gets stored on servers in China, the better to serve the purposes of the government there.

Ironically, before this DeepSeek story broke, I was planning to write a post here this week pondering the business case for AI. For all the breathless hype about how AI will transform everything, it seems little money has been made except for Nvidia. Nvidia has been selling picks and shovels to the gold miners, but the gold miners themselves seem to have little to show for the billions and billions of dollars they are pouring into AI. A problem may be that there is not much of a moat here – – if lots of different tech groups can readily cobble together decent AI models, who will pay money to use them? Already, it is being given away for free in many cases. We shall see…

Free Webinar, Jan. 25: Practical and Ethical Aspects of Future Artificial Intelligence

As most of us know, artificial intelligence (AI) has taken big steps forward in the past few years, with the advent of Large Language Models (LLM) like ChatGPT. With these programs, you can enter a query in plain language, and get a lengthy response in human-like prose. You can have ChatGPT write a computer program or a whole essay for you (which of course makes it challenging for professors to evaluate essays handed in by their students).

However, the lords of Big Tech are not content. Their goal is to create AI with powers that far surpass human intelligence, and that even mimics human empathy. This raises a number of questions:

Is this technically possible? What will be the consequences if some corporations or nations succeed in owning such powerful systems? Will the computers push us bumbling humans out of the way? Will this be a tool for liberation or for oppression? This new technology coming at us may affect us all in unexpected ways. 

For those who are interested, there will be a 75-minute webinar on Saturday, January 25 which addresses these issues, and offers a perspective by two women who are leaders in the AI field (see bios below). They will explore the ethical and practical aspects of AI of the future, from within a Christian tradition. The webinar is free, but requires pre-registration:

Here are bios of the two speakers:

Joanna Ng is a former IBMer who pivoted to founding a start-up focused on Artificial Intelligence, specializing in Augmented Cognition through integration with IoT and Blockchain in the context of web3, applying design-thinking methodology. With forty-nine patents granted to her name, Joanna was accredited as an IBM Master Inventor. She held a seven-year tenure as Head of Research and Director of the Center for Advanced Studies at IBM Canada. She has published over twenty peer-reviewed academic publications and co-authored two computer science books with Springer, The Smart Internet and The Personal Web. She published a Christianity Today article called “How Artificial Intelligence Is Today’s Tower of Babel” and published her first book on faith and discipleship, Being Christian 2.0, in October 2022.

Rosalind Picard is founder and director of the Affective Computing Research Group at the MIT Media Laboratory; co-founder of Affectiva, which provides Emotion AI; and co-founder and chief scientist of Empatica, which provides the first FDA-cleared smartwatch to detect seizures. Picard is author of over three hundred peer-reviewed articles spanning AI, affective computing, and medicine. She is known internationally for writing the book, Affective Computing, which helped launch the field by that name, and she is a popular speaker, with a TED talk receiving ~1.9 million views. Picard is a fellow of the IEEE and the AAAC, and a member of the National Academy of Engineering. She holds a Bachelors in Electrical Engineering from Georgia Tech and a Masters and Doctorate, each in Electrical Engineering and Computer Science, from MIT. Picard leads a team of researchers developing AI/machine learning and analytics to advance basic science as well as to improve human health and well-being, and has served as MIT’s faculty chair of their MindHandHeart well-being initiative.

Writing with ChatGPT: Buchanan Seminar on YouTube

I was pleased to be a (virtual) guest speaker for Plateau State University in Nigeria. My host was (Emergent Ventures winner) Nnaemeka Emmanuel Nnadi. The talk is up on YouTube with the following timestamp breakdown:

During the first ten minutes of the video, Ashen Ruth Musa gives an overview called “The Bace People: Location, Culture, Tourist Attraction.”

Then I introduce LLMs and my topic.

Minutes 19:00 – 29:00 are a presentation of the paper “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics.”

Minutes 23:30 – 34:00 are a summary of my paper “Do People Trust Humans More Than ChatGPT?”

Continue reading

Many Impressive AI Demos Were Fakes

I recently ran across an article on the Seeking Alpha investing site with the provocative title “AI: Fakes, False Promises And Frauds,” published by LRT Capital Management. Obviously, they think the new generative AI is being oversold. They cite a number of examples where demos of artificial general intelligence were apparently staged or faked. I followed up on a few of these examples, and it does seem like this article is accurate. I will quote some excerpts here to give the flavor of their remarks.

In 2023, Google found itself facing significant pressure to develop an impressive innovation in the AI race. In response, they released Google Gemini, their answer to OpenAI’s ChatGPT. The unveiling of Gemini in December 2023 was met with a video showcasing its capabilities, particularly impressive in its ability to handle interactions across multiple modalities. This included listening to people talk, responding to queries, and analyzing and describing images, demonstrating what is known as multimodal AI. This breakthrough was widely celebrated. However, it has since been revealed that the video was, in fact, staged and that it does not represent the real capabilities of Google’s Gemini.

… OpenAI, the company behind the groundbreaking ChatGPT, has a history marked by dubious demos and overhyped promises. Its latest release, GPT-4o, boasted claims that it could score in the 90th percentile on the Unified Bar Exam. However, when researchers delved into this assertion, they discovered that ChatGPT did not perform as well as advertised.[10] In fact, OpenAI had manipulated the study, and when the results were independently replicated, ChatGPT scored in the 15th percentile of the Unified Bar Exam.

… Amazon has also joined the fray. Some of you might recall Amazon Go, its AI-powered shopping initiative that promised to let you grab items from a store and simply walk out, with cameras, machine learning algorithms, and AI capable of detecting what items you placed in your bag and then charging your Amazon account. Unfortunately, we recently learned that Amazon Go was also a fraud. The so-called AI turned out to be nothing more than thousands of workers in India working remotely, observing what users were doing because the computer AI models were failing.

… Facebook introduced an assistant, M, which was touted as AI-powered. It was later discovered that 70% of the requests were actually fulfilled by remote human workers. The cost of maintaining this program was so high that the company had to discontinue its assistant.

… If the question asked doesn’t conform to a previously known example ChatGPT will still produce and confidently explain its answer – even a wrong one.

For instance, the answer to “how many rocks should I eat” was:

…Proponents of AI and large language models contend that while some of these demos may be fake, the overall quality of AI systems is continually improving. Unfortunately, I must share some disheartening news: the performance of large language models seems to be reaching a plateau. This is in stark contrast to the significant advancements made by OpenAI’s ChatGPT, between its second iteration (GPT-2), and the newer GPT-3 – that was a meaningful improvement. Today, larger, more complex, and more expensive models are being developed, yet the improvements they offer are minimal. Moreover, we are facing a significant challenge: the amount of data available for training these models is diminishing. The most advanced models are already being trained on all available internet data, necessitating an insatiable demand for even more data. There has been a proposal to generate synthetic data with AI models and use this data for training more robust models indefinitely. However, a recent study in Nature has revealed that such models trained on synthetic data often produce inaccurate and nonsensical responses, a phenomenon known as “Model Collapse.”

OK, enough of that. These authors have an interesting point of view, and the truth probably lies somewhere between their extreme skepticism and the breathless hype we have been hearing for the last two years. I would guess that the most practical near-term uses of AI may involve specific, behind-the-scenes data mining for business applications, rather than exactly imitating the way a human would think.

Will the Huge Corporate Spending on AI Pay Off?

Last Tuesday I posted on the topic, “Tech Stocks Sag as Analysts Question How Much Money Firms Will Actually Make from AI”. Here I try to dig a little deeper into the question of whether there will be a reasonable return on the billions of dollars that tech firms are investing in this area.

Cloud providers like Microsoft, Amazon, and Google are buying expensive GPU chips (mainly from Nvidia) and installing them in power-hungry data centers. This hardware is being cranked to train large language models on a world’s-worth of existing information. Will it pay off?

Obviously, we can dream up all sorts of applications for these large language models (LLMs), but the question is how much potential downstream customers are willing to pay for these capabilities. I am not equipped to make an expert appraisal, so I will just post some excerpts here.

Up until two months ago, it seemed there was little concern about the returns on this investment.  The only worry seemed to be not investing enough. This attitude was exemplified by Sundar Pichai of Alphabet (Google). During the Q2 earnings call, he was asked what the return on Gen AI investment capex would be. Instead of answering the question directly, he said:

I think the one way I think about it is when we go through a curve like this, the risk of under-investing is dramatically greater than the risk of over-investing for us here, even in scenarios where if it turns out that we are over investing. [my emphasis]

Part of the dynamic here is FOMO among the tech titans, as they compete for the internet search business:

The entire Gen AI capex boom started when Microsoft invested in OpenAI in late 2022 to directly challenge Google Search.

Naturally, Alphabet was forced to develop its own Gen AI LLM product to defend its core business – Search. Meta joined in the Gen AI capex race, together with Amazon, in fear of not being left out – which led to a massive Gen AI capex boom.

Nvidia has reportedly estimated that for every dollar spent on their GPU chips, “the big cloud service providers could generate $5 in GPU instance hosting over a span of four years. And API providers could generate seven bucks over that same timeframe.” Sounds like a great cornucopia for the big tech companies who are pouring tens of billions of dollars into this. What could possibly go wrong?

In late June, Goldman Sachs published a report titled GEN AI: TOO MUCH SPEND, TOO LITTLE BENEFIT?. This report included contributions from bulls and from bears. The leading Goldman skeptic is Jim Covello. He argues,

To earn an adequate return on the ~$1tn estimated cost of developing and running AI technology, it must be able to solve complex problems, which, he says, it isn’t built to do. He points out that truly life-changing inventions like the internet enabled low-cost solutions to disrupt high-cost solutions even in its infancy, unlike costly AI tech today. And he’s skeptical that AI’s costs will ever decline enough to make automating a large share of tasks affordable given the high starting point as well as the complexity of building critical inputs—like GPU chips—which may prevent competition. He’s also doubtful that AI will boost the valuation of companies that use the tech, as any efficiency gains would likely be competed away, and the path to actually boosting revenues is unclear.

MIT’s Daron Acemoglu is likewise skeptical:  He estimates that only a quarter of AI-exposed tasks will be cost-effective to automate within the next 10 years, implying that AI will impact less than 5% of all tasks. And he doesn’t take much comfort from history that shows technologies improving and becoming less costly over time, arguing that AI model advances likely won’t occur nearly as quickly—or be nearly as impressive—as many believe. He also questions whether AI adoption will create new tasks and products, saying these impacts are “not a law of nature.” So, he forecasts AI will increase US productivity by only 0.5% and GDP growth by only 0.9% cumulatively over the next decade.

Goldman economist Joseph Briggs is more optimistic:  He estimates that gen AI will ultimately automate 25% of all work tasks and raise US productivity by 9% and GDP growth by 6.1% cumulatively over the next decade. While Briggs acknowledges that automating many AI-exposed tasks isn’t cost-effective today, he argues that the large potential for cost savings and likelihood that costs will decline over the long run—as is often, if not always, the case with new technologies—should eventually lead to more AI automation. And, unlike Acemoglu, Briggs incorporates both the potential for labor reallocation and new task creation into his productivity estimates, consistent with the strong and long historical record of technological innovation driving new opportunities.

The Goldman report also cautioned that the U.S. and European power grids may not be prepared for the major extra power needed to run the new data centers.

Perhaps the earliest major cautionary voice was that of Sequoia’s David Cahn. Sequoia is a major venture capital firm. In September 2023 Cahn offered a simple calculation: for each dollar spent on (Nvidia) GPUs, roughly another dollar (mainly for electricity) would need to be spent by the cloud vendor to run the data center. To make this economical, the cloud vendor would need to pull in a total of about $4.00 in revenue. If vendors are installing roughly $50 billion in GPUs this year, then they need to pull in some $200 billion in revenues. But the projected AI revenues from Microsoft, Amazon, Google, etc. were less than half that amount, leaving (as of September 2023) a $125 billion shortfall.

As he put it, “During historical technology cycles, overbuilding of infrastructure has often incinerated capital, while at the same time unleashing future innovation by bringing down the marginal cost of new product development. We expect this pattern will repeat itself in AI.” This can be good for some of the end users, but not so good for the big tech firms rushing to spend here.

In his June 2024 update, Cahn notes that Nvidia’s yearly sales now look to be more like $150 billion, which in turn requires the cloud vendors to pull in some $600 billion in added revenues to make this spending worthwhile. Thus, the $125 billion shortfall is now more like a $500 billion (half a trillion!) shortfall. He notes further that the rapid improvement in chip performance means that the value of those expensive chips being installed in 2024 will be a lot lower in 2025.
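For the arithmetic-minded, here is Cahn’s back-of-the-envelope logic in a few lines of Python. The GPU-spend figures are the ones quoted above; the projected-revenue figures are simply back-solved from the shortfalls he reports, and the one-dollar-of-opex and 50% margin assumptions are his, so treat the whole thing as illustrative.

    # Back-of-the-envelope version of Cahn's calculation, using figures quoted above.
    # Assumed (roughly his): $1 of data-center opex per $1 of GPUs, and ~50% gross
    # margin, so required revenue is about 4x the GPU spend.
    def required_revenue(gpu_spend, opex_per_gpu_dollar=1.0, gross_margin=0.5):
        total_cost = gpu_spend * (1 + opex_per_gpu_dollar)
        return total_cost / gross_margin

    # (year, GPU spend, projected AI revenue) -- revenue back-solved from the
    # shortfalls reported in Cahn's posts, so purely illustrative.
    scenarios = [(2023, 50e9, 75e9), (2024, 150e9, 100e9)]
    for year, gpu_spend, revenue in scenarios:
        need = required_revenue(gpu_spend)
        print(f"{year}: need ~${need / 1e9:.0f}B in revenue, "
              f"shortfall ~${(need - revenue) / 1e9:.0f}B")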

And here is a random cynical comment on a Seeking Alpha article: “It was the perfect combination of years of Hollywood science fiction setting the table with regard to artificial intelligence and investors looking for something to replace the bitcoin and metaverse hype. So when ChatGPT put out answers that sounded human, people let their imaginations run wild. The fact that it consumes an incredible amount of processing power, that there is no actual artificial intelligence there, it cannot distinguish between truth and misinformation, and also no ROI other than the initial insane burst of chip sales – well, here we are and R2-D2 and C3PO are not reporting to work as promised.”

All this makes a case that the huge spends by Microsoft, Amazon, Google, and the like may not pay off as hoped. Their share prices have steadily levitated since January 2023 due to the AI hype, and indeed have been almost entirely responsible for the rise in the overall S&P 500 index, but their prices have all cratered in the past month. Whether or not these tech titans make money here, it seems likely that Nvidia (selling picks and shovels to the gold miners) will continue to mint money. Also, some of the final end users of Gen AI will surely find lucrative applications. I wish I knew how to pick the winners from the losers here.

For instance, the software service company ServiceNow is finding value in Gen AI. According to Morgan Stanley analyst Keith Weiss, “Gen AI momentum is real and continues to build. Management noted that net-new ACV for the Pro Plus edition (the SKU that incorporates ServiceNow’s Gen AI capabilities) doubled [quarter-over-quarter] with Pro Plus delivering 11 deals over $1M including two deals over $5M. Furthermore, Pro Plus realized a 30% price uplift and average deal sizes are up over 3x versus comparable deals during the Pro adoption cycle.”

Notes on ChatGPT from Sama with Lex

This is a transcript of Lex Fridman Podcast #419 with Sam Altman 2. Sam Altman is (once again) the CEO of OpenAI and a leading figure in artificial intelligence. Two parts of the conversation stood out to me, and I don’t mean the gossip or the AGI predictions. The links in the transcript will take you to a YouTube video of the interview.

(00:53:22) You mentioned this collaboration. I’m not sure where the magic is, if it’s in here or if it’s in there or if it’s somewhere in between. I’m not sure. But one of the things that concerns me for knowledge task when I start with GPT is I’ll usually have to do fact checking after, like check that it didn’t come up with fake stuff. How do you figure that out that GPT can come up with fake stuff that sounds really convincing? So how do you ground it in truth?

Sam Altman(00:53:55) That’s obviously an area of intense interest for us. I think it’s going to get a lot better with upcoming versions, but we’ll have to continue to work on it and we’re not going to have it all solved this year.

Lex Fridman(00:54:07) Well the scary thing is, as it gets better, you’ll start not doing the fact checking more and more, right?

Sam Altman(00:54:15) I’m of two minds about that. I think people are much more sophisticated users of technology than we often give them credit for.

Lex Fridman(00:54:15) Sure.

Sam Altman(00:54:21) And people seem to really understand that GPT, any of these models hallucinate some of the time. And if it’s mission-critical, you got to check it.

Lex Fridman(00:54:27) Except journalists don’t seem to understand that. I’ve seen journalists half-assedly just using GPT-4. It’s-

Sam Altman(00:54:34) Of the long list of things I’d like to dunk on journalists for, this is not my top criticism of them.

As EWED readers know, I have a paper about ChatGPT hallucinations and a paper about ChatGPT fact-checking. Lex is concerned that fact-checking will stop if the quality of ChatGPT goes up, even though no one really expects the hallucination rate to go to zero. Sam takes the optimistic view that humans will use the tool well. I suppose that Altman generally holds the view that his creation is going to be used for good, on net. Or maybe he is just being a salesman who does not want to publicly dwell on the negative aspects of ChatGPT.

I also have written about the tech pipeline and what makes people shy away from computer programming.

Lex Fridman(01:29:53) That’s a weird feeling. Even with a programming, when you’re programming and you say something, or just the completion that GPT might do, it’s just such a good feeling when it got you, what you’re thinking about. And I look forward to getting you even better. On the programming front, looking out into the future, how much programming do you think humans will be doing 5, 10 years from now?

Sam Altman(01:30:19) I mean, a lot, but I think it’ll be in a very different shape. Maybe some people will program entirely in natural language.

Someday, the skills of a computer programmer might morph to be closer to the skills of a manager of humans, since LLMs were trained on human writing.

In my 2023 talk, I suggested that programming will get more fun because LLMs will do the tedious parts. I also suggest that parents should teach their kids to read instead of “code.”

The tedious coding tasks previously done by humans did “create jobs.” I am not worried about mass unemployment yet. We have so many problems to solve (see my growing to-do list for intelligence). There are big transitions coming up. Sama says GPT-5 will be a major step up. He claimed that one reason OpenAI keeps releasing intermediate models is to give humanity a heads up on what is coming down the line.

Does GPT-4 Know How High the Alps Are?

I’m getting ready to give some public local talks about AI. Last week I shared some pictures (GPT-4’s estimates of the elevations of mountains in the Alps) that I think might help people understand ChatGPT.

My first thought was that GPT-4 was giving incorrect estimates of the heights of these mountains because it does not actually “know” the correct elevations. But then a nagging question came to mind.

GPT has a “creativity parameter.” Sometimes it intentionally does not select the top-rated next word in a sentence, in order, for example, to avoid being stiff and boring. Could GPT-4 know the exact elevations of these mountains and just be intentionally “creative” in this case?
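For the curious, here is a toy sketch of what that “creativity parameter” (temperature, as Gavin notes below) does when a model picks its next word. The candidate answers and their scores are invented; the point is that a low temperature almost always picks the top-rated word, while a high temperature is more likely to wander.

    # Toy temperature sampling: candidate words and scores are made up.
    import numpy as np

    rng = np.random.default_rng(42)

    def sample_next_word(words, scores, temperature=1.0):
        scaled = np.asarray(scores, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(words, p=probs)

    # Hypothetical completions for "Mont Blanc is ___ meters high"
    words = ["4,808", "4,810", "3,500", "5,100"]
    scores = [4.0, 3.5, 1.0, 0.5]

    print(sample_next_word(words, scores, temperature=0.2))  # near-greedy
    print(sample_next_word(words, scores, temperature=1.5))  # more "creative"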

I do not want to stand up in front of the local Rotary Club and say something wrong. So, I went to a true expert, Lenny Bogdonoff, to ask for help. Here is his reply:

Not quite. It’s not that it knows or doesn’t know, but based on the prompt, it’s likely unable to parse the specific details and is outputting results respectively. There is a component of stochastic behavior based on what part of the model weights are activated.

One common practice to help avoid this and see what the model does grasp, is to ask it to think step by step, and explain its reasoning. When doing this, you can see the fault in logic.

All that being said, the vision model is actually faulty in being able to grasp the relative position of information, so this kind of task will be more likely to hallucinate.

There are better vision models, that aren’t OpenAI based. For example Qwen-VL-Max is very good, from the Chinese company Alibaba. Another is LLaVA which uses different baselines of open source language models to add vision capabilities

Depending on what you are needing vision for, models can be spiky in capability. Good at OCR but bad at relative positioning. Good at classifying a specific UI element, but bad at detecting plants, etc etc. 

Joy: So, I think I can tell the Rotary Club that GPT was “wrong” as opposed to “intentionally creative.” I think, as I originally concluded, you should not make ChatGPT the pilot of your airplane and go to sleep when approaching the Alps. ChatGPT should be used for what it is good at, such as writing the rough draft of a cover letter. (We have great “autopilot” software for flying planes, already, without involving large language models.)

Another expert, Gavin Leech, also weighed in with some helpful background information:

  • the creativity parameter is known as temperature. But you can actually radically change the output (intelligence, style, creativity) by using more complicated sampling schemes. The best analogy for changing the sampling scheme is that you’re giving it a psychiatric drug. Changing the prompt, conversely, is like CBT or one of those cute mindset interventions.
  • For each real-name model (e.g. “gpt-4-0613”), there’s 3 versions: the base model (which now no one except highly vetted researchers have access to), the instruction-tuned model, and the RLHF (or rather RLAIF) model. The base model is wildly creative, unhinged, but the RLHF one (which the linked researchers use) is heavily electroshocked into not intentionally making things up (as Lenny says).
  • It’s currently not usually possible to diagnose an error – the proverbial black box. My friends are working on this, though.
  • For more, note OpenAI admitting the “laziness” of their own models; the Turbo model line is intended to fix this.

Thank you, Lenny and Gavin, for donating your insights.

How ChatGPT works from geography and Stephen Wolfram

By now, everyone should consider using ChatGPT and be familiar with how it works. I’m going to highlight resources for that.

My paper about how ChatGPT generates academic citations should be useful to academics as a way to quickly grasp the strengths and weaknesses of ChatGPT. ChatGPT often works well, but sometimes fails. It’s important to anticipate how it fails. Our paper is so short and simple that your undergraduates could read it before using ChatGPT for their writing assignments.

A paper that does this in a different domain is “GPT4GEO: How a Language Model Sees the World’s Geography” (Again, consider showing it to your undergrads because of the neat pictures, but probably walk through it together in class instead of assigning it as reading.) They describe their project: “To characterise what GPT-4 knows about the world, we devise a set of progressively more challenging experiments…”

For example, they asked ChatGPT about the populations of countries and found that: “For populations, GPT-4 performs relatively well with a mean relative error (MRE) of 3.61%. However, significantly higher errors [occur] … for less populated countries.”
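In case the metric is unfamiliar: mean relative error is just the average of |estimate − truth| / truth. A minimal sketch, with made-up numbers:

    # Mean relative error (MRE), the metric quoted above. Numbers are invented.
    def mean_relative_error(pairs):
        return sum(abs(est - true) / true for est, true in pairs) / len(pairs)

    # (model estimate, actual population) -- hypothetical values for illustration
    pairs = [(83_000_000, 84_300_000), (5_500_000, 5_600_000), (330_000, 360_000)]
    print(f"MRE = {mean_relative_error(pairs):.2%}")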

ChatGPT will often say SOMETHING if prompted, and it is often at least slightly wrong. This graph shows that most estimates of national populations were not exactly correct, and that performance was worse for less well-known countries. That’s exactly what we found in our paper on citations. We found that very famous books are often cited correctly, because ChatGPT is mimicking other documents that correctly cite those books. However, if there are not many documents to train on, then ChatGPT will make things up.

I love this figure from the geography paper showing how ChatGPT estimates the elevations of mountains. This visual should be all over Twitter.

There are three lines because they ran the prompt three times, and ChatGPT produced three different wrong elevation profiles. Is that kind of work good enough for your tasks? Often it is. The shaded area in the graph is the actual topography of the earth in those places. ChatGPT “knows” that this area of the world is mountainous, but it will put out incorrect estimates of the exact elevation rather than stating that it does not know.

Another free (long, advanced) resource with great pictures is Stephen Wolfram’s 2023 blog article “What Is ChatGPT Doing … and Why Does It Work?” (YouTube version)

The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”
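Here is a toy version of that idea, shrunk down to a word-frequency lookup table instead of a neural network. It is nothing like the scale or sophistication of ChatGPT, but the loop (look at the text so far, pick a plausible next word, repeat) has the same shape.

    # A toy "reasonable continuation" generator: count which word follows which
    # in a tiny training text, then extend a prompt by sampling likely followers.
    from collections import Counter, defaultdict
    import random

    training_text = ("the cat sat on the mat and the cat slept on the mat "
                     "while the dog sat on the rug").split()

    followers = defaultdict(Counter)
    for word, nxt in zip(training_text, training_text[1:]):
        followers[word][nxt] += 1

    def continue_text(start, n_words=6):
        out = [start]
        for _ in range(n_words):
            options = followers.get(out[-1])
            if not options:
                break
            words, counts = zip(*options.items())
            out.append(random.choices(words, weights=counts)[0])
        return " ".join(out)

    print(continue_text("the"))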

If you feel like you already are proficient with using ChatGPT, then I would recommend Wolfram’s blog because you will learn a lot about math and computers.

Scott wrote “Generative AI Nano-Tutorial” here, which has the advantage of being much shorter than Wolfram’s blog.

EDIT: New 2023 overview paper (link from Lenny): “A Survey of Large Language Models”