Google’s TPU Chips Threaten Nvidia’s Dominance in AI Computing

Here is a three-year chart of stock prices for Nvidia (NVDA), Alphabet/Google (GOOG), and the tech-heavy QQQ index fund, which tracks the Nasdaq-100:

NVDA has been spectacular. If you had put $20k in NVDA three years ago, it would have turned into nearly $200k. Sweet. Meanwhile, GOOG poked along at the general pace of QQQ. Then, around Sept 1 (yellow line), GOOG started to pull away from QQQ, and it has not looked back.

And in the past two months, GOOG stock has stomped all over NVDA, as shown in the six-month chart below. The two stocks were neck and neck in early October, but since then GOOG has surged way ahead. In the past month, GOOG is up sharply (red arrow), while NVDA is down significantly:

What is going on? It seems that the market is buying the narrative that Google’s Tensor Processing Unit (TPU) chips are a competitive threat to Nvidia’s GPUs. Last week, we published a tutorial on the technical details here. Briefly, Google’s TPUs are hardwired to perform key AI calculations, whereas Nvidia’s GPUs are more general-purpose. For a range of AI processing, the TPUs are faster and much more energy-efficient than the GPUs.

The greater flexibility of Nvidia's GPUs, and the programming community's familiarity with Nvidia's CUDA programming language, still gives Nvidia a bit of an edge in the AI training phase. But much of that edge fades for the inference (application) uses of AI. For the past few years, the big AI wannabes have focused madly on model training. But there must be a shift to inference (practical implementation) soon for AI models to actually make money.

All this is a big potential headache for Nvidia. Because of its quasi-monopoly on AI compute, Nvidia has been able to charge a huge 75% gross profit margin on its chips. Its customers are naturally not thrilled with this, and have been making some efforts to devise alternatives. But it seems that Google, thanks to a big head start in this area and very deep pockets, has actually equaled or even beaten Nvidia at its own game.

This explains much of the recent disparity in stock movements. It should be noted, however, that for a quirky business reason, Google is unlikely in the near term to displace Nvidia as the main go-to for AI compute power. The reason is this: most AI compute power is implemented in huge data/cloud centers. And Google is one of the three main cloud vendors, along with Microsoft and Amazon, with IBM and Oracle trailing behind. So, for Google to supply Microsoft and Amazon with its chips and accompanying know-how would be to enable its competitors to compete more strongly.

Also, AI users like, say, OpenAI would be reluctant to commit to usage in a Google-owned facility using Google chips, since the user would then be somewhat locked in and held hostage: it would be expensive to switch to a different data center if Google tried to raise prices. In contrast, a user can readily move to a different data center for a better deal if all the centers are using Nvidia chips.

For the present, then, Google is using its TPU technology primarily in-house. The company has a huge suite of AI-adjacent business lines, so its TPU capability does give it genuine advantages there. Reportedly, soul-searching continues in the Google C-suite about how to more broadly monetize its TPUs. It seems likely that they will find a way. 

As usual, nothing here constitutes advice to buy or sell any security.

AI Computing Tutorial: Training vs. Inference Compute Needs, and GPU vs. TPU Processors

A tsunami of sentiment shift is washing over Wall Street, away from Nvidia and towards Google/Alphabet. In the past month, GOOG stock is up a sizzling 12%, while NVDA has plunged 13%, despite producing its usual earnings beat. Today I will discuss some of the technical backdrop to this sentiment shift, which involves the differences between training AI models and actually applying them to specific problems ("inference"), and the significantly different processing chips each favors. Next week I will cover the company-specific implications.

As most readers here probably know, the Large Language Models (LLMs) that underpin the popular new AI products work by sucking in nearly all the text (and now other data) that humans have ever produced, reducing each word or form of a word to a numerical token, and grinding and grinding to discover consistent patterns among those tokens. Layers of (virtual) neural nets are used. The training process involves an insane amount of trying to predict, say, the next word in a sentence scraped from the web, evaluating why the model missed it, and feeding that information back to adjust the matrix of weights on the neural layers, until the model can predict that next word correctly. Then on to the next sentence found on the internet, to work and work until it can be predicted properly. At the end of the day, a well-trained AI chatbot can respond to Bob’s complaint about his boss with an appropriately sympathetic pseudo-human reply like, “It sounds like your boss is not treating you fairly, Bob. Tell me more about…” It bears repeating that LLMs do not actually “know” anything. All they can do is produce a statistically probable word salad in response to prompts. But they can now do that so well that they are very useful.*

This is an oversimplification, but gives the flavor of the endless forward and backward propagation and iteration that is required for model training. This training typically requires running vast banks of very high-end processors, typically housed in large, power-hungry data centers, for months at a time.
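To make that flavor concrete, here is a minimal toy sketch of next-token training in Python. It is nothing like a production LLM (one tiny weight matrix instead of layered neural nets, a six-word vocabulary, no GPUs, made-up sizes and data), but it shows the same loop: predict the next token, measure the miss, nudge the weights, repeat.

```python
# Toy next-token "training": a single weight matrix learns bigram statistics.
# Everything here (corpus, sizes, learning rate) is made up for illustration.
import numpy as np

text = ("the cat sat on the mat . " * 200).split()
vocab = sorted(set(text))
stoi = {w: i for i, w in enumerate(vocab)}    # word -> token id
ids = [stoi[w] for w in text]
pairs = list(zip(ids[:-1], ids[1:]))          # (current token, next token)

V, lr = len(vocab), 0.1
W = np.random.default_rng(0).normal(scale=0.01, size=(V, V))  # the "weights"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for epoch in range(5):
    loss = 0.0
    for cur, nxt in pairs:
        probs = softmax(W[cur])               # forward pass: predict the next token
        loss += -np.log(probs[nxt])           # how badly did we miss?
        grad = probs.copy()
        grad[nxt] -= 1.0                      # gradient of the cross-entropy loss
        W[cur] -= lr * grad                   # backward pass: adjust the weights
    print(f"epoch {epoch}: avg loss {loss / len(pairs):.3f}")
```

Real training does this across trillions of tokens and billions of weights, which is why it runs on banks of accelerators for months rather than on a laptop for seconds.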

Once a model is trained (i.e., the neural net weights have been determined), to then run it (i.e., to generate responses based on human prompts) takes considerably less compute power. This is the “inference” phase of generative AI. It still takes a lot of compute to run a big model quickly, but a simpler LLM like DeepSeek can be run, with only modest time lags, on a high-end PC.
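Continuing the toy sketch above (and it is only a sketch), inference is just the forward pass run over and over: no loss, no gradients, no weight updates, which is why it is so much cheaper than training.

```python
# Toy "inference": reuses the trained weights W, the stoi map, and softmax from above.
itos = {i: w for w, i in stoi.items()}        # token id -> word

def generate(start_word, n_tokens=8):
    cur = stoi[start_word]
    out = [start_word]
    for _ in range(n_tokens):
        cur = int(np.argmax(softmax(W[cur])))  # most probable next token; no backprop
        out.append(itos[cur])
    return " ".join(out)

print(generate("the"))   # something like "the cat sat on the mat . the ..."
```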

GPUs Versus ASIC TPUs

Nvidia has made its fortune by taking the graphics processing units (GPUs) that were developed for the massively parallel calculations needed to drive video displays, and adapting them to more general problem solving that could make use of rapid matrix calculations. Nvidia chips and its CUDA language have been employed for physical simulations such as seismology and molecular dynamics, and then for Bitcoin calculations. When generative AI came along, Nvidia chips and programming tools were the obvious choice for LLM computing needs. The world’s lust for AI compute is so insatiable, and Nvidia has had such a stranglehold, that the company has been able to charge an eye-watering gross profit margin of around 75% on its chips.
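For a sense of the kind of workload involved, here is a minimal sketch of a single large matrix multiplication, the core operation in both training and inference. It assumes PyTorch is installed and simply falls back to the CPU if no CUDA-capable Nvidia GPU is present; the matrix sizes are arbitrary.

```python
# Minimal matrix-multiply sketch; sizes and the timing method are illustrative only.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.time()
c = a @ b                        # one 4096 x 4096 matrix multiplication
if device == "cuda":
    torch.cuda.synchronize()     # wait for the GPU to finish before reading the clock
print(f"{device}: {time.time() - start:.4f} seconds")
```

On a modern GPU this one operation runs orders of magnitude faster than on a CPU, and an LLM performs astronomical numbers of them.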

AI users of course are trying desperately to get compute capability without having to pay such high fees to Nvidia. It has been hard to mount a serious competitive challenge, though. Nvidia has a commanding lead in hardware and supporting software, and (unlike the Intel of years gone by) keeps forging ahead rather than resting on its laurels.

So far, no one seems to be able to compete strongly with Nvidia in GPUs. However, there is a different chip architecture, which by some measures can beat GPUs at their own game.

Nvidia GPUs are general-purpose parallel processors with high flexibility, capable of handling a wide range of tasks from gaming to AI training, supported by a mature software ecosystem built around CUDA. GPUs beat out the original computer central processing units (CPUs) for these tasks by sacrificing flexibility for the power to do parallel processing of many simple, repetitive operations. The newer “application-specific integrated circuits” (ASICs) take this specialization a step further. They can be custom hard-wired to do specific calculations, such as those required for Bitcoin and now for AI. By cutting out steps used by GPUs, especially fetching data in and out of memory, ASICs can do many AI computing tasks faster and cheaper than Nvidia GPUs, while using much less electric power. That is a big plus, since AI data centers are driving up electricity prices in many parts of the country. The particular type of ASIC that Google uses for AI is called a Tensor Processing Unit (TPU).

I found this explanation by UncoverAlpha to be enlightening:

A GPU is a “general-purpose” parallel processor, while a TPU is a “domain-specific” architecture.

The GPUs were designed for graphics. They excel at parallel processing (doing many things at once), which is great for AI. However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” They spend significant energy and chip area on complex tasks like caching, branch prediction, and managing independent threads.

A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping. Instead, it uses a unique architecture called a Systolic Array.

The “Systolic Array” is the key differentiator. In a standard CPU or GPU, the chip moves data back and forth between the memory and the computing units for every calculation. This constant shuffling creates a bottleneck (the Von Neumann bottleneck).

In a TPU’s systolic array, data flows through the chip like blood through a heart (hence “systolic”).

  1. It loads data (weights) once.
  2. It passes inputs through a massive grid of multipliers.
  3. The data is passed directly to the next unit in the array without writing back to memory.

What this means, in essence, is that a TPU, because of its systolic array, drastically reduces the number of memory reads and writes required from HBM. As a result, the TPU can spend its cycles computing rather than waiting for data.
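To make that quoted description a bit more concrete, here is a toy Python sketch of the weight-stationary dataflow idea. It is not Google’s actual TPU design, and it ignores clock cycles and the skewed timing of a real systolic array; it just shows the principle that the weights are loaded into the grid once and partial sums are handed from cell to cell instead of being written back to memory at every step.

```python
# Toy illustration of systolic-style matrix multiplication (not a real TPU model).
import numpy as np

def systolic_matmul(X, W):
    """Compute X @ W by streaming rows of X through a grid that holds W in place."""
    M, K = X.shape
    K2, N = W.shape
    assert K == K2
    Y = np.zeros((M, N))
    for m in range(M):                         # each input row streams through the array
        for n in range(N):                     # one column of multiply-accumulate cells per output
            partial = 0.0
            for k in range(K):                 # the partial sum flows down the column
                partial += X[m, k] * W[k, n]   # W[k, n] stays resident in its cell
            Y[m, n] = partial                  # only the finished result leaves the array
    return Y

X = np.random.rand(4, 3)
W = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(X, W), X @ W)   # matches an ordinary matrix multiply
```

In software this is just a reordered matrix multiply; the payoff comes in hardware, where keeping the weights and partial sums inside the array avoids most of the round trips to memory.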

Google has developed the most advanced ASICs for doing AI, which are now, on some levels, a competitive threat to Nvidia. Some implications of this will be explored in a post next week.

*Next-generation AI seeks to step beyond the LLM world of statistical word salads and to model cause and effect at the level of objects and agents in the real world – see Meta AI Chief Yann LeCun Notes Limits of Large Language Models and Path Towards Artificial General Intelligence.

Standard disclaimer: Nothing here should be considered advice to buy or sell any security.

Will the Huge Corporate Spending on AI Pay Off?

Last Tuesday I posted on the topic, “Tech Stocks Sag as Analysts Question How Much Money Firms Will Actually Make from AI”. Here I try to dig a little deeper into the question of whether there will be a reasonable return on the billions of dollars that tech firms are investing in this area.

Cloud providers like Microsoft, Amazon, and Google are buying expensive GPU chips (mainly from Nvidia) and installing them in power-hungry data centers. This hardware is being cranked to train large language models on a world’s-worth of existing information. Will it pay off?

Obviously, we can dream up all sorts of applications for these large language models (LLMs), but the question is how much potential downstream customers are willing to pay for these capabilities. I don’t have the capability for an expert appraisal, so I will just post some excerpts here.

Up until two months ago, it seemed there was little concern about the returns on this investment. The only worry seemed to be not investing enough. This attitude was exemplified by Sundar Pichai of Alphabet (Google). During the Q2 earnings call, he was asked what the return on Gen AI capex would be. Instead of answering the question directly, he said:

I think the one way I think about it is when we go through a curve like this, the risk of under-investing is dramatically greater than the risk of over-investing for us here, even in scenarios where if it turns out that we are over investing. [my emphasis]

Part of the dynamic here is FOMO among the tech titans, as they compete for the internet search business:

The entire Gen AI capex boom started when Microsoft invested in OpenAI in late 2022 to directly challenge Google Search.

Naturally, Alphabet was forced to develop its own Gen AI LLM product to defend its core business – Search. Meta joined in the Gen AI capex race, together with Amazon, in fear of not being left out – which led to a massive Gen AI capex boom.

Nvidia has reportedly estimated that for every dollar spent on their GPU chips, “the big cloud service providers could generate $5 in GPU instant hosting over a span of four years. And API providers could generate seven bucks over that same timeframe.” Sounds like a great cornucopia for the big tech companies who are pouring tens of billions of dollars into this. What could possibly go wrong?
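For scale, here is the back-of-the-envelope arithmetic behind that cornucopia, using, purely for illustration, the roughly $50 billion of annual GPU installs that David Cahn cites below; the multipliers are Nvidia’s reported claims, not independent estimates.

```python
# Back-of-the-envelope on Nvidia's reported multipliers; all inputs are illustrative.
gpu_spend_b = 50                          # ~$50B of GPUs installed in a year (Cahn's figure, below)
hosting_revenue_b = 5 * gpu_spend_b       # claimed $5 of GPU hosting revenue per $1 of chips
api_revenue_b = 7 * gpu_spend_b           # claimed $7 of API revenue per $1 of chips
print(f"Implied 4-year hosting revenue: ${hosting_revenue_b}B")   # $250B
print(f"Implied 4-year API revenue:     ${api_revenue_b}B")       # $350B
```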

In late June, Goldman Sachs published a report titled, GEN AI: TOO MUCH SPEND, TOO LITTLE BENEFIT?. This report included contributions from bulls and from bears. The leading Goldman skeptic is Jim Covello. He argues,

To earn an adequate return on the ~$1tn estimated cost of developing and running AI technology, it must be able to solve complex problems, which, he says, it isn’t built to do. He points out that truly life-changing inventions like the internet enabled low-cost solutions to disrupt high-cost solutions even in its infancy, unlike costly AI tech today. And he’s skeptical that AI’s costs will ever decline enough to make automating a large share of tasks affordable given the high starting point as well as the complexity of building critical inputs—like GPU chips—which may prevent competition. He’s also doubtful that AI will boost the valuation of companies that use the tech, as any efficiency gains would likely be competed away, and the path to actually boosting revenues is unclear.

MIT’s Daron Acemoglu is likewise skeptical:  He estimates that only a quarter of AI-exposed tasks will be cost-effective to automate within the next 10 years, implying that AI will impact less than 5% of all tasks. And he doesn’t take much comfort from history that shows technologies improving and becoming less costly over time, arguing that AI model advances likely won’t occur nearly as quickly—or be nearly as impressive—as many believe. He also questions whether AI adoption will create new tasks and products, saying these impacts are “not a law of nature.” So, he forecasts AI will increase US productivity by only 0.5% and GDP growth by only 0.9% cumulatively over the next decade.

Goldman economist Joseph Briggs is more optimistic:  He estimates that gen AI will ultimately automate 25% of all work tasks and raise US productivity by 9% and GDP growth by 6.1% cumulatively over the next decade. While Briggs acknowledges that automating many AI-exposed tasks isn’t cost-effective today, he argues that the large potential for cost savings and likelihood that costs will decline over the long run—as is often, if not always, the case with new technologies—should eventually lead to more AI automation. And, unlike Acemoglu, Briggs incorporates both the potential for labor reallocation and new task creation into his productivity estimates, consistent with the strong and long historical record of technological innovation driving new opportunities.

The Goldman report also cautioned that the U.S. and European power grids may not be prepared for the major extra power needed to run the new data centers.

Perhaps the earliest major cautionary voice was that of Sequoia’s David Cahn. Sequoia is a major venture capital firm. In September 2023, Cahn offered a simple calculation estimating that for each dollar spent on (Nvidia) GPUs, another dollar (mainly for electricity) would need to be spent by the cloud vendor to run the data center. To make this economical, the cloud vendor would need to pull in a total of about $4.00 in revenue. If vendors are installing roughly $50 billion in GPUs this year, then they need to pull in some $200 billion in revenues. But the projected AI revenues from Microsoft, Amazon, Google, etc., etc. were less than half that amount, leaving (as of Sept 2023) a $125 billion shortfall.

As he put it, “During historical technology cycles, overbuilding of infrastructure has often incinerated capital, while at the same time unleashing future innovation by bringing down the marginal cost of new product development. We expect this pattern will repeat itself in AI.” This can be good for some of the end users, but not so good for the big tech firms rushing to spend here.

In his June 2024 update, Cahn notes that Nvidia’s yearly sales now look to be more like $150 billion, which in turn requires the cloud vendors to pull in some $600 billion in added revenues to make this spending worthwhile. Thus, the $125 billion shortfall is now more like a $500 billion (half a trillion!) shortfall. He notes further that the rapid improvement in chip power means the value of those expensive chips being installed in 2024 will be a lot lower in 2025.
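For concreteness, here is Cahn’s back-of-the-envelope logic in code form. The projected-revenue inputs are back-solved from the shortfalls quoted above (roughly $75B implied for 2023 and $100B for 2024), so treat the exact numbers as illustrative rather than his published figures.

```python
# Cahn-style napkin math: ~$4 of revenue needed per $1 of GPU spend
# (the $1 of chips plus roughly $1 of data-center running costs, with margin on top).
def revenue_gap_b(gpu_spend_b, projected_ai_revenue_b):
    required_b = 4 * gpu_spend_b
    return required_b, required_b - projected_ai_revenue_b

# Sept 2023: ~$50B of GPUs, projected AI revenues well under the $200B required
print(revenue_gap_b(50, 75))     # -> (200, 125): the "$125 billion" hole
# June 2024: Nvidia run-rate closer to $150B a year
print(revenue_gap_b(150, 100))   # -> (600, 500): roughly half a trillion
```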

And here is a random cynical comment on a Seeking Alpha article: “It was the perfect combination of years of Hollywood science fiction setting the table with regard to artificial intelligence and investors looking for something to replace the bitcoin and metaverse hype. So when ChatGPT put out answers that sounded human, people let their imaginations run wild. The fact that it consumes an incredible amount of processing power, that there is no actual artificial intelligence there, it cannot distinguish between truth and misinformation, and also no ROI other than the initial insane burst of chip sales – well, here we are and R2-D2 and C3PO are not reporting to work as promised.”

All this makes a case that the huge spends by Microsoft, Amazon, Google, and the like may not pay off as hoped. Their share prices have steadily levitated since January 2023 due to the AI hype, and indeed have been almost entirely responsible for the rise in the overall S&P 500 index, but their prices have all cratered in the past month. Whether or not these tech titans make money here, it seems likely that Nvidia (selling picks and shovels to the gold miners) will continue to mint money. Also, some of the final end users of Gen AI will surely find lucrative applications. I wish I knew how to pick the winners from the losers here.

For instance, the software service company ServiceNow is finding value in Gen AI. According to Morgan Stanley analyst Keith Weiss, “Gen AI momentum is real and continues to build. Management noted that net-new ACV for the Pro Plus edition (the SKU that incorporates ServiceNow’s Gen AI capabilities) doubled [quarter-over-quarter] with Pro Plus delivering 11 deals over $1M including two deals over $5M. Furthermore, Pro Plus realized a 30% price uplift and average deal sizes are up over 3x versus comparable deals during the Pro adoption cycle.”

The Open Internet Is Dead; Long Live The Open Internet

Information on the internet was born free, but now lives everywhere in walled gardens. Blogging sometimes feels like a throwback to an earlier era. So many newer platforms have eclipsed blogs in popularity, almost all of which are harder to search and discover. Facebook was walled off from the beginning; Twitter is becoming more so. Podcasts and video tend to be open in theory, but hard to search, as most lack transcripts. Longer-form writing is increasingly hidden behind paywalls on news sites and Substack. People have complained for years that Google search is getting worse; there are many reasons for this, like a complacent company culture and the cat-and-mouse game with SEO companies, but one is this rising tide of content that is harder to search and link.

To me, part of the value of blogging is precisely that it remains open in an increasingly closed world. Its influence relative to the rest of the internet has waned since its heyday in ~2009, but most of this is due to how the rest of the internet has grown explosively at the expense of the real world; in absolute terms the influence of blogging remains high, and perhaps rising.

The closing internet of late 2023 will not last forever. Like so much else, AI is transforming it, for better and worse. AI is making it cheap and easy to produce transcripts of podcasts and videos, making them more searchable. Because AI needs large amounts of text to train models, text becomes more valuable. Open blogs become more influential because they become part of the training data for AI; because of what we have written here, AI will think and sound a little bit more like us. I think this is great, but others have the opposite reaction. The New York Times is suing to exclude their data from training AIs, and to delete any models trained with it. Twitter is becoming more closed partly in an attempt to limit scraping by AIs.

So AI makes some human-produced material easier for search engines to index, and some harder; it also means there will be a flood of AI-produced material, mostly low-quality, clogging up search results. The perpetual challenge for search engines of putting relevant, high-quality results first will become much harder, a challenge which AI will of course be set to solve. Search engines already have surprisingly big problems with not indexing writing at all: searching for a post on my old blog with exact quotes and not finding it made me realize Google was missing some posts there, and Bing and DuckDuckGo were missing all of them. While we’re waiting for AI to solve and/or worsen this problem, Gwern has a great page of tips on searching for hard-to-find documents and information, both the kind that is buried deep down in Google and the kind that is not there at all.