Google’s TPU Chips Threaten Nvidia’s Dominance in AI Computing

Here is a three-year chart of stock prices for Nvidia (NVDA), Alphabet/Google (GOOG), and the QQQ fund, a proxy for tech stocks generally:

NVDA has been spectacular. If you had put $20k into NVDA three years ago, it would have turned into nearly $200k. Sweet. Meanwhile, GOOG poked along at the general pace of QQQ. Then, around Sept 1 (yellow line), GOOG started to pull away from QQQ, and it has not looked back.

And in the past two months, GOOG stock has stomped all over NVDA, as shown in the six-month chart below. The two stocks were neck and neck in early October, but GOOG has since surged way ahead. In the past month, GOOG is up sharply (red arrow), while NVDA is down significantly:

What is going on? It seems that the market is buying the narrative that Google’s Tensor Processing Unit (TPU) chips are a competitive threat to Nvidia’s GPUs. Last week, we published a tutorial on the technical details here. Briefly, Google’s TPUs are hardwired to perform key AI calculations, whereas Nvidia’s GPUs are more general-purpose. For a range of AI processing, the TPUs are faster and much more energy-efficient than the GPUs.

The greater flexibility of Nvidia's GPUs, and the programming community's familiarity with Nvidia's CUDA programming platform, still give Nvidia a bit of an edge in the AI training phase. But much of that edge fades for inference (application) workloads. For the past few years, the big AI wannabes have focused madly on model training. But a shift to inference (practical implementation) must come soon if AI models are actually to make money.

All this is a big potential headache for Nvidia. Because of its quasi-monopoly on AI compute, it has been able to charge a huge 75% gross profit margin on its chips. Its customers are naturally not thrilled with this, and have been making some efforts to devise alternatives. But it seems that Google, thanks to a big head start in this area and very deep pockets, has actually equaled or even beaten Nvidia at its own game.

This explains much of the recent disparity in stock movements. It should be noted, however, that for a quirky business reason, Google is unlikely in the near term to displace Nvidia as the main go-to for AI compute power. The reason is this: most AI compute power is implemented in huge data/cloud centers. And Google is one of the three main cloud vendors, along with Microsoft and Amazon, with IBM and Oracle trailing behind. So, for Google to supply Microsoft and Amazon with its chips and accompanying know-how would be to enable its competitors to compete more strongly.

Also, AI users like, say, OpenAI would be reluctant to commit to a Google-owned facility running Google chips, since the user would then be somewhat locked in and held hostage: it would be expensive to switch to a different data center if Google tried to raise prices. In contrast, a user can readily move to a different data center for a better deal if all the centers are using Nvidia chips.

For the present, then, Google is using its TPU technology primarily in-house. The company has a huge suite of AI-adjacent business lines, so its TPU capability does give it genuine advantages there. Reportedly, soul-searching continues in the Google C-suite about how to more broadly monetize its TPUs. It seems likely that they will find a way. 

As usual, nothing here constitutes advice to buy or sell any security.

AI Computing Tutorial: Training vs. Inference Compute Needs, and GPU vs. TPU Processors

A tsunami of sentiment shift is washing over Wall Street, away from Nvidia and towards Google/Alphabet. In the past month, GOOG stock is up a sizzling 12%, while NVDA is down 13%, despite producing its usual earnings beat. Today I will discuss some of the technical backdrop to this sentiment shift, which involves the differences between training AI models and actually applying them to specific problems (“inference”), and the significantly different processing chips suited to each. Next week I will cover the company-specific implications.

As most readers here probably know, the Large Language Models (LLMs) that underpin the popular new AI products work by sucking in nearly all the text (and now other data) that humans have ever produced, reducing each word or word fragment to a numerical token, and grinding and grinding to discover consistent patterns among those tokens. Layers of (virtual) neural nets are used. The training process involves an insane amount of trying to predict, say, the next word in a sentence scraped from the web, evaluating why the model missed it, and feeding that information back to adjust the matrix of weights on the neural layers, until the model can predict that next word correctly. Then it is on to the next sentence found on the internet, working and working until that one, too, can be predicted properly. At the end of the day, a well-trained AI chatbot can respond to Bob’s complaint about his boss with an appropriately sympathetic pseudo-human reply like, “It sounds like your boss is not treating you fairly, Bob. Tell me more about…” It bears repeating that LLMs do not actually “know” anything. All they can do is produce a statistically probable word salad in response to prompts. But they can now do that so well that they are very useful.*

This is an oversimplification, but it gives the flavor of the endless forward and backward propagation and iteration required for model training. This training typically requires running vast banks of very high-end processors, housed in large, power-hungry data centers, for months at a time.
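
To make that forward-and-backward loop concrete, here is a minimal sketch in PyTorch. The tiny model, the sizes, and the random "sentences" are stand-ins invented purely for illustration; a real LLM is a huge transformer trained on trillions of tokens, not a three-layer toy.

    # Toy sketch of the training loop: predict the next token, measure the miss,
    # and push the error backward to adjust the weights. Illustrative only.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim, context_len = 1000, 64, 8    # toy sizes
    model = nn.Sequential(                              # stand-in for a transformer stack
        nn.Embedding(vocab_size, embed_dim),
        nn.Flatten(),
        nn.Linear(embed_dim * context_len, vocab_size)  # scores for every possible next token
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Pretend each "sentence" is 9 tokens: 8 of context plus the word to predict.
    batch = torch.randint(0, vocab_size, (32, 9))
    context, next_token = batch[:, :context_len], batch[:, context_len]

    for step in range(100):
        logits = model(context)             # forward pass: guess the next word
        loss = loss_fn(logits, next_token)  # how badly did the model miss?
        optimizer.zero_grad()
        loss.backward()                     # backward pass: trace the miss to each weight
        optimizer.step()                    # nudge the weights, then try again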

Once a model is trained (i.e., the neural net weights have been determined), running it (i.e., generating responses to human prompts) takes considerably less compute power. This is the “inference” phase of generative AI. It still takes a lot of compute to run a big model quickly, but a simpler LLM like DeepSeek can be run, with only modest time lags, on a high-end PC.
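
For contrast, here is an equally minimal sketch of the inference side, again using an invented toy model as a stand-in for a trained one. The weights are frozen; generating a reply is just the forward pass run over and over, with each predicted token fed back in as context.

    # Toy sketch of inference: no gradients, no weight updates, just repeated
    # forward passes. The model here is a freshly initialized stand-in; in
    # practice the trained weights would be loaded from disk.
    import torch
    import torch.nn as nn

    vocab_size, context_len = 1000, 8
    model = nn.Sequential(
        nn.Embedding(vocab_size, 64),
        nn.Flatten(),
        nn.Linear(64 * context_len, vocab_size)
    )
    model.eval()

    prompt = torch.randint(0, vocab_size, (1, context_len))  # stand-in for a tokenized prompt

    with torch.no_grad():                          # inference: gradients are never computed
        tokens = prompt
        for _ in range(20):                        # generate 20 tokens, one at a time
            logits = model(tokens[:, -context_len:])           # forward pass only
            next_token = logits.argmax(dim=-1, keepdim=True)   # pick the likeliest token
            tokens = torch.cat([tokens, next_token], dim=1)    # append it and continue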

GPUs Versus ASIC TPUs

Nvidia has made its fortune by taking graphics processing units (GPUs) that were developed for the massively parallel calculations needed to drive video displays, and adapting them to more general problem solving that can make use of rapid matrix calculations. Nvidia chips and its CUDA platform have been employed for physical simulations such as seismology and molecular dynamics, and then for Bitcoin mining. When generative AI came along, Nvidia chips and programming tools were the obvious choice for LLM computing needs. The world’s lust for AI compute is so insatiable, and Nvidia has had such a stranglehold, that the company has been able to charge an eye-watering gross profit margin of around 75% on its chips.

AI users, of course, are trying desperately to get compute capability without having to pay such high prices to Nvidia. It has been hard to mount a serious competitive challenge, though. Nvidia has a commanding lead in hardware and supporting software, and (unlike the Intel of years gone by) it keeps forging ahead rather than resting on its laurels.

So far, no one seems to be able to compete strongly with Nvidia in GPUs. However, there is a different chip architecture, which by some measures can beat GPUs at their own game.

Nvidia GPUs are general-purpose parallel processors with high flexibility, capable of handling a wide range of tasks from gaming to AI training, supported by a mature software ecosystem built around CUDA. GPUs beat out the original computer central processing units (CPUs) for these tasks by sacrificing flexibility for the power to do parallel processing of many simple, repetitive operations. The newer “application-specific integrated circuits” (ASICs) take this specialization a step further. They can be custom hard-wired to do specific calculations, such as those required for Bitcoin mining and now for AI. By cutting out steps used by GPUs, especially fetching data in and out of memory, ASICs can do many AI computing tasks faster and cheaper than Nvidia GPUs, while using much less electric power. That is a big plus, since AI data centers are driving up electricity prices in many parts of the country. The particular type of ASIC used by Google for AI is called a Tensor Processing Unit (TPU).
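
A bit of toy counting helps show why cutting those memory round-trips matters. The sketch below tallies operand fetches for a matrix multiply under two extreme assumptions: a naive scheme that fetches both operands from memory for every multiply-add, versus a weights-resident scheme of the sort the TPU uses. The sizes are arbitrary, and real GPUs have caches that blunt the naive case; this is only meant to illustrate the ratio.

    # Back-of-the-envelope operand-fetch counts for C = A @ B,
    # with A of shape (M, K) and B (the "weights") of shape (K, N).
    M, K, N = 256, 256, 256                # toy sizes chosen only for illustration

    multiply_adds = M * N * K              # the useful math is the same either way

    # Naive scheme: every multiply-add fetches both of its operands from memory.
    naive_fetches = 2 * M * N * K

    # Weights-resident scheme: load B once, stream A through once, and pass
    # partial sums from unit to unit instead of bouncing them off memory.
    resident_fetches = K * N + M * K

    print(multiply_adds, naive_fetches, resident_fetches)
    # ~16.8M multiply-adds; ~33.6M fetches naively, but only ~131k fetches
    # when the weights stay put -- roughly a 256x reduction for these sizes.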

I found this explanation by UncoverAlpha to be enlightening:

A GPU is a “general-purpose” parallel processor, while a TPU is a “domain-specific” architecture.

The GPUs were designed for graphics. They excel at parallel processing (doing many things at once), which is great for AI. However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” They spend significant energy and chip area on complex tasks like caching, branch prediction, and managing independent threads.

A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping. Instead, it uses a unique architecture called a Systolic Array.

The “Systolic Array” is the key differentiator. In a standard CPU or GPU, the chip moves data back and forth between the memory and the computing units for every calculation. This constant shuffling creates a bottleneck (the Von Neumann bottleneck).

In a TPU’s systolic array, data flows through the chip like blood through a heart (hence “systolic”).

  1. It loads data (weights) once.
  2. It passes inputs through a massive grid of multipliers.
  3. The data is passed directly to the next unit in the array without writing back to memory.

What this means, in essence, is that a TPU, because of its systolic array, drastically reduces the number of memory reads and writes required from HBM [high-bandwidth memory]. As a result, the TPU can spend its cycles computing rather than waiting for data.
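
To make that data flow concrete, here is a toy software simulation of a weight-stationary multiply in plain Python/NumPy. It is not how a real TPU is programmed (that goes through compilers such as XLA); it just mimics the three steps above: load the weights once, stream the inputs through, and hand partial sums from cell to cell rather than writing them back to memory.

    # Toy simulation of the weight-stationary systolic idea (data flow only;
    # real hardware also pipelines the timing, which is ignored here).
    import numpy as np

    def systolic_matmul(A, B):
        """Compute A @ B by streaming rows of A through a grid holding B."""
        M, K = A.shape
        K2, N = B.shape
        assert K == K2
        weights = B.copy()                      # 1. load the weights into the array once
        C = np.zeros((M, N))
        for m in range(M):                      # 2. stream each input row through the grid
            partial = np.zeros(N)               # partial sums ride along inside the array,
            for k in range(K):                  #    handed from one row of cells to the next
                partial += A[m, k] * weights[k, :]
            C[m] = partial                      # 3. only the finished row is written out
        return C

    A = np.random.rand(4, 3)
    B = np.random.rand(3, 5)
    assert np.allclose(systolic_matmul(A, B), A @ B)   # matches an ordinary matmul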

Google has developed the most advanced ASICs for doing AI, which are now, on some levels, a competitive threat to Nvidia. Some implications of this will be explored in a post next week.

*Next-generation AI seeks to step beyond the LLM world of statistical word salads and to model cause and effect at the level of objects and agents in the real world; see Meta AI Chief Yann LeCun Notes Limits of Large Language Models and Path Towards Artificial General Intelligence.

Standard disclaimer: Nothing here should be considered advice to buy or sell any security.

How Repurposing Graphics Processing Chips Made Nvidia the Most Valuable Company on Earth

Folks who follow the stock market know that the average company in the S&P 500 has gone essentially nowhere in the last couple of years. What has pulled the averages higher and higher has been the outstanding performance of a handful of big tech stocks. Foremost among these is Nvidia. Its share price has tripled in the past year, after nearly tripling in the previous twelve months. Its market value climbed to $3.3 trillion last week, briefly surpassing tech behemoths Microsoft and Apple as the most valuable company in the world.

What just happened here?

It all began in 1993, when Taiwanese-American electrical engineer Jensen Huang and two other Silicon Valley techies met in a Denny’s in East San Jose and decided to start their own company. Their focus was making graphics acceleration boards for video games. Computing devices such as computers, game stations, and smartphones have at their core a central processing unit (CPU). A strength of CPUs is their versatility. They can do a lot of different tasks, but sequentially and thus at a limited speed. To oversimplify, a CPU fetches an instruction (command), loads maybe two chunks of data, performs the instructed calculation on those data, stores the result somewhere else, and then turns around and fetches the next instruction. With clever programming, some tasks can be broken up into multiple pieces that can be processed in parallel on several CPU cores at once, but that only goes so far.

Processing large amounts of graphics data, such as rendering a high-resolution active video game, requires an enormous amount of computing. However, these calculations are largely all of the same type, so a versatile processing chip like a CPU is not required. Graphics processing units (GPUs), originally termed graphics accelerators, are designed to do enormous numbers of these simple calculations simultaneously. To offload this burden from the CPU, computers and game stations have for decades included an auxiliary GPU (“graphics card”) alongside the CPU.
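
To illustrate the difference in style, the sketch below brightens a fake video frame two ways: a CPU-flavored loop that touches one pixel at a time, and a single array-wide operation of the kind GPUs are built to execute across millions of elements at once. NumPy running on a CPU is only a stand-in here; real GPU code would go through CUDA or a similar toolkit.

    # One simple operation (brighten a pixel), applied to ~2 million pixels.
    import numpy as np

    frame = np.random.rand(1080, 1920)          # stand-in for one video frame's brightness

    # "CPU style": fetch, compute, store, repeat -- one element per iteration.
    bright_loop = np.empty_like(frame)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            bright_loop[i, j] = min(frame[i, j] * 1.2, 1.0)

    # "GPU style": the identical operation expressed once for the whole array.
    bright_wide = np.minimum(frame * 1.2, 1.0)

    assert np.allclose(bright_loop, bright_wide)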

This was the original target for Nvidia. Video gaming was expanding rapidly, and they saw a niche for innovative graphics processors. Unfortunately, the processing architecture they chose to work on fell out of favor, and they skated right up to the edge of going bankrupt. At one point the young company was down to about 30 days before closing its doors, but at the last moment it got a $5 million loan to keep it afloat. Nvidia clawed its way back from the brink and managed to make and sell a series of popular graphics processors.

However, management had a vision that the massively parallel processing power of their chips could be applied to more exalted uses than rendering blood spatters in Call of Duty. The types of matrix calculations done in GPUs can be used in a wide variety of physical simulations such as seismology and molecular dynamics. In 2007, Nvidia released its CUDA platform for using GPUs for accelerated general-purpose processing. Since then, Nvidia has promoted the use of its GPUs as general hardware for scientific computing, in addition to the classic graphics applications.

This line of business exploded starting around 2019, with the bitcoin craze. Cryptocurrencies require enormous amounts of computing power, and these types of calculations are amenable to being performed on massively parallel GPUs. Serious bitcoin mining companies set up racks of processors built on Nvidia GPUs. GPUs did have serious competition from other types of processors for crypto mining applications, so they did not have the field to themselves. With people stuck at home in 2020-2021, demand for GPUs rose even further: more folks sitting on couches playing video games, and more cloud computing for remote work.

Nvidia Dominates AI Computing

Now the whole world cannot get enough of machine learning and generative AI. And Nvidia chips totally dominate that market. Nvidia supplies not only the hardware (chips) but also a software platform that lets programmers make use of the chips. With so many programmers and applications now standardized on the Nvidia platform, its dominance and profitability should persist for many years.

Nearly all of its chips are manufactured in Taiwan, which poses a geopolitical risk, not only for Nvidia but for all enterprises that depend on high-end AI processing.