Meta AI Chief Yann LeCun Notes Limits of Large Language Models and Path Towards Artificial General Intelligence

We noted last week Meta’s successful efforts to hire away the best of the best AI scientists from other companies by offering them insane (like $300 million) pay packages. Here we summarize and excerpt an excellent article in Newsweek by Gabriel Snyder, who interviewed Meta’s chief AI scientist, Yann LeCun. LeCun discusses some inherent limitations of today’s Large Language Models (LLMs) like ChatGPT. These limitations stem from the fact that they are based mainly on language; it turns out that human language itself is a very constrained dataset. Language is readily manipulated by LLMs, but language alone captures only a small subset of important human thinking:

Returning to the topic of the limitations of LLMs, LeCun explains, “An LLM produces one token after another. It goes through a fixed amount of computation to produce a token, and that’s clearly System 1—it’s reactive, right? There’s no reasoning,” a reference to Daniel Kahneman’s influential framework that distinguishes between the human brain’s fast, intuitive method of thinking (System 1) and the method of slower, more deliberative reasoning (System 2).

The limitations of this approach become clear when you consider what is known as Moravec’s paradox—the observation by computer scientist and roboticist Hans Moravec in the late 1980s that it is comparatively easier to teach AI systems higher-order skills like playing chess or passing standardized tests than seemingly basic human capabilities like perception and movement. The reason, Moravec proposed, is that the skills derived from how a human body navigates the world are the product of billions of years of evolution and are so highly developed that they can be automated by humans, while neocortical-based reasoning skills came much later and require much more conscious cognitive effort to master. However, the reverse is true of machines. Simply put, we design machines to assist us in areas where we lack ability, such as physical strength or calculation.

The strange paradox of LLMs is that they have mastered the higher-order skills of language without learning any of the foundational human abilities. “We have these language systems that can pass the bar exam, can solve equations, compute integrals, but where is our domestic robot?” LeCun asks. “Where is a robot that’s as good as a cat in the physical world? We don’t think the tasks that a cat can accomplish are smart, but in fact, they are.”

This gap exists because language, for all its complexity, operates in a relatively constrained domain compared to the messy, continuous real world. “Language, it turns out, is relatively simple because it has strong statistical properties,” LeCun says. It is a low-dimensionality, discrete space that is “basically a serialized version of our thoughts.”  

[Bolded emphases added]
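A quick aside to make LeCun’s “fixed amount of computation per token” point concrete: the sketch below shows greedy autoregressive decoding in miniature. It is purely illustrative; the toy scoring function stands in for a real model’s forward pass, and none of the names come from any actual LLM library. Each new token costs one fixed-size pass, and there is no separate deliberation step.

```python
# Toy illustration of greedy autoregressive decoding: one fixed-size
# "forward pass" per emitted token, with no separate deliberation step.
# `next_token_scores` is a made-up stand-in for a real model.

def next_token_scores(context: list[str]) -> dict[str, float]:
    # Hypothetical scoring function; a real LLM would run a transformer here.
    vocab = ["the", "cat", "sat", "down", "."]
    return {tok: len(tok) - 0.5 * context.count(tok) for tok in vocab}

def generate(prompt: list[str], max_new_tokens: int = 6) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):           # one pass per token, always
        scores = next_token_scores(tokens)    # "System 1": react, don't plan
        tokens.append(max(scores, key=scores.get))  # greedy pick, move on
    return tokens

print(generate(["the"]))
```

More elaborate decoding schemes simply emit more tokens; each token still gets the same fixed slice of computation.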

Broad human thinking involves hierarchical models of reality, which get constantly refined by experience:

And, most strikingly, LeCun points out that humans are capable of processing vastly more data than even our most data-hungry advanced AI systems. “A big LLM of today is trained on roughly 10 to the 14th power bytes of training data. It would take any of us 400,000 years to read our way through it.” That sounds like a lot, but then he points out that humans are able to take in vastly larger amounts of visual data.

Consider a 4-year-old who has been awake for 16,000 hours, LeCun suggests. “The bandwidth of the optic nerve is about one megabyte per second, give or take. Multiply that by 16,000 hours, and that’s about 10 to the 14th power in four years instead of 400,000.” This gives rise to a critical inference: “That clearly tells you we’re never going to get to human-level intelligence by just training on text. It’s never going to happen,” LeCun concludes…

This ability to apply existing knowledge to novel situations represents a profound gap between today’s AI systems and human cognition. “A 17-year-old can learn to drive a car in about 20 hours of practice, even less, largely without causing any accidents,” LeCun muses. “And we have millions of hours of training data of people driving cars, but we still don’t have self-driving cars. So that means we’re missing something really, really big.”

Like roboticist Rodney Brooks, who emphasizes the importance of embodiment and interaction with the physical world, LeCun sees intelligence as deeply connected to our ability to model and predict physical reality—something current language models simply cannot do. This perspective resonates with David Eagleman’s description of how the brain constantly runs simulations based on its “world model,” comparing predictions against sensory input.

For LeCun, the difference lies in our mental models—internal representations of how the world works that allow us to predict consequences and plan actions accordingly. Humans develop these models through observation and interaction with the physical world from infancy. A baby learns that unsupported objects fall (gravity) after about nine months and gradually comes to understand that objects continue to exist even when out of sight (object permanence). He observes that these models are arranged hierarchically, ranging from very low-level predictions about immediate physical interactions to high-level conceptual understandings that enable long-term planning.

[Emphases added]
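LeCun’s numbers in the excerpt above are back-of-the-envelope, but the arithmetic holds up. Here is a quick check; the bytes-per-word, reading-speed, and hours-per-day figures are my own assumptions, not from the article:

```python
# Back-of-the-envelope check of the excerpt's numbers. Assumed figures
# (mine, not the article's): ~5 bytes per word, ~250 words per minute,
# ~8 hours of reading per day, ~1 MB/s of optic-nerve bandwidth.

TEXT_BYTES = 1e14                          # LLM training text, per LeCun

# Years needed for a human to read 10^14 bytes of text.
words = TEXT_BYTES / 5
reading_years = words / 250 / 60 / 8 / 365
print(f"Reading time: ~{reading_years:,.0f} years")         # roughly 450,000 years

# Bytes of visual input reaching a 4-year-old's brain (16,000 waking hours).
visual_bytes = 1e6 * 16_000 * 3600         # 1 MB/s * seconds awake
print(f"Visual input by age 4: ~{visual_bytes:.1e} bytes")  # ~5.8e13, i.e. ~10^14
```

With those assumptions, the reading estimate lands in the neighborhood of LeCun’s 400,000 years, and a 4-year-old’s visual intake comes out around 6 × 10^13 bytes, the same order of magnitude as the entire text corpus.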

(Side comment: As an amateur reader of modern philosophy, I cannot help noting that these observations about the importance of recognizing there is a real external world and adjusting one’s models to match that reality call into question the epistemological claim that “we each create our own reality”.)

Given all this, the next generation of artificial intelligence must, like human intelligence, embed layers of working models of the world:

So, rather than continuing down the path of scaling up language models, LeCun is pioneering an alternative approach, the Joint Embedding Predictive Architecture (JEPA), which aims to create representations of the physical world based on visual input. “The idea that you can train a system to understand how the world works by training it to predict what’s going to happen in a video is a very old one,” LeCun notes. “I’ve been working on this in some form for at least 20 years.”

The fundamental insight behind JEPA is that prediction shouldn’t happen in the space of raw sensory inputs but rather in an abstract representational space. When humans predict what will happen next, we don’t mentally generate pixel-perfect images of the future—we think in terms of objects, their properties, and how they might interact.

This approach differs fundamentally from how language models operate. Instead of probabilistically predicting the next token in a sequence, these systems learn to represent the world at multiple levels of abstraction and to predict how their representations will evolve under different conditions.
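For readers who want to see the shape of the idea, here is a heavily simplified joint-embedding sketch in Python with PyTorch. To be clear, this is not Meta’s JEPA code; the module names, layer sizes, and the stop-gradient trick for the target are illustrative stand-ins. What it does show is the core move described above: the prediction error is computed between embeddings, not between raw pixels.

```python
# Heavily simplified joint-embedding predictive sketch (illustrative only,
# not Meta's JEPA code): the prediction error lives in embedding space,
# not in pixel space.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a raw observation (here a flattened frame) to an abstract embedding."""
    def __init__(self, obs_dim: int = 3072, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 512), nn.ReLU(),
                                 nn.Linear(512, embed_dim))

    def forward(self, x):
        return self.net(x)

class Predictor(nn.Module):
    """Predicts the next embedding from the current one (the 'world model' step)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))

    def forward(self, z):
        return self.net(z)

encoder, predictor = Encoder(), Predictor()
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

# Toy batch of (current frame, next frame) pairs -- random stand-ins for video.
frame_t, frame_t1 = torch.randn(32, 3072), torch.randn(32, 3072)

z_t = encoder(frame_t)                        # abstract state of the world now
with torch.no_grad():                         # stop-gradient target embedding
    z_target = encoder(frame_t1)              # (real systems use an EMA target encoder)
loss = ((predictor(z_t) - z_target) ** 2).mean()  # error in representation space
opt.zero_grad(); loss.backward(); opt.step()
print(f"prediction loss in representation space: {loss.item():.4f}")
```

A real system adds much more on top of this (masking strategies, an exponential-moving-average target encoder, multiple levels of abstraction), but the defining choice is where the loss lives: in representation space rather than in input space.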

And so, LeCun is strikingly pessimistic about the outlook for breakthroughs in current LLMs like ChatGPT. He believes LLMs will be largely obsolete within five years, except for narrower purposes, and so he tells upcoming AI scientists not to even bother with them:

His belief is so strong that, at a conference last year, he advised young developers, “Don’t work on LLMs. [These models are] in the hands of large companies, there’s nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs.”

This approach seems to be at variance with that of other firms, which continue to pour tens of billions of dollars into LLMs. Meta, by contrast, seems focused on next-generation AI, and CEO Mark Zuckerberg is putting his money where his mouth is.

Discuss AI Doom with Joy on May 5

If you like to read and discuss with smart people, you can make a free account in the Liberty Fund Portal. If you listen to this podcast over the weekend, Eliezer Yudkowsky on the Dangers of AI (2023), you will be up to speed for our asynchronous virtual debate room on Monday, May 5.

Russ Roberts sums up the doomer argument using the following metaphor:

The metaphor is primitive. Zinjanthropus man or some primitive form of pre-Homo sapiens sitting around a campfire and human being shows up and says, ‘Hey, I got a lot of stuff I can teach you.’ ‘Oh, yeah. Come on in,’ and pointing out that it’s probable that we are either destroyed directly by murder or maybe just by out-competing all the previous hominids that came before us, and that in general, you wouldn’t want to invite something smarter than you into the campfire.

What do you think of this metaphor? By incorporating AI agents into society, are we inviting a smarter being to our campfire? Is it likely to eventually kill us out of contempt or neglect? That will be what we are discussing over in the Portal this week.

Is your P(Doom) < 0.05? Great – that means you believe that the probability of AI turning us into paperclips is less than 5%. Come one, come all. You can argue against doomers during the May 5-9 week of Doom, and then you will love Week Two. On May 12-16, we will make the optimistic case for AI!

See more details on all readings and the final Zoom meeting in my previous post.

Zuckerberg wants to solve general intelligence

Why does Mark Zuckerberg want to solve general intelligence? Well, for one thing, if he doesn’t, one of his competitors will have a better chatbot. Zuckerberg wants to be the best (and good for him). At his core, he wants to build the best stuff (even the world’s best cattle on his ranch).

If AGI is possible, it will get built. I’m not the first person to point out that this is a new space race. If America takes a pause, then someone else will get there first. However, I thought the Zuck interview was an interesting microcosm of why AGI, if possible, will get built.

… We started FAIR about 10 years ago. The idea was that, along the way to general intelligence or whatever you wanna call it, there are going to be all these different innovations and that’s going to just improve everything that we do. So we didn’t conceive of it as a product. It was more of a research group. Over the last 10 years it has created a lot of different things that have improved all of our products. …
There’s obviously a big change in the last few years with ChatGPT and the diffusion models around image creation coming out. This is some pretty wild stuff that is pretty clearly going to affect how people interact with every app that’s out there. At that point we started a second group, the gen AI group, with the goal of bringing that stuff into our products and building leading foundation models that would power all these different products.
… There’s also basic assistant functionality, whether it’s for our apps or the smart glasses or VR. So it wasn’t completely clear at first that you were going to need full AGI to be able to support those use cases. But in all these subtle ways, through working on them, I think it’s actually become clear that you do. …
Reasoning is another example. Maybe you want to chat with a creator or you’re a business and you’re trying to interact with a customer. That interaction is not just like “okay, the person sends you a message and you just reply.” It’s a multi-step interaction where you’re trying to think through “how do I accomplish the person’s goals?” A lot of times when a customer comes, they don’t necessarily know exactly what they’re looking for or how to ask their questions. So it’s not really the job of the AI to just respond to the question.
You need to kind of think about it more holistically. It really becomes a reasoning problem. So if someone else solves reasoning, or makes good advances on reasoning, and we’re sitting here with a basic chat bot, then our product is lame compared to what other people are building. At the end of the day, we basically realized we’ve got to solve general intelligence… (emphasis mine)

Credit to Dwarkesh Patel for this excellent interview. Credit to M.Z. for sharing his thoughts on topics that affect the world.

“We’ve got to solve general intelligence.” If a competitor solves AGI first, then you are left behind. No one would turn down general intelligence on their team, on the assumption that it can be controlled.

I would like the AGI to do my chores for me, please. Unfortunately, it’s more likely to be able to write my blog posts first.

What the Superintelligence can do for us

These days, when I blog-rant about my everyday life, I increasingly end on the thought “AGI fixes this.”

Yesterday, I mused about whether AGI could be my personal chef: Where Can You Still Buy a Great Dinner in the US?

Would AGI help me match the clothes I no longer want with people who can use them, to cut down on pollution?: Joy’s Fashion Globalization Article with Cato

Would AGI make no mistakes about weather-related school closures?: Intelligence for School Closing

Can AGI book summer camp for me?

As a millennial woman working through my 30s, I increasingly see social media posts from my friends like this one:

One of the difficult things about infertility, for my friends going through it, is the uncertainty. Modern medicine seems legitimately short on information and predictive analytics for this issue. So… AGI to the rescue, someday?

All I’m writing about tonight is that, over roughly the past year, I have built up a growing to-do list for the AGI. Would something smart enough to do all of the above be dangerous? I wouldn’t rule it out. As pure speculation, it feels safer to have an AI that is devoted specifically to being a personal chef and strictly cannot do anything else besides manage food. An AI that could actually do all of those things… would be quite powerful.

Here’s me musing about the AGI rising up against us, written after watching the TV show Severance: Artificial Intelligence in the Basement of Lumon Industries