Meta AI Chief Yann LeCun Notes Limits of Large Language Models and Path Towards Artificial General Intelligence

We noted last week Meta’s successful efforts to hire away the best of the best AI scientists from other companies, by offering them insane (like $300 million) pay packages. Here we summarize and excerpt an excellent article in Newsweek by Gabriel Snyder who interviewed Meta’s chief AI scientist, Yann LeCun. LeCun discusses some inherent limitations of today’s Large Language Models (LLMs) like ChatGPT. Their limitations stem from the fact that they are based mainly on language; it turns out that human language itself is a very constrained dataset.  Language is readily manipulated by LLMs, but language alone captures only a small subset of important human thinking:

Returning to the topic of the limitations of LLMs, LeCun explains, “An LLM produces one token after another. It goes through a fixed amount of computation to produce a token, and that’s clearly System 1—it’s reactive, right? There’s no reasoning,” a reference to Daniel Kahneman’s influential framework that distinguishes between the human brain’s fast, intuitive method of thinking (System 1) and the method of slower, more deliberative reasoning (System 2).

The limitations of this approach become clear when you consider what is known as Moravec’s paradox—the observation by computer scientist and roboticist Hans Moravec in the late 1980s that it is comparatively easier to teach AI systems higher-order skills like playing chess or passing standardized tests than seemingly basic human capabilities like perception and movement. The reason, Moravec proposed, is that the skills derived from how a human body navigates the world are the product of billions of years of evolution and are so highly developed that they can be automated by humans, while neocortical-based reasoning skills came much later and require much more conscious cognitive effort to master. However, the reverse is true of machines. Simply put, we design machines to assist us in areas where we lack ability, such as physical strength or calculation.

The strange paradox of LLMs is that they have mastered the higher-order skills of language without learning any of the foundational human abilities. “We have these language systems that can pass the bar exam, can solve equations, compute integrals, but where is our domestic robot?” LeCun asks. “Where is a robot that’s as good as a cat in the physical world? We don’t think the tasks that a cat can accomplish are smart, but in fact, they are.”

This gap exists because language, for all its complexity, operates in a relatively constrained domain compared to the messy, continuous real world. “Language, it turns out, is relatively simple because it has strong statistical properties,” LeCun says. It is a low-dimensionality, discrete space that is “basically a serialized version of our thoughts.”  

[Bolded emphases added]

Broad human thinking involves hierarchical models of reality, which get constantly refined by experience:

And, most strikingly, LeCun points out that humans are capable of processing vastly more data than even our most data-hungry advanced AI systems. “A big LLM of today is trained on roughly 10 to the 14th power bytes of training data. It would take any of us 400,000 years to read our way through it.” That sounds like a lot, but then he points out that humans are able to take in vastly larger amounts of visual data.

Consider a 4-year-old who has been awake for 16,000 hours, LeCun suggests. “The bandwidth of the optic nerve is about one megabyte per second, give or take. Multiply that by 16,000 hours, and that’s about 10 to the 14th power in four years instead of 400,000.” This gives rise to a critical inference: “That clearly tells you we’re never going to get to human-level intelligence by just training on text. It’s never going to happen,” LeCun concludes…

This ability to apply existing knowledge to novel situations represents a profound gap between today’s AI systems and human cognition. “A 17-year-old can learn to drive a car in about 20 hours of practice, even less, largely without causing any accidents,” LeCun muses. “And we have millions of hours of training data of people driving cars, but we still don’t have self-driving cars. So that means we’re missing something really, really big.”

Like Brooks, who emphasizes the importance of embodiment and interaction with the physical world, LeCun sees intelligence as deeply connected to our ability to model and predict physical reality—something current language models simply cannot do. This perspective resonates with David Eagleman’s description of how the brain constantly runs simulations based on its “world model,” comparing predictions against sensory input. 

For LeCun, the difference lies in our mental models—internal representations of how the world works that allow us to predict consequences and plan actions accordingly. Humans develop these models through observation and interaction with the physical world from infancy. A baby learns that unsupported objects fall (gravity) after about nine months; they gradually come to understand that objects continue to exist even when out of sight (object permanence). He observes that these models are arranged hierarchically, ranging from very low-level predictions about immediate physical interactions to high-level conceptual understandings that enable long-term planning.

[Emphases added]

(Side comment: As an amateur reader of modern philosophy, I cannot help noting that these observations about the importance of recognizing there is a real external world and adjusting one’s models to match that reality call into question the epistemological claim that “we each create our own reality”.)

Given all this, developing the next generation of artificial intelligence must, like human intelligence, embed layers of working models of the world:

So, rather than continuing down the path of scaling up language models, LeCun is pioneering an alternative approach of Joint Embedding Predictive Architecture (JEPA) that aims to create representations of the physical world based on visual input. “The idea that you can train a system to understand how the world works by training it to predict what’s going to happen in a video is a very old one,” LeCun notes. “I’ve been working on this in some form for at least 20 years.”

The fundamental insight behind JEPA is that prediction shouldn’t happen in the space of raw sensory inputs but rather in an abstract representational space. When humans predict what will happen next, we don’t mentally generate pixel-perfect images of the future—we think in terms of objects, their properties and how they might interact

This approach differs fundamentally from how language models operate. Instead of probabilistically predicting the next token in a sequence, these systems learn to represent the world at multiple levels of abstraction and to predict how their representations will evolve under different conditions.

And so, LeCun is strikingly pessimistic on the outlook for breakthroughs in the current LLM’s like ChatGPT. He believes LLMs will be largely obsolete within five years, except for narrower purposes, and so he tells upcoming AI scientists to not even bother with them:

His belief is so strong that, at a conference last year, he advised young developers, “Don’t work on LLMs. [These models are] in the hands of large companies, there’s nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs.”

This approach seems to be at variance with other firms, who continue to pour tens of billions of dollars into LLMs. Meta, however, seems focused on next-generation AI, and CEO Mark Zuckerberg is putting his money where his mouth is.

Meta Is Poaching AI Talent With $100 Million Pay Packages; Will This Finally Create AGI?

This month I have run across articles noting that Meta’s Mark Zuckerberg has been making mind-boggling pay offers (like $100 million/year for 3-4 years) to top AI researchers at other companies, plus the promise of huge resources and even (gasp) personal access to Zuck, himself. Reports indicate that he is succeeding in hiring around 50 brains from OpenAI (home of ChatGPT), Anthropic, Google, and Apple. Maybe this concentration of human intelligence will result in the long-craved artificial general intelligence (AGI) being realized; there seems to be some recognition that the current Large Language Models will not get us there.

There are, of course, other interpretations being put on this maneuver. Some talking heads on a Bloomberg podcast speculated that Zuckerberg was using Meta’s mighty cash flow deliberately to starve competitors of top AI talent. They also speculated that (since there is a limit to how much money you can possibly, pleasurably spend) – – if you pay some guy $100 million in a year, a rational outcome would be he would quit and spend the rest of his life hanging out at the beach. (That, of course, is what Bloomberg finance types might think, who measure worth mainly in terms of money, not in the fun of doing cutting edge R&D).

I found a thread on reddit to be insightful and amusing, and so I post chunks of it below. Here is the earnest, optimist OP:

andsi2asi

Zuckerberg’s ‘Pay Them Nine-Figure Salaries’ Stroke of Genius for Building the Most Powerful AI in the World

Frustrated by Yann LeCun’s inability to advance Llama to where it is seriously competing with top AI models, Zuckerberg has decided to employ a strategy that makes consummate sense.

To appreciate the strategy in context, keep in mind that OpenAI expects to generate $10 billion in revenue this year, but will also spend about $28 billion, leaving it in the red by about $18 billion. My main point here is that we’re talking big numbers.

Zuckerberg has decided to bring together 50 ultra-top AI engineers by enticing them with nine-figure salaries. Whether they will be paid $100 million or $300 million per year has not been disclosed, but it seems like they will be making a lot more in salary than they did at their last gig with Google, OpenAI, Anthropic, etc.

If he pays each of them $100 million in salary, that will cost him $5 billion a year. Considering OpenAI’s expenses, suddenly that doesn’t sound so unreasonable.

I’m guessing he will succeed at bringing this AI dream team together. It’s not just the allure of $100 million salaries. It’s the opportunity to build the most powerful AI with the most brilliant minds in AI. Big win for AI. Big win for open source

And here are some wry responses:

kayakdawg

counterpoint 

a. $5B is just for those 50 researchers, loootttaaa other costs to consider

b. zuck has a history of burning big money on r&d with theoretical revenue that doesnt materialize

c. brooks law: creating agi isn’t an easily divisible job – in fact, it seems reasonable to assume that the more high-level experts enter the project the slower it’ll progress given the communication overhead

7FootElvis

Exactly. Also, money alone doesn’t make leadership effective. OpenAI has a relatively single focus. Meta is more diversified, which can lead to a lack of necessary vision in this one department. Passion, if present at the top, is also critical for bleeding edge advancement. Is Zuckerberg more passionate than Altman about AI? Which is more effective at infusing that passion throughout the organization?

….

dbenc

and not a single AI researcher is going to tell Zuck “well, no matter how much you pay us we won’t be able to make AGI”

meltbox

I will make the AI by one year from now if I am paid $100m

I just need total blackout so I can focus. Two years from now I will make it run on a 50w chip.

I promise

Did Apple’s Recent “Illusion of Thinking” Study Expose Fatal Shortcomings in Using LLM’s for Artificial General Intelligence?

Researchers at Apple last week published with the provocative title, “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity.”  This paper has generated uproar in the AI world. Having “The Illusion of Thinking” right there in the title is pretty in-your-face.

Traditional Large Language Model (LLM) artificial intelligence programs like ChatGPT train on massive amounts of human-generated text to be able to mimic human outputs when given prompts. A recent trend (mainly starting in 2024) has been the incorporation of more formal reasoning capabilities into these models. The enhanced models are termed Large Reasoning Models (LRMs). Now some leading LLMs like Open AI’s GPT, Claude, and the Chinese DeepSeek exist both in regular LLM form and also as LRM versions.

The authors applied both the regular (LLM) and “thinking” LRM versions of Claude 3.7 Sonnet and DeepSeek to a number of mathematical type puzzles. Open AI’s o-series were used to a lesser extent. An advantage of these puzzles is that researchers can, while keeping the basic form of the puzzle, dial in more or less complexity.

They found, among other things, that the LRMs did well up to a certain point, then suffered “complete collapse” as complexity was increased. Also, at low complexities, LLMs actually outperform LRMs. And (perhaps the most vivid evidence of lack of actual understanding on the part of these programs), when they were explicitly offered an efficient direct solution algorithm in the prompt, the programs did not take advantage of it, but instead just kept grinding away in their usual fashion.

As might be expected, AI skeptics were all over the blogosphere, saying, I told you so, LLMs are just massive exercises in pattern matching, and cannot extrapolate outside of their training set. This has massive implications for what we can expect in the near or intermediate future. Among other things, the optimism about AI progress is largely what is fueling the stock market, and also capital investment in this area: Companies like Meta and Google are spending ginormous sums trying to develop artificial “general” intelligence, paying for ginormous amounts of compute power, with those dollars flowing to firms like Microsoft and Amazon building out data centers and buying chips from Nvidia. If the AGI emperor has no clothes, all this spending might come to a screeching crashing halt.

Ars Technica published a fairly balanced account of the controversy, concluding that, “Even elaborate pattern-matching machines can be useful in performing labor-saving tasks for the people that use them… especially for coding and brainstorming and writing.”

Comments on this article included one like:

LLMs do not even know what the task is, all it knows is statistical relationships between words.   I feel like I am going insane. An entire industry’s worth of engineers and scientists are desperate to convince themselves a fancy Markov chain trained on all known human texts is actually thinking through problems and not just rolling the dice on what words it can link together.

And

if we equate combinatorial play and pattern matching with genuinely “generative/general” intelligence, then we’re missing a key fact here. What’s missing from all the LLM hubris and enthusiasm is a reflexive consciousness of the limits of language, of the aspects of experience that exceed its reach and are also, paradoxically, the source of its actual innovations. [This is profound, he means that mere words, even billions of them, cannot capture some key aspects of human experience]

However, the AI bulls have mounted various come-backs to the Apple paper. The most effective I know of so far was published by Alex Lawsen, a researcher at LLM firm Open Philanthropy. Lawsen’s rebuttal, titled “The Illusion of the Illusion of Thinking,  was summarized by Marcus Mendes. To summarize the summary, Lawsen claimed that the models did not in general “collapse” in some crazy way. Rather, the models in many cases recognized that they would not be able to solve the puzzles given the constraints input by the Apple researchers. Therefore, they (rather intelligently) did not try to waste compute power by grinding away to a necessarily incomplete solution, but just stopped. Lawsen further showed that the ways Apple ran the LRM models did not allow them to perform as well as they could. When he made a modest, reasonable change in the operation of the LRMs,

Models like Claude, Gemini, and OpenAI’s o3 had no trouble producing algorithmically correct solutions for 15-disk Hanoi problems, far beyond the complexity where Apple reported zero success.

Lawsen’s conclusion: When you remove artificial output constraints, LRMs seem perfectly capable of reasoning about high-complexity tasks. At least in terms of algorithm generation.

And so, the great debate over the prospects of artificial general intelligence will continue.

After the Fall: What Next for Nvidia and AI, In the Light of DeepSeek

Anyone not living under a rock the last two weeks has heard of DeepSeek, the cheap Chinese knock-off of ChatGPT that was supposedly trained using much lower resources that most American Artificial Intelligence efforts have been using. The bearish narrative flowing from this is that AI users will be able to get along with far fewer of Nvidia’s expensive, powerful chips, and so Nvidia sales and profit margins will sag.

The stock market seems to be agreeing with this story. The Nvidia share price crashed with a mighty crash last Monday, and it has continued to trend downward since then, with plenty of zig-zags.

I am not an expert in this area, but have done a bit of reading. There seems to be an emerging consensus that DeepSeek got to where it got to largely by using what was already developed by ChatGPT and similar prior models. For this and other reasons, the claim for fantastic savings in model training has been largely discounted. DeepSeek did do a nice job making use of limited chip resources, but those advances will be incorporated into everyone else’s models now.

Concerns remain regarding built-in bias and censorship to support the Chinese communist government’s point of view, and regarding the safety of user data kept on servers in China. Even apart from nefarious purposes for collecting user data, ChatGPT has apparently been very sloppy in protecting user information:

Wiz Research has identified a publicly accessible ClickHouse database belonging to DeepSeek, which allows full control over database operations, including the ability to access internal data. The exposure includes over a million lines of log streams containing chat history, secret keys, backend details, and other highly sensitive information.

Shifting focus to Nvidia – – my take is that DeepSeek will have little impact on its sales. The bullish narrative is that the more efficient algos developed by DeepSeek will enable more players to enter the AI arena.

The big power users like Meta and Amazon and Google have moved beyond limited chatbots like ChatGPT or DeepSeek. They are aiming beyond “AI” to “AGI” (Artificial General Intelligence), that matches or surpasses human cognitive capabilities across a wide range of cognitive tasks. Zuck plans to replace mid-level software engineers at Meta with code-bots before the year is out.

For AGI they will still need gobs of high-end chips, and these companies show no signs of throttling back their efforts. Nvidia remains sold out through the end of 2025. I suspect that when the company reports earnings on Feb 26, it will continue to demonstrate high profits and project high earnings growth.

Its price to earnings is higher than its peers, but that appears to be justified by its earnings growth. For a growth stock, a key metric is price/earnings-growth (PEG), and by that standard, Nvidia looks downright cheap:

Source: Marc Gerstein on Seeking Alpha

How the fickle market will react to these realities, I have no idea.

The high volatility in the stock makes for high options premiums. I have been selling puts and covered calls to capture roughly 20% yields, at the expense of missing out on any rise in share price from here.

Disclaimer: Nothing here should be considered as advice to buy or sell any security.

Latest from Leopold on AGI

When I give talks about AI, I often present my own research on ChatGPT muffing academic references. By the end I make sure that I present some evidence of how good ChatGPT can be, to make sure the audience walks away with the correct overall impression of where technology is heading. On the topic of rapid advances in LLMs, interesting new claims from a person on the inside can by found from Leopold Aschenbrenner in his new article (book?) called “Situational Awareness.”
https://situational-awareness.ai/
PDF: https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf

He argues that AGI is near and LLMs will surpass the smartest humans soon.

AI progress won’t stop at human-level. Hundreds of millions of AGIs could automate AI research, compressing a decade of algorithmic progress (5+ OOMs) into ≤1 year. We would rapidly go from human-level to vastly superhuman AI systems. The power—and the peril—of superintelligence would be dramatic.

Based on this assumption that AIs will surpass humans soon, he draws conclusions for national security and how we should conduct AI research. (No, I have not read all if it.)

I dropped in that question and I’m not sure if anyone has, per se, an answer.

You can also get the talking version of Leopold’s paper in his podcast with Dwarkesh.

I’m also not sure if anyone is going to answer this one:

I might offer to contract out my services in the future based on my human instincts shaped by growing up on internet culture (i.e. I know when they are joking) and having an acute sense of irony. How is Artificial General Irony coming along?