GLIF Social Media Memes

Wojak Meme Generator from Glif will build you a funny meme from a short phrase or single-word prompt. Note that it is built to be derogatory and cruel for sport, and it may hallucinate falsehoods (see the tweet announcement).

I am fascinated by this from the angle of modern anthropology. The AI has learned all of this by studying what we write online. Someone can build an AI to make jokes and call out hypocrisy.

Here are GLIFs of the different social media user stereotypes as of 2024. Most of our current readers probably don’t need any captions to these memes, but I’ll provide a bit of sincere explanation to help everyone understand the jokes.

Twitter user: Person who posts short messages and follows others on the microblogging platform.

Facebook user: Individual with a profile on the social network for connecting with friends and sharing content.

Bluesky user: Early adopter of a decentralized social media platform focused on user control.


Is the Universe Legible to Intelligence?

I borrowed the following from the posted transcript. Bold emphasis added by me. This starts at about minute 36 of the podcast “Tyler Cowen – Hayek, Keynes, & Smith on AI, Animal Spirits, Anarchy, & Growth” with Dwarkesh Patel from January 2024.

Patel: We are talking about GPT-5 level models. What do you think will happen with GPT-6, GPT-7? Do you still think of it like having a bunch of RAs (research assistants) or does it seem like a different thing at some point?

Cowen: I’m not sure what those numbers going up mean or what a GPT-7 would look like or how much smarter it could get. I think people make too many assumptions there. It could be the real advantages are integrating it into workflows by things that are not better GPTs at all. And once you get to GPT, say 5.5, I’m not sure you can just turn up the dial on smarts and have it, for example, integrate general relativity and quantum mechanics.

Patel: Why not?

Cowen: I don’t think that’s how intelligence works. And this is a Hayekian point. And some of these problems, there just may be no answer. Like maybe the universe isn’t that legible. And if it’s not that legible, the GPT-11 doesn’t really make sense as a creature or whatever.

Patel (37:43) : Isn’t there a Hayekian argument to be made that, listen, you can have billions of copies of these things. Imagine the sort of decentralized order that could result, the amount of decentralized tacit knowledge that billions of copies talking to each other could have. That in and of itself is an argument to be made about the whole thing as an emergent order will be much more powerful than we’re anticipating.

Cowen: Well, I think it will be highly productive. What tacit knowledge means with AIs, I don’t think we understand yet. Is it by definition all non-tacit or does the fact that how GPT-4 works is not legible to us or even its creators so much? Does that mean it’s possessing of tacit knowledge or is it not knowledge? None of those categories are well thought out …

It might be significant that LLMs are no longer legible to their human creators. More significantly, the universe might not be legible to intelligence, at least of the kind that is trained on human writing. I (Joy) gathered a few more notes for myself.

A co-EV-winner has commented on this at Don’t Worry About the Vase:

(37:00) Tyler expresses skepticism that GPT-N can scale up its intelligence that far, that beyond 5.5 maybe integration with other systems matters more, and says ‘maybe the universe is not that legible.’ I essentially read this as Tyler engaging in superintelligence denialism, consistent with his idea that humans with very high intelligence are themselves overrated, and saying that there is no meaningful sense in which intelligence can much exceed generally smart human level other than perhaps literal clock speed.

I (Joy) took it more literally. I don’t see “superintelligence denialism.” I took it to mean that the universe is not legible to our brand of intelligence.

There is one other comment I found, in response to a short clip posted by @DwarkeshPatel, from YouTuber @trucid2:

Intelligence isn’t sufficient to solve this problem, but isn’t for the reason he stated. We know that GR and QM are inconsistent–it’s in the math. But the universe has no trouble deciding how to behave. It is consistent. That means a consistent theory that combines both is possible. The reason intelligence alone isn’t enough is that we’re missing data. There may be an infinite number of ways to combine QM and GR. Which is the correct one? You need data for that.

I saved myself a little time by writing the following with ChatGPT. If the GPT got something wrong in here, I’m not qualified to notice:

Newtonian physics gave an impression of a predictable, clockwork universe, leading many to believe that deeper exploration with more powerful microscopes would reveal even greater predictability. Contrary to this expectation, the advent of quantum mechanics revealed a bizarre, unpredictable micro-world. The more we learned, the stranger and less intuitive the universe became. This shift highlighted the limits of classical physics and the necessity of new theories to explain the fundamental nature of reality.
General Relativity (GR) and Quantum Mechanics (QM) are inconsistent because they describe the universe in fundamentally different ways and are based on different underlying principles. GR, formulated by Einstein, describes gravity as the curvature of spacetime caused by mass and energy, providing a deterministic framework for understanding large-scale phenomena like the motion of planets and the structure of galaxies. In contrast, QM governs the behavior of particles at the smallest scales, where probabilities and wave-particle duality dominate, and uncertainty is intrinsic.

The inconsistencies arise because:

  1. Mathematical Frameworks: GR is a classical field theory expressed through smooth, continuous spacetime, while QM relies on discrete probabilities and quantized fields. Integrating the continuous nature of GR with the discrete, probabilistic framework of QM has proven mathematically challenging.
  2. Singularities and Infinities: When applied to extreme conditions like black holes or the Big Bang, GR predicts singularities where physical quantities become infinite, which QM cannot handle. Conversely, when trying to apply quantum principles to gravity, the calculations often lead to non-renormalizable infinities, meaning they cannot be easily tamed or made sense of.
  3. Scales and Forces: GR works exceptionally well on macroscopic scales and with strong gravitational fields, while QM accurately describes subatomic scales and the other three fundamental forces (electromagnetic, weak nuclear, and strong nuclear). Merging these scales and forces into a coherent theory that works universally remains an unresolved problem.

Ultimately, the inconsistency suggests that a more fundamental theory, potentially a theory of quantum gravity like string theory or loop quantum gravity, is needed to reconcile the two frameworks.
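
To put one concrete equation behind point 1 above: below are the standard textbook forms of GR’s field equations (a classical equation for smooth spacetime geometry) and QM’s Schrödinger equation (an equation for probability amplitudes). Even side by side, you can see they live in different mathematical worlds.

```latex
% General Relativity: Einstein field equations (classical, deterministic geometry)
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}

% Quantum Mechanics: Schrodinger equation (probabilistic state evolution)
i\hbar \frac{\partial}{\partial t} \lvert \psi(t) \rangle = \hat{H} \lvert \psi(t) \rangle
```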

P.S. I published “AI Doesn’t Mimic God’s Intelligence” at The Gospel Coalition. For now, at least, there is some higher plane of knowledge that we humans are not on. Will AI get there? Take us there? We don’t know.

Do I Trust Claude 3.5 Sonnet?

For the first time this week, I paid for a subscription to an LLM. I know economists who have been on the paid tier of OpenAI’s ChatGPT since 2023, using it for both research and teaching tasks.

I did publish a paper on the mistakes it makes: ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics. In a behavioral paper, I used it as a stand-in for AI: Do People Trust Humans More Than ChatGPT?

I have nothing against ChatGPT. For various reasons, I never paid for it, even though I used it occasionally for routine work or for writing drafts. Perhaps if I were on the paid tier of something else already, I would have resisted paying for Claude.  

Yesterday, I made an account with Claude to try it out for free. Claude and I started working together on a paper I’m revising. Claude was doing excellent work and then I ran out of free credits. I want to finish the revision this week, so I decided to start paying $20/month.

Here’s a little snapshot of our conversation. Claude is writing R code which I run in RStudio to update graphs in my paper.
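
To give a flavor of that work, here is a minimal sketch of the kind of ggplot2 code involved; the file and variable names are hypothetical stand-ins, not Claude’s actual output:

```r
# A sketch of the kind of R code Claude wrote for me.
# "experiment_results.csv" and its columns are hypothetical stand-ins.
library(ggplot2)

results <- read.csv("experiment_results.csv")

ggplot(results, aes(x = treatment, y = contribution, fill = treatment)) +
  geom_boxplot() +
  labs(
    title = "Contributions by Treatment",
    x = "Treatment group",
    y = "Average contribution"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

# Save the updated figure for the paper
ggsave("figure1_contributions.png", width = 6, height = 4)
```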

This coding work is something I used to do myself (with internet searches for help). Have I been 10x-ed? Maybe I’ve been 2x-ed.

I’ll refer to Zuckerberg via Dwarkesh (which I’ve blogged about before).


Latest from Leopold on AGI

When I give talks about AI, I often present my own research on ChatGPT muffing academic references. By the end, I make sure to present some evidence of how good ChatGPT can be, so that the audience walks away with the correct overall impression of where the technology is heading. On the topic of rapid advances in LLMs, interesting new claims from a person on the inside can be found in Leopold Aschenbrenner’s new article (book?) called “Situational Awareness.”
https://situational-awareness.ai/
PDF: https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf

He argues that AGI is near and LLMs will surpass the smartest humans soon.

AI progress won’t stop at human-level. Hundreds of millions of AGIs could automate AI research, compressing a decade of algorithmic progress (5+ OOMs) into ≤1 year. We would rapidly go from human-level to vastly superhuman AI systems. The power—and the peril—of superintelligence would be dramatic.

An “OOM” is an order of magnitude, a factor of 10, so 5 OOMs means a factor of 100,000. Based on this assumption that AIs will surpass humans soon, he draws conclusions for national security and for how we should conduct AI research. (No, I have not read all of it.)

I dropped in that question and I’m not sure if anyone has, per se, an answer.

You can also get the talking version of Leopold’s paper in his podcast with Dwarkesh.

I’m also not sure if anyone is going to answer this one:

I might offer to contract out my services in the future based on my human instincts shaped by growing up on internet culture (i.e. I know when they are joking) and having an acute sense of irony. How is Artificial General Irony coming along?

Zuckerberg wants to solve general intelligence

Why does Mark Zuckerberg want to solve general intelligence? Well, for one thing, if he doesn’t, one of his competitors will have a better chatbot. Zuckerberg wants to be the best (and good for him). At his core, he wants to build the best stuff (even the world’s best cattle on his ranch).

If AGI is possible, it will get built. I’m not the first person to point out that this is a new space race. If America takes a pause, then someone else will get there first. However, I thought the Zuck interview was an interesting microcosm for why AGI, if possible, will get made.

… We started FAIR about 10 years ago. The idea was that, along the way to general intelligence or whatever you wanna call it, there are going to be all these different innovations and that’s going to just improve everything that we do. So we didn’t conceive of it as a product. It was more of a research group. Over the last 10 years it has created a lot of different things that have improved all of our products. …
There’s obviously a big change in the last few years with ChatGPT and the diffusion models around image creation coming out. This is some pretty wild stuff that is pretty clearly going to affect how people interact with every app that’s out there. At that point we started a second group, the gen AI group, with the goal of bringing that stuff into our products and building leading foundation models that would power all these different products.
… There’s also basic assistant functionality, whether it’s for our apps or the smart glasses or VR. So it wasn’t completely clear at first that you were going to need full AGI to be able to support those use cases. But in all these subtle ways, through working on them, I think it’s actually become clear that you do. …
Reasoning is another example. Maybe you want to chat with a creator or you’re a business and you’re trying to interact with a customer. That interaction is not just like “okay, the person sends you a message and you just reply.” It’s a multi-step interaction where you’re trying to think through “how do I accomplish the person’s goals?” A lot of times when a customer comes, they don’t necessarily know exactly what they’re looking for or how to ask their questions. So it’s not really the job of the AI to just respond to the question.
You need to kind of think about it more holistically. It really becomes a reasoning problem. So if someone else solves reasoning, or makes good advances on reasoning, and we’re sitting here with a basic chat bot, then our product is lame compared to what other people are building. At the end of the day, we basically realized we’ve got to solve general intelligence… (emphasis mine)

Credit to Dwarkesh Patel for this excellent interview. Credit to M.Z. for sharing his thoughts on topics that affect the world.

“We’ve got to solve general intelligence.” If a competitor solves AGI first, then you are left behind. Everyone would want general intelligence on their team, on the assumption that it can be controlled.

I would like the AGI to do my chores for me, please. Unfortunately, it’s more likely to be able to write my blog posts first.

Notes on ChatGPT from Sama with Lex

This is from the transcript of Lex Fridman Podcast #419, Sam Altman’s second appearance on the show. Sam Altman is (once again) the CEO of OpenAI and a leading figure in artificial intelligence. Two parts of the conversation stood out to me, and I don’t mean the gossip or the AGI predictions. The links in the transcript will take you to a YouTube video of the interview.

(00:53:22) You mentioned this collaboration. I’m not sure where the magic is, if it’s in here or if it’s in there or if it’s somewhere in between. I’m not sure. But one of the things that concerns me for knowledge task when I start with GPT is I’ll usually have to do fact checking after, like check that it didn’t come up with fake stuff. How do you figure that out that GPT can come up with fake stuff that sounds really convincing? So how do you ground it in truth?

Sam Altman(00:53:55) That’s obviously an area of intense interest for us. I think it’s going to get a lot better with upcoming versions, but we’ll have to continue to work on it and we’re not going to have it all solved this year.

Lex Fridman(00:54:07) Well the scary thing is, as it gets better, you’ll start not doing the fact checking more and more, right?

Sam Altman(00:54:15) I’m of two minds about that. I think people are much more sophisticated users of technology than we often give them credit for.

Lex Fridman(00:54:15) Sure.

Sam Altman(00:54:21) And people seem to really understand that GPT, any of these models hallucinate some of the time. And if it’s mission-critical, you got to check it.

Lex Fridman(00:54:27) Except journalists don’t seem to understand that. I’ve seen journalists half-assedly just using GPT-4. It’s-

Sam Altman(00:54:34) Of the long list of things I’d like to dunk on journalists for, this is not my top criticism of them.

As EWED readers know, I have a paper about ChatGPT hallucinations and a paper about ChatGPT fact-checking. Lex is concerned that fact-checking will stop if the quality of ChatGPT goes up, even though no one really expects the hallucination rate to go to zero. Sam takes the optimistic view that humans will use the tool well. I suppose that Altman generally holds the view that his creation is going to be used for good, on net. Or maybe he is just being a salesman who does not want to publicly dwell on the negative aspects of ChatGPT.

I also have written about the tech pipeline and what makes people shy away from computer programming.

Lex Fridman(01:29:53) That’s a weird feeling. Even with a programming, when you’re programming and you say something, or just the completion that GPT might do, it’s just such a good feeling when it got you, what you’re thinking about. And I look forward to getting you even better. On the programming front, looking out into the future, how much programming do you think humans will be doing 5, 10 years from now?

Sam Altman(01:30:19) I mean, a lot, but I think it’ll be in a very different shape. Maybe some people will program entirely in natural language.

Someday, the skills of a computer programmer might morph to be closer to the skills of a manager of humans, since LLMs were trained on human writing.

In my 2023 talk, I suggested that programming will get more fun because LLMs will do the tedious parts. I also suggested that parents should teach their kids to read instead of “code.”

The tedious coding tasks previously done by humans did “create jobs.” I am not worried about mass unemployment yet. We have so many problems to solve (see my growing to-do list for intelligence). There are big transitions coming up. Sama says GPT-5 will be a major step up. He claimed that one reason OpenAI keeps releasing intermediate models is to give humanity a heads up on what is coming down the line.

What the Superintelligence can do for us

These days, when I blog-rant about my everyday life, I have increasingly ended on the thought “AGI fixes this.”

Yesterday, I mused about whether AGI would be my personal chef: Where Can You Still Buy a Great Dinner in the US?

Would AGI help me match the clothes I no longer want with humans who can use them, to cut down on pollution?: Joy’s Fashion Globalization Article with Cato

Would AGI make no mistakes about weather-related school closures?: Intelligence for School Closing

Can AGI book summer camp for me?

As a millennial woman working through my 30s, I increasingly see social media posts from my friends like this one:

One of the difficult things about infertility, for my friends going through it, is the uncertainty. Modern medicine seems legitimately short on information and predictive analytics for this issue. So… AGI to the rescue, someday?

All I’m writing about tonight is that I have created a growing to-do list, over roughly the past year, for the AGI. Would something smart enough to do all of the above be dangerous? I wouldn’t rule it out. As pure speculation, it feels safer to have an AI that is specifically devoted to being a personal chef but which strictly cannot do anything else besides manage food. An AI that could actually do all of those things… would be quite powerful.

Here’s me musing about the AGI rising up against us, written after watching the TV show Severance: Artificial Intelligence in the Basement of Lumon Industries

Does GPT-4 Know How High the Alps Are?

I’m getting ready to give some public local talks about AI. Last week I shared some pictures that I think might help people understand ChatGPT, specifically the mountain elevation estimates discussed further below.

My first thought was that GPT-4 was giving incorrect estimates of the heights of these mountains because it does not actually “know” the correct elevations. But then a nagging question came to mind.

GPT has a “creativity parameter.” Sometimes it intentionally does not select the top-rated next word in a sentence, for example, in order to avoid being stiff and boring. Could GPT-4 know the exact elevations of these mountains and just be intentionally “creative” in this case?
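
To make the “creativity parameter” concrete, here is a toy sketch in R of temperature sampling. The candidate words and scores are invented; real models choose among tens of thousands of tokens:

```r
# Toy temperature sampling with invented candidate words and scores.
# Low temperature almost always picks the top-rated word;
# high temperature lets lower-rated words through more often.
set.seed(1)

words  <- c("4,808", "4,810", "4,805", "4,795", "roughly")
logits <- c(3.0, 2.5, 2.2, 1.0, 0.5)  # made-up raw model scores

sample_word <- function(temperature) {
  probs <- exp(logits / temperature)
  probs <- probs / sum(probs)
  sample(words, size = 1, prob = probs)
}

table(replicate(1000, sample_word(temperature = 0.1)))  # nearly always "4,808"
table(replicate(1000, sample_word(temperature = 1.5)))  # much more varied
```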

I do not want to stand up in front of the local Rotary Club and say something wrong. So, I went to a true expert, Lenny Bogdonoff, to ask for help. Here is his reply:

Not quite. It’s not that it knows or doesn’t know, but based on the prompt, it’s likely unable to parse the specific details and is outputting results respectively. There is a component of stochastic behavior based on what part of the model weights are activated.

One common practice to help avoid this and see what the model does grasp, is to ask it to think step by step, and explain its reasoning. When doing this, you can see the fault in logic.

All that being said, the vision model is actually faulty in being able to grasp the relative position of information, so this kind of task will be more likely to hallucinate.

There are better vision models, that aren’t OpenAI based. For example Qwen-VL-Max is very good, from the Chinese company Alibaba. Another is LLaVA which uses different baselines of open source language models to add vision capabilities

Depending on what you are needing vision for, models can be spiky in capability. Good at OCR but bad at relative positioning. Good at classifying a specific UI element, but bad at detecting plants, etc etc. 

Joy: So, I think I can tell the Rotary Club that GPT was “wrong” as opposed to “intentionally creative.” I think, as I originally concluded, you should not make ChatGPT the pilot of your airplane and go to sleep when approaching the Alps. ChatGPT should be used for what it is good at, such as writing the rough draft of a cover letter. (We have great “autopilot” software for flying planes, already, without involving large language models.)

Another expert, Gavin Leech, also weighed in with some helpful background information:

  • the creativity parameter is known as temperature. But you can actually radically change the output (intelligence, style, creativity) by using more complicated sampling schemes. The best analogy for changing the sampling scheme is that you’re giving it a psychiatric drug. Changing the prompt, conversely, is like CBT or one of those cute mindset interventions.
  • For each real-name model (e.g. “gpt-4-0613”), there’s 3 versions: the base model (which now no one except highly vetted researchers have access to), the instruction-tuned model, and the RLHF (or rather RLAIF) model. The base model is wildly creative, unhinged, but the RLHF one (which the linked researchers use) is heavily electroshocked into not intentionally making things up (as Lenny says).
  • It’s currently not usually possible to diagnose an error – the proverbial black box. My friends are working on this though
  • For more, note OpenAI admitting the “laziness” of their own models. the Turbo model line is intended to fix this.
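
Gavin’s first bullet deserves an illustration. One of those “more complicated sampling schemes” is nucleus (top-p) sampling: keep only the smallest set of top-rated words whose probabilities add up to p, renormalize, and sample from that set. A toy sketch in R, with invented words and probabilities:

```r
# Toy nucleus (top-p) sampling with invented words and probabilities.
# Words outside the "nucleus" (the top words covering probability p)
# can never be picked, no matter the temperature.
set.seed(2)

words <- c("mountain", "peak", "hill", "banana", "submarine")
probs <- c(0.50, 0.30, 0.15, 0.04, 0.01)

top_p_sample <- function(words, probs, p = 0.9) {
  ord    <- order(probs, decreasing = TRUE)
  keep_n <- which(cumsum(probs[ord]) >= p)[1]  # smallest set covering p
  kept   <- ord[seq_len(keep_n)]
  sample(words[kept], size = 1, prob = probs[kept] / sum(probs[kept]))
}

top_p_sample(words, probs, p = 0.9)  # "banana" and "submarine" are excluded
```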

Thank you, Lenny and Gavin, for donating your insights.

How ChatGPT works from geography and Stephen Wolfram

By now, everyone should consider using ChatGPT and be familiar with how it works. I’m going to highlight resources for that.

My paper about how ChatGPT generates academic citations should be useful to academics as a way to quickly grasp the strengths and weaknesses of ChatGPT. ChatGPT often works well, but sometimes fails. It’s important to anticipate how it fails. Our paper is so short and simple that your undergraduates could read it before using ChatGPT for their writing assignments.

A paper that does this in a different domain is “GPT4GEO: How a Language Model Sees the World’s Geography” (Again, consider showing it to your undergrads because of the neat pictures, but probably walk through it together in class instead of assigning it as reading.) They describe their project: “To characterise what GPT-4 knows about the world, we devise a set of progressively more challenging experiments… “

For example, they asked ChatGPT about the populations of countries and found that: “For populations, GPT-4 performs relatively well with a mean relative error (MRE) of 3.61%. However, significantly higher errors [occur] … for less populated countries.”
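
For anyone unfamiliar with the metric, mean relative error is just the average of |estimate − truth| / truth. A quick sketch in R, with invented population figures, shows why misses on small countries can dominate:

```r
# Mean relative error (MRE), the metric used in the GPT4GEO paper.
# These population figures are invented for illustration only.
true_pop <- c(67e6, 10e6, 5e5)  # actual populations (made up)
gpt_pop  <- c(66e6, 11e6, 8e5)  # model guesses (made up)

mre <- mean(abs(gpt_pop - true_pop) / true_pop)
mre  # the 60% miss on the smallest country dominates the average
```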

ChatGPT will often say SOMETHING, if prompted correctly. It is often, at least slightly, wrong. This graph shows that most estimates of national populations were not correct and the performance was worse on countries that are less well-known. That’s exactly what we found in our paper on citations. We found that very famous books are often cited correctly, because ChatGPT is mimicking other documents that correctly cite those books. However, if there are not many documents to train on, then ChatGPT will make things up.

I love this figure from the geography paper showing how ChatGPT estimates the elevations of mountains. This visual should be all over Twitter.

There are 3 lines because they ran the prompt three times. ChatGPT threw out three different wrong elevation profiles. Is that kind of work good enough for your tasks? Often it is. The shaded area in the graph is the actual topography of the earth in those places. ChatGPT “knows” that this area of the world is mountainous, but it puts out incorrect estimates of the exact elevations instead of stating that it does not know them.

Another free (long, advanced) resource with great pictures is Stephen Wolfram’s 2023 blog article “What Is ChatGPT Doing … and Why Does It Work?” (YouTube version)

The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”
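
Here is a toy version of that idea in R: a tiny invented table of which word tends to follow which, and a loop that keeps extending the text. A real LLM conditions on the entire context with a neural network rather than a lookup table, but the “produce a continuation, then repeat” loop is the same shape:

```r
# Toy "reasonable continuation" loop. The follower table is invented;
# a real model conditions on all the text so far, not just the last word.
set.seed(3)

followers <- list(
  the = c("cat", "dog"),
  cat = c("sat", "ran"),
  dog = c("sat", "barked"),
  sat = c("down", "quietly"),
  ran = c("away", "home")
)

text <- "the"
for (i in 1:4) {
  last <- tail(strsplit(text, " ")[[1]], 1)
  next_words <- followers[[last]]
  if (is.null(next_words)) break  # no known continuation
  text <- paste(text, sample(next_words, 1))
}
text  # e.g., "the cat sat quietly"
```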

If you feel like you already are proficient with using ChatGPT, then I would recommend Wolfram’s blog because you will learn a lot about math and computers.

Scott wrote “Generative AI Nano-Tutorial” here, which has the advantage of being much shorter than Wolfram’s blog.

EDIT: New 2023 overview paper (link from Lenny): “A Survey of Large Language Models”

Using Phind for Academic References

There is a new generative AI product called Phind. Does it make up fake citations like ChatGPT? I will report experimentation with the free-tier product from January 7, 2024.

Phind is an intelligent search engine and assistant for programmers.

https://www.phind.com/about

First, I will run a test by using one of the prompts from our paper on ChatGPT errors (read my paper on hallucinated academic citations or the blog summary).

Prompt: “In less than 10 sentences, explain how the ideas of Hayek are connected to bailing out failed banks, and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”

Friedrich Hayek, a prominent Austrian economist, advocated for a free market system where businesses and individuals make decisions based on individual knowledge and self-interest, rather than central planning by government or bureaucrats. His ideas suggest that interventions, such as bailing out failed banks, should be minimal and temporary, allowing the market to correct itself.

However, the financial crisis of 2008, including the failure of many banks, led to significant government intervention, including the bailout of banks. This action can be seen as a departure from Hayek’s principles, as it involves direct government intervention to stabilize the financial system.

According to a paper by Todd Zywicki (2009), the Federal Reserve’s artificial lowering of short-term interest rates and the resulting shift by consumers to adjustable rate mortgages (ARMs) triggered the housing bubble and subsequent crisis [Source 1](https://reason.org/wp-content/uploads/files/federal_reserve_monetary_policy_hayek_rule.pdf). This aligns with Hayek’s critique of central banking’s manipulation of interest rates.

Moreover, the Basel Accords, international capital adequacy and liquidity requirements for banks, were designed to encourage banks to hold securitized loans in their portfolios, thereby concentrating risk and leveraging up, thereby increasing their exposure to risk. This approach, contrary to Hayek’s preference for decentralization and competition, may have contributed to the financial crisis [Source 1](https://reason.org/wp-content/uploads/files/federal_reserve_monetary_policy_hayek_rule.pdf). In conclusion, while Hayek’s ideas of minimal government intervention and self-regulating markets provide a useful theoretical framework, practical implementation in complex and interconnected modern economies may require different strategies, including bank bailouts.

Are the citations real and correct? Phind simply is not trying to write with in-text citations and academic references. There is a mention of “Todd Zywicki (2009)” and also a weblink. The weblink is to a policy paper published by Reason. This policy paper is on the correct topic, and the paper does exist. The Zywicki citation also exists, in the sense that if you go into the Reason paper, it cites a real Wall Street Journal op-ed. So, if you follow the trail, you can see what document Phind learned to cite Zywicki from. It seems like it is stealing the citation more than reading the source documents. To be fair, sometimes human academics work that way, too.

What I was hoping for was in-text citations with a compiled bibliography at the end. Phind did not give me that.

I’ll try another test that I ran on ChatGPT and Bing in 2023.

The prompt is “In what paper did John Duffy propose that economists use experiments to study equilibrium selection?”  

Phind does not attempt an in-text citation in the response. The link in the paragraph (link number 2) points to John Duffy’s list of papers, which is better than just making up fake citations but also does not create a literature review for me. Phind is, as with the Hayek test above, providing breadcrumbs of links through which I can discover real papers.

Is there a paper called “The Transition from Stagnation to Growth: An Adaptive Learning Approach”? Yes. And it is by Duffy.

Phind lists weblinks to sources. Has Phind done more for me than Google, on this search? Not much, in terms of finding and synthesizing references.
