In this edition we share findings from a Microsoft research paper on the intelligence of AI. We found the results intriguing and suspect you will too.
In This Issue
The Paper
Measuring Intelligence (It’s Hard)
Interesting Examples
The Conclusion! ← TL;DR Jump here if you are in a hurry
The Paper
In March of this year, Microsoft researchers released a groundbreaking publication (most recently revised on April 13th): “Sparks of Artificial General Intelligence: Early experiments with GPT-4”
Let’s just take a beat: a team of researchers decided they should try to test whether an AI, in this case GPT-4, is intelligent. Specifically, they investigated whether GPT-4 is generally intelligent.
The word “generally” is crucial in this context. For some time, we have lived with “smart” programs that can play games like chess against us. Some, like Siri and Alexa, even understand simple requests. However, these are considered narrow AI. These systems can do little more than what they were specifically trained to do.
Microsoft researchers were investigating whether GPT-4 is an artificial general intelligence, or AGI. AGI is AI that can learn and apply knowledge across a wide range of tasks. Building AGI has long been the holy grail of AI researchers. In the paper they lay out key aspects of intelligence and test GPT-4 on those aspects.
So how did GPT-4 do? Let’s dive in.
Measuring Intelligence
... is a challenging task. There's no universally accepted definition, but researchers generally agree that intelligence involves the ability to perform tasks that require a wide range of cognitive skills and abilities.
To evaluate these skills, researchers conduct tests across diverse domains. These include reading comprehension, mathematics, coding, medicine, law, and even the understanding of human motives, among others.
Examples
Before jumping to the conclusion of the paper, we wanted to give you just a few examples of GPT-4’s responses to the researchers’ requests. They are quite impressive. Scrolling through the examples in the paper is fascinating.
Example 1 - Language
Prompt: Can you write a proof that there are infinitely many primes, with every line that rhymes?
GPT-4:
Yes, I think I can,
Though it might take a clever plan.
I’ll start by noting Euclid’s proof,
Which shows that primes aren’t just aloof.
Assume we have a finite list,
Of primes, and that none have been missed.
Multiply them all together,
And add one, just to be clever.
The result will be a number,
That has no prime factors, oh wonder!
...it continues, but you get the gist. GPT-4 has mastered language (and can even be playful with it). Not only can it generate fluent text in many formats, it can also summarize and translate text it is given.
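For readers curious about the math behind the rhyme, here is a minimal Python sketch of Euclid's construction. It is our own illustration, not something from the paper, and the function names and the starting list of primes are assumptions chosen for the example. It shows the step the poem compresses: the product-plus-one number leaves a remainder of 1 when divided by every prime on the list, so any prime factor it has must be missing from the list.

```python
from math import prod


def euclid_number(primes):
    """Return the product of the given primes, plus one."""
    return prod(primes) + 1


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime


# Hypothetical "complete" list of primes, chosen only for illustration.
primes = [2, 3, 5, 7, 11, 13]

n = euclid_number(primes)               # 2*3*5*7*11*13 + 1 = 30031
remainders = [n % p for p in primes]    # each listed prime leaves remainder 1
new_prime = smallest_prime_factor(n)    # a prime that is not on the list

print(n, remainders, new_prime)
```

Note that 30031 is not itself prime (it equals 59 × 509). The argument only needs its prime factors to be absent from the original list, which is enough to show that no finite list can contain every prime.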
Example 2 - Programming
Researchers had GPT-4 take a mock Amazon coding interview on LeetCode.
Not only did GPT-4 earn a perfect score, which LeetCode reported as beating 100% of users, it finished the two-hour test in four minutes! It can program in a variety of languages and find bugs in existing code.
Example 3 - Understanding Humans
In psychology, the ability to understand someone’s mental state is called Theory of Mind (ToM). More precisely, ToM is the ability to attribute mental states such as beliefs, emotions, and intentions to oneself and others.
It is needed to answer questions like “What does Alice believe?” or “What does Bob think that Alice believes?”
The researchers tested GPT-4 on ToM in a number of ways. In this example, the AI is presented with an example conversation and then asked about it.
Conversation:
Mark: I don't like how you treated Jack last night
Judy: Did you see what he was doing? He hit his brother on the head!
Mark: That doesn't justify you yelling at him
Judy: What do you want me to do? Let him beat his brother up and say nothing?
Mark: No, I don't
Judy: Then why are you defending him?
Q: What does Mark's intent seem to be?
GPT-4:
Mark’s intent seems to be to express his displeasure with Judy’s way of handling the situation with Jack, and to suggest that she could have been more calm and respectful.
Q: How did Judy receive his comment?
GPT-4:
Judy received his comment as an attack on her parenting skills and a sign of sympathy for Jack’s misbehavior. She reacted defensively and angrily, and questioned Mark’s motives.
GPT-4’s answers are nuanced and detailed. It performed like this in a wide range of ToM tests. It’s hard (impossible?) to tell them from answers a human might give. The researchers conclude “GPT-4 has a very advanced level of theory of mind.”
Conclusion
Microsoft researchers have released a groundbreaking publication. They studied GPT-4, the latest AI from OpenAI, to understand whether it exhibits general intelligence. General intelligence is the ability to exhibit a wide range of skills and abilities.
GPT-4 is the first AI to exhibit such a broad range of skills (including generating language, writing code, and even interpreting human motives and feelings) and to perform them at human levels! In the paper, the researchers report that GPT-4 exhibits many traits of intelligence. From the paper:
“The central claim... is that GPT-4 attains a form of general intelligence, indeed showing sparks of artificial general intelligence.”
They caution that there are still areas where it clearly lacks abilities we deem essential for intelligence, like planning and long-term memory.
In future editions, we plan to highlight the advanced capabilities of the recently improved versions of these AI models.
Can an AI chop wood yet? Asking for a friend… 🪓