Exploring the “ultimate” problem

What is “intelligence”?

Intelligence is an interesting word and its definition is something we all struggle to agree on, especially the experts. Even though we’re unable to agree on a single definition for “intelligence” we know it’s possibly the most important feature of humanity… It’s what separates us from every other species on this planet.

Our intelligence has given rise to language, agriculture, the industrial revolution, space exploration, and a globally interconnected consciousness (e.g. the internet). With all the advancements humanity has made, there’s still one problem, the “ultimate” problem, that we’ve struggled to solve. The solution to this problem could be the solution to all our current and future problems… It’s the “root” and if we’re able to solve the “root” cause of all our problems then who knows what utopian world we’ll reach?

That problem is artificially re-creating the superpower of “intelligence” we humans have.

Before we go deeper into why re-creating “intelligence” is the “ultimate” problem, let’s first take a step back and see where we are today attempting to solve this “ultimate” problem.

Today’s approach – Reinforcement Learning

If you’ve read my previous post on “How Machines Think” you might know a bit about the three major areas of machine learning (i.e. Supervised, Unsupervised, and Reinforcement Learning)… But if not, no worries! I’m going to give a human-friendly summary of “Reinforcement Learning” (RL) here.

So… There has been a long line of super-smart people trying to solve this whole “intelligence” problem for many decades and with all of these attempts, we’re edging closer to something that looks like intelligence. There’s a fine line we’ll need to draw before diving into today’s cutting-edge approaches and that’s narrow vs. general artificial intelligence.

In a nutshell, “Narrow AI” is your calculator, meaning it can do a really amazing job at an extremely narrow task (e.g. addition, multiplication, etc.) and nothing else.

The “ultimate” intelligence we’re aiming to reach is “General AI”. This type of intelligence will be able to “generalize” across many different tasks, such as riding a bike, solving hard physics problems, and being able to comfort you when you’ve had a sh** day.

There are many different approaches to solving this “ultimate” problem, but the one that’s currently getting the most attention is called “Reinforcement Learning” (RL).

I’m assuming the first thought most of you had when seeing that term was… What the f*** does that even mean?!

The idea behind RL is actually pretty simple, I’ll explain it through this game you and I play called “Life”.

We humans go about living our lives in a simple way, even though we do our best to complicate things… This simple game of life is centered around doing “good” for ourselves and those around us, whatever “good” means for you. In our world, we get constant feedback on how we’re doing (e.g. money, status, acknowledgment, etc.), and this feedback steers us towards actions that earn positive feedback and away from actions that earn negative feedback. This life loop we’re circling through to get more positive and less negative feedback is RL.

In a simple summary, RL is when a “person” does some kind of “action” to reach a “goal” and their “environment” gives them feedback adjusting their future actions, so they’re able to get more “rewards” (or less punishment).

You might think that RL looks almost identical to what we call life and that’s because it is… These smart techy folks are trying to mimic what we humans do, with the hope that this will lead to an artificially intelligent thing.

When you research “Reinforcement Learning” there’s a good chance you’ll hit a wall of jargon and math formulas. I’m going to humanize some of that jargon now, so you can avoid the feelings most of us experience when diving into the deep end of a topic too quickly (more here and here).

Basics

Agent – The thing that interacts with the world around it (you, me, a robot, etc.)
Environment – The physical or virtual world you live in
State – The current area you’re able to observe… Think of a state as being a smaller subset of the entire world you’re living in (e.g. me looking at my empty coffee cup right now is considered a “state” I’m in).
Action – The thing you do to interact with the environment around you (e.g. me standing up to put more coffee into my empty coffee cup)
Reward – The positive or negative feedback you get from the action you’ve taken in your environment (e.g. Me receiving a hit of caffeine from drinking more coffee)
Policy – A strategy an agent (e.g. you, me, or a robot) has when interacting with a specific state (e.g. empty coffee cup = me getting more coffee)
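
If it helps to see those six terms wired together, here’s a minimal sketch of the coffee example as a loop. Everything in it (the `CoffeeEnv` class, the state and action names, the reward numbers) is made up for illustration and isn’t taken from any RL library.

```python
class CoffeeEnv:
    """Hypothetical toy environment: the cup is either 'empty' or 'full'."""

    def __init__(self):
        self.state = "empty"

    def step(self, action):
        """Apply an action and return (next_state, reward)."""
        if self.state == "empty" and action == "refill":
            self.state, reward = "full", 0.0   # refilling pays nothing by itself
        elif self.state == "full" and action == "drink":
            self.state, reward = "empty", 1.0  # the caffeine hit
        else:
            reward = -1.0                      # e.g. sipping from an empty cup
        return self.state, reward


# Policy: a plain lookup from state to action (assumed fixed, no learning here).
policy = {"empty": "refill", "full": "drink"}


def run_episode(env, policy, steps=6):
    """The agent acts, the environment answers with a new state and a reward."""
    total_reward = 0.0
    state = env.state
    for _ in range(steps):
        action = policy[state]            # agent picks an action for this state
        state, reward = env.step(action)  # environment gives feedback
        total_reward += reward
    return total_reward


print(run_episode(CoffeeEnv(), policy))
```

A real RL agent would start with a bad policy and adjust it based on the rewards it collects; this sketch only shows the loop itself.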

More if you’re interested

Discounted Rewards (or discount rate) – This math magic bakes in the assumption that more immediate rewards have a higher value than long-term rewards (e.g. you might die tomorrow)… It’s interesting how this sounds just like the short-term thinking we humans have around many things (e.g. Climate Change and corporate profits)
Model-based (less realistic) – Try to create a model of the world, so you know exactly which actions to take
Model-free (more realistic) – Learn from experience, updating your policy (i.e. strategy) as you take actions throughout the environment.
Credit Assignment Problem – This is an unsolved problem to figure out which actions have had the largest impact on progressing you closer to your goals. There’s a major philosophical connection here around how we naively point to certain traits (e.g. hard work > luck) in a person when retrospectively reflecting on their success, but I’ll leave that rant for another time.
Exploration vs. Exploitation – Here’s another unsolved problem in RL: figuring out how much we should explore new environments versus exploit the rewards from environments we already know about. For example, should we just throw more data at existing approaches we have for solving intelligence hoping we achieve “general AI” or throw away everything and start exploring new ways of creating “general AI”?
Deep Reinforcement Learning – This is what most cool kids are using to get closer to “general AI”, which is basically combining “deep neural networks” with RL… Achieving an end-to-end learning process (more here).
Multi-agent reinforcement learning (MARL) – Lastly, another approach we’re using to solve that “ultimate” problem… Instead of learning how to do something alone in a world, you’re learning alongside others (e.g. humans on planet Earth).
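
Two of the ideas in this list are small enough to sketch in a few lines. The first function adds up rewards with a discount rate (usually written as gamma), so later rewards count for less. The second is “epsilon-greedy”, one common and simple way to handle exploration vs. exploitation; it’s a swapped-in example technique, not the only approach. The function names and numbers are my own assumptions for illustration.

```python
import random


def discounted_return(rewards, gamma):
    """Sum the rewards, shrinking each later reward by another factor of gamma."""
    total = 0.0
    # Walk backwards so each step is: reward now + gamma * (everything after).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def epsilon_greedy(values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # explore: try any action at random
    # exploit: take the action with the best estimated value so far
    return max(range(len(values)), key=values.__getitem__)


# Three rewards of 1 with gamma = 0.5 are worth 1 + 0.5 + 0.25.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75

# With epsilon = 0 the agent never explores and always picks the best-looking action.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=random.Random(0)))  # 1
```

A gamma close to 1 makes the agent patient, while a gamma close to 0 makes it short-sighted. In the same way, a higher epsilon means more exploring and a lower one means more exploiting what’s already known.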

This is only the tip of the iceberg when it comes to the jargon used in RL, but hopefully, it’s enough to get you started.

The gap between today and tomorrow

Now that we have a general understanding of what RL is, let’s take a peek at the major gaps and why we’ve not reached “general AI”.

With all these amazing achievements in AI, you would assume we’re right around the corner from general AI, but the deeper I dig the more issues pop up. With that said, it’s important to put this progress into perspective and realize we’re on what seems to be an exponential curve, so all these gaps could be bridged sooner than we think.

Below are just a few gaps I’ve stumbled across, but if you’re interested there’s more here, here, and here.

Lack of memory (e.g. catastrophic forgetting) – Similar to how we humans forget much about our past, machines tend to do the same thing when learning new skills. Imagine a machine learning how to play Chess, after learning Chess it decides to learn how to make a burrito, but while learning how to make a burrito it completely forgets how to do anything with Chess. Lack of memory is one of the many problems we’re tackling in the realm of RL.
We don’t know what we actually want (The Control Problem) – There was a book recently written by Stuart Russell named “Human Compatible” that runs through this issue, which isn’t really a technical issue for reaching “general AI”, but it’s a problem we should all consider. Let me explain… We humans tend to think we know what we actually want, but this is far from the truth. Picture a scenario where you ask an AI agent to solve Climate Change and this AI realizes that the reason for the increase in carbon is due to humans, so if it rids the planet of humanity, then problem solved… I bet that’s not what you were expecting. Ha! Our ability to have the machines learn what we “actually” want and not what we “think” we want is an important problem to solve.
Causal Confusion (correlation doesn’t = causation) – Incorrectly connecting some cause to an effect is a problem we all face as faulty humans and this is the case for machines as well. Here’s an interesting example with AI and a list of hilarious graphs showing how we humans easily create these fictional connections.

Generalization (e.g. transfer learning) – Being able to generalize between activities and learning multiple skills is probably the most common problem I’ve come across in my short dive into the world of RL. The meta-skill of learning how to learn is a massive hurdle, as well as an opportunity… All of our agents today are really good at a small subset of similar tasks, but once you move that agent outside of a familiar space it struggles to learn other skills.

As I mentioned in the beginning… These are only a few gaps we’ll need to bridge before reaching the ideal state of “general AI”, but there is a TON of brain and resource power behind each problem, so who knows how long it will take?

Why this is the “ULTIMATE” problem

When I think about the importance of our brain and the role it plays in our world, I can’t help but conclude that re-creating its intelligence is fundamentally the most important problem. Even though this beautiful sponge between our ears is incredibly powerful and efficient, it’s still riddled with a long list of cognitive biases that both hurt and help us as we deal with this complex world.

The number of inventions we’ve created with our intelligence is amazing and we continue to surprise ourselves, but there are larger problems that our human brains struggle to understand… This is where a “general AI” could possibly change humanity permanently.

A quote from I.J. Good sums this idea up really well…

Source

“the first ultra-intelligent machine is the last invention that man need ever make”

Imagine a machine that’s able to help us solve Climate Change, Space Colonization, Life Longevity, and many other things within hours, days, or weeks! I know this sounds Sci-Fi and it might be, but this is one of the main reasons you see so many smart and wealthy people throwing resources into solving this “ultimate” problem. It’s the “root” that connects everything else and the epicenter of all our problems.

Our brain is where it all starts and ends, so that’s why I feel that solving intelligence is the “ultimate” problem.

Resources for diving deeper into RL:

Simple intro into RL (part 5 of a blog series)
Good beginner overview video into RL
Charles Isbell is probably one of my new favorite teachers and I highly recommend this RL series… Also here’s a more technical presentation, but still great… And an intro video into his backstory.
A more in-depth lecture series at Berkeley. This series is much more approachable compared to other RL lectures because the lecturer does a good job focusing on intuition over just math formulas.
Last is a lengthy RL intro blog post, but it’s worth the effort… The author (Andrej Karpathy – AI Lead @ Tesla) has achieved some impressive stuff at a young age, but what’s more impressive is the way he communicates, not taking himself or the topic too seriously, which I really appreciate.