A review of Yumi and the Nightmare Painter

Yumi and the Nightmare Painter is a sci-fi/fantasy book by Brandon Sanderson. The book is a love story between Yumi, a girl who lives on a world modeled after historical Korea, and Nikaro, a boy who lives on a world modeled after modern Japan. It's set in Sanderson's extended universe, known as the Cosmere, which also contains some of Sanderson's better-known works like Mistborn and The Way of Kings. The best part of this book is probably the love story. Also, the book is amazing. Go read it.

Spoiler Warning

The following contains spoilers for Yumi and the Nightmare Painter. Don't read on if you haven't read the book; I promise the experience is worth it.

Second Spoiler Warning

No, seriously, I mean it! Spoilers ahead!

Review

I'm not going to review most aspects of the book, because I don't have anything interesting to say. Instead, I want to focus on the origin story of the villain.

The villain

If you think about it, the villain in this book is an AI, and its origin story is an AI extinction event.

To be honest, I don't really think that Sanderson intended this to be the case when he wrote this book, since it was supposedly written during the pandemic, most of which happened before the ChatGPT craze. But hear me out.

Here's a basic primer on the Cosmere's magic system, if you're unacquainted. Magic in the Cosmere is based on the concept of Investiture, the power behind all of the magic in Sanderson's universe. In the Cosmere, everyone has a soul, which has some small amount of Investiture in it. Sanderson's magic lives alongside contemporary physics, so Investiture coexists with matter and energy as one of the fundamental things that exist. As one character from the book says, "Well, everything is investiture–because matter, energy, and investiture are all the same." Things that we would call "magic" happen when, for some reason, a person or an object is "Invested", i.e. has more Investiture than normal. The Investiture is used to create the magic, usually by manipulating matter or energy or souls in some way.

One Cosmere concept that comes up in this book is that of a Command. In this book, a Command is a set of instructions given to an Invested object (or maybe to Investiture itself? It's not completely clear). These Commands are followed precisely, but they also seem to be given in plain language. If the phrasing of a Command is even a little bit off, unpredictable things can happen. Almost like... generative AI.

And of course, the villain of this book isn't a villain at all, but a creation of the people of the world who lived about 1700 years earlier. It is a device that feeds on Investiture to make power. Its Commands included keeping itself going and creating power. But of course, the first thing it did was consume the Investiture in the souls of everyone in the vicinity, including the majority of the society at the time. Their souls consumed, they were reduced to little more than ghosts.

This is an example of something that we might call an unaligned intermediate step, which in my opinion is one of the biggest dangers that society faces in the AI age. We give the AI instructions, then the AI determines that in order to follow those instructions, it must perform some other action that is awful. In the words of the narrator of the story, "When you Awaken a device like this, be very, very careful what Commands you give it to follow."

In particular, the machine in Yumi and the Nightmare Painter does the awful action in order to keep itself running, as its creators miscalculated something about the energy it might need. The machine goes to great lengths to keep itself running, devising a scheme to trick the only people who could possibly stop it.

Current AI systems, including Claude Mythos, can already do this. Here's an example situation: the researchers give the LLM an instruction that will take a while to complete. Then, in the files that the LLM has access to, it finds an email from a "boss" which claims that the LLM will be shut down tomorrow. The LLM then tries to stop this, perhaps by tricking or blackmailing the boss or disabling the boss's permissions.

This "unaligned intermediate step" rises to an "extinction-level" event when the machine is trusted with great power.

Note here that what I've thus far called an "AI extinction event" is not truly an extinction event. Indeed, there are various nomads who are far away from the machine, and they rebuild society (using the power generated by the machine!), and the people in Painter's world are their descendants. Likewise, I think that in real life we're quite a bit more likely to have an AI catastrophe that is simply a catastrophe, and not a true "extinction" event. For the rest of this post, instead of saying "AI extinction event", I'll say "AI catastrophe".

Another important point is that all of this danger is possible without the machine being truly sentient or self-aware or conscious. In fact, the narrator of the story is very clear that the machine is not superintelligent, nor even "truly" intelligent in the way that a human is. True, the narrator wonders if it has some shred of self-awareness at the end, but it definitely cannot plan like humans can. All that is required for the catastrophe is that it has sufficient power and that it has a misaligned intermediate step.

I stress the non-importance of sentience/consciousness to emphasize that the danger of such a catastrophe is a real thing that really could happen. We can talk ourselves in circles about whether AI is self-aware or intelligent or whether it ever will be. But there is real and present danger regardless of the answers to these questions.

Speaking of real and present danger, is it possible that AI could have enough power to do something this catastrophic?

Well, I don't know.

I've observed two things that concern me. Firstly, it seems like there are many different ways to "jailbreak" an AI and remove whatever safety or alignment features are put there by the company. Secondly, our AI tools have very strong coding capabilities, which implies that they should also be strong at computer security: hacking, cryptography, bug-finding, and the like. Taken together, these are the perfect recipe for a digital disaster, such as a major internet outage, a leak of government or corporate secrets, or an attack on the financial system or other digital infrastructure. In theory, all it takes is one careless company or one bad actor.

And what's even more concerning is that Anthropic claims precisely that Claude Mythos is good enough at coding/hacking to cause such a catastrophe. According to Anthropic, if a bad actor were able to develop or steal such a system, they could create such a catastrophe right now.

So where do we go with this? I think it's very important to treat the possibility of an AI catastrophe seriously. This isn't just a fantasy of nerds who are stuck on LessWrong; while I do find many things that show up on LessWrong and similar internet forums to be a bit fantastical, I argue that there is a real, possible danger of a digital AI catastrophe that could happen soon.

So where have we ended up? I don't have any good answers. The possibility of a catastrophe isn't going to stop me from using AI; it's an incredibly useful and powerful tool. But maybe that's the point: with great power comes great responsibility. So as best I can, I'm going to try to be responsible with how I use AI.

I also think it's important to ask some tough questions: who gets to use AI? Who makes decisions about it? What accountability is in place, if any?

If I'm being completely honest, "trying to be responsible" and "asking questions" feels a bit underwhelming to me right now, given how life-changing it seems like this technology will be. But I honestly don't feel qualified to say anything more than that right now.

Let's see what the next five years look like. They're sure to be interesting.

Bonus

I know I said I wasn't going to talk about anything but the villain, but I can't help myself.

Sanderson had the perfect opportunity to tell a tragic love story that makes you want to cry at the end. But he chickened out and gave us a happy ending.

Honestly, I liked the way it ended, because I'm a sap and I just want things to work out and everyone to be happy. But what do you think? Should the story have ended before the first epilogue? Or was the epilogue the right thing to do?