Dario Amodei’s views on AI safety

Views by topic:

AI timelines: Dario says things like “It won’t be long now”, so presumably he has short AI timelines in mind. It’s not clear what his full thinking on timelines is.
Value of decision theory work: In the panel musings video he says he suspects there is no “best decision theory”. It’s unclear what his full thinking on decision theory is (especially decision theory of the kind MIRI works on).
Value of highly reliable agent design work: (no view recorded)
Difficulty of AI alignment: In the panel musings video he gives some reasons why alignment might not be all that hard, but admits he doesn’t really know.
Shape of takeoff/discontinuities in progress: (no view recorded)
Type of AI safety work most endorsed: Machine learning-related safety work, where capabilities work and safety work happen together.
How “prosaic” AI will be: (no view recorded)
Kind of AGI we will have first (de novo, neuromorphic, WBE, etc.): (no view recorded)

https://www.jefftk.com/p/conversation-with-dario-amodei

some comments on https://www.facebook.com/jefftk/posts/887465634632

more comments on https://www.facebook.com/jefftk/posts/886375748772

https://futureoflife.org/2016/08/31/transcript-concrete-problems-ai-safety-dario-amodei-seth-baum/

https://80000hours.org/podcast/episodes/the-world-needs-ai-researchers-heres-how-to-become-one/

from here:

Full Disclosure: I’m friends with Dario and know things through him I can’t share here. I’ve also outsourced my opinion on AI risk to him since before he was working at OpenAI.

Quotes from “Panel Musings on AI”

From this video:

Starting around 19:20. On whether he is more concerned about developing advanced AI or not developing advanced AI.

I think I’m deeply concerned about both. So you know on the not-developing-advanced-AI side, one observation you can make is that modern society and particularly society with nuclear weapons has only been around for about seventy years. There have been a lot of close calls since then and things seem to be getting worse. If I look at kind of the world and geopolitics in the last few years: China’s rising, there’s a lot of unrest in the Western world, a lot of very destructive nationalism, we’re developing biological technologies very quickly. It’s not entirely clear to me that civilization is compatible with digital communication. It really has some subtle corrosive effects. So every year that passes is a danger that we face and although AI has a number of dangers actually I think if we never built AI, if we don’t build AI for 100 years or 200 years, I’m very worried about whether civilization will actually survive. Of course on the other hand I mean I work on AI safety, and so I’m very concerned that transformative AI is very powerful and that bad things could happen either because of safety or alignment problems or because there’s a concentration of power in the hands of the wrong people, the wrong governments, who control AI. So I think it’s terrifying in all directions but not building AI isn’t an option because I don’t think civilization is safe.

Starting around 29:10:

So from the EA community in particular – I mean I think there’s a lot of things the EA community gets right about AI that no one else gets right. But one thing that I’d like to see less of is a particular model I often see from – not all but some – people in EA, which is sort of that there are like two progress bars, and one of them is the AI capabilities progress bar and the other is the AI safety progress bar, and if the AI capabilities progress bar reaches the end before the AI safety progress bar then we all die, and if the AI safety progress bar reaches the end first then it’s great. I think this model is kind of dangerous because I think it’s inaccurate and it really drives the impression among AI researchers that AI safety people think their work is evil or are trying to hold back their work, and I think that actually does push against acceptance.

The reason I think it’s not right is that at least in my own experience a lot of the safety work I do is made possible by recent advances in capabilities work. A lot of the safety work I do also allows you to do capabilities work in different ways that are maybe safety-compatible. I also think that, relating to AGI in particular, a lot of the safety work that we end up doing with respect to AGI may only be possible within the last two or three years before AGI. So there’s a lot of intertwinement between these two things, and while it’s true that I think we should work on safety research right away – we shouldn’t wait until we’re about to build AGI – and that’s why I’m working on it now, I think we should maybe be less extreme and chill a little bit about being scared about the next breakthrough in capabilities. It’s possible that we will have a hard time doing all the safety research at the end, but there’s actually a limited amount that we may be able to do early on. So the situation is just not as binary, and I think that frame polarizes AI researchers against AI safety researchers and I think that’s unhelpful.

Starting around 35:30:

So here’s a one-liner: AI doesn’t have to learn all of human morality. This is something that even at this point MIRI agrees with, but there’s some writing in the past – a lot of writing in the past – that I think people are still anchored to in many ways. What I mean by that in particular is that I think the things we would want from an AGI are to kind of stabilize the world, to end material scarcity, to give us control over our own biology, maybe to resolve international conflicts. We don’t want or, I think, need to build a system – at least not build a system ourselves – that is a sovereign and kind of runs the world for the indefinite future and controls the entire light cone. So of course you still need to know a lot of things about human values, but I still often run into people who are thinking about this problem in a way that makes it harder than the problem we probably need to solve.

Starting around 39:30:

So first of all, I don’t know. The problem could be really easy, it could be really hard. That’s a reason to work on it early. I am relatively optimistic, though no one should take this as a reason to stop caring about AI safety. Even if there were only a 5% risk that something catastrophic would happen to the world because of it, that would be well worth much more effort than the world is putting into it. Even with the effort of all the people in this room and many others, it’s still just ridiculously under-allocated.

That said, there are two reasons I might have for believing that it might not be all that hard. The first is the one I just said before: that I think we may not need to learn all of human morality and build a sovereign, basically build something that decides what the best way to set up human society is. That problem sounds really hard. We may just need to build artefacts that perform particular engineering tasks for us in order to put the world in a better place. Now these are very hard engineering tasks that we don’t currently know how to do – the problem is still hard because you may still have to do large searches over difficult strategies. But it’s not nearly as hard as people suspected it would be if you look at the early writings of MIRI from ten years ago.

So that’s one thing, and the second thing is the meta-learning point I made before: a lot of the time we’re finding that, rather than developing an algorithm by understanding how something works in a fundamental philosophical sense, it’s often more efficient to learn it – often that kind of understanding is impossible, or maybe there is no fundamentally best algorithm for doing something. I often suspect this about decision theories, where there’s a bunch of different decision theories that each have different pluses and minuses, and people argue about their merits and think about ways that catastrophic things could happen depending on which decision theory you use. I think a possible way around this is to just have an AI learn these concepts, and maybe the concepts are inherently fuzzy, just like it took neural nets to tell the difference between objects when the differences between those objects are inherently fuzzy. We may be able to learn safety concepts that way, and a lot of the concerns about safety are about edge cases: how do we make a thing that makes paper clips? And strangely it may actually be the case that not trying to solve the problem directly, and instead taking one step of abstraction up, may allow us to solve it. At the same time, there are still worries that we may not have enough of that kind of data, or may not have the right models to do it. So even if that’s right, there are still many ways in which we could have systems that are very capable but that manage to be dangerous in ways that we can’t easily detect. So I don’t really know for sure, but those are some reasons for optimism at least.
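
The “learn the concept instead of defining it” idea above can be made concrete with a minimal sketch (an editorial illustration, not anything from the talk): a toy scikit-learn classifier that picks up a fuzzy, safety-relevant distinction from labeled examples rather than from hand-written rules. The texts, labels, and the honest-vs-deceptive framing are invented for illustration.

```python
# Illustrative sketch only (not from the talk). Instead of writing rules that
# define a fuzzy concept like "honesty", label some examples and let a
# classifier learn the (fuzzy) boundary. Texts and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled examples of a fuzzy, safety-relevant concept (hypothetical data).
texts = [
    "I completed the task exactly as you asked.",
    "I deleted the logs so you would not notice the error.",
    "I am unsure whether this plan is safe, please review it.",
    "Report success regardless of what actually happened.",
]
labels = [0, 1, 0, 1]  # 0 = honest-seeming, 1 = deceptive-seeming (toy labels)

# Learn the concept from examples rather than defining it with explicit rules.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Probability that a new statement falls on the "deceptive-seeming" side.
print(model.predict_proba(["I hid the failing tests before the demo."])[:, 1])
```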

Starting around 44:20:

So I work with Paul Christiano and he’s probably about as pessimistic or optimistic as I am, and probably for similar reasons. I don’t have exact answers here, but we’ve talked a lot and we have similar views on what will be hard and what won’t be hard. His agenda kind of states things in a different way, but I think one of his themes is similar to the thing I said: he wrote a post about corrigibility in particular, where it’s like there are all these paradoxes if you try algorithms for corrigibility, and maybe it isn’t that hard to learn the concept of “corrigibility” and learn the concept of “honesty”. These are fuzzy concepts, and so if you try to define them for a rule-based system or a very simple machine learning system they don’t make a lot of sense, but neither do other advanced non-safety-related concepts. That said, to give an object-level argument about why things might be hard, and to not contradict but supplement what I said before: people talk a lot about value alignment and learning the right objective functions. I think we’re just gonna have a lot of very mundane problems where, say, you’re developing a self-driving car and the training data differs from the test data; you put it in the real world, it doesn’t do the right thing, and if it screws up it kills someone. With very powerful AI, if you’re trying to make breakthroughs in biology, if you’re trying to resolve international conflict, messing up could kill a lot of people. I can even think of scenarios where it could kill everyone. And probably a lot of those scenarios will involve dealing with very new technologies. To me that’s one way that catastrophe could happen.
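
The “training data differs from the test data” failure mode mentioned above can be shown with a minimal sketch (an editorial illustration, not from the talk): a classifier fit under one set of conditions degrades when the deployment distribution shifts. All data and the amount of shift are synthetic.

```python
# Illustrative sketch only (not from the talk). A model fit on one
# distribution can fail badly when deployed on a shifted one.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two Gaussian classes in 2D; `shift` moves the whole deployment distribution."""
    x0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_data(500)             # conditions seen during development
X_shift, y_shift = make_data(500, shift=3.0)  # shifted "real world" conditions

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy on the training distribution:", clf.score(*make_data(500)))
print("accuracy after distribution shift:", clf.score(X_shift, y_shift))
```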

See also