Views on AI safety
Views
The people who seem most knowledgeable about AI safety still disagree about many things within AI safety (and also about many things outside it, but that is less relevant here). It seems worth collecting some of these views in a single place.
- Carl Shulman’s views on AI safety
- Dario Amodei’s views on AI safety
- Eliezer Yudkowsky’s views on AI safety
- Paul Christiano’s views on AI safety
- Robin Hanson’s views on AI safety
- Wei Dai’s views on AI safety
- Daniel Dewey’s views on AI safety
- Luke Muehlhauser’s views on AI safety
- Holden Karnofsky’s views on AI safety
- Vladimir Nesov’s views on AI safety
- Katja Grace’s views on AI safety
- Owen Cotton-Barratt’s views on AI safety
- Andrew Critch’s views on AI safety
- Jacob Steinhardt’s views on AI safety
- Stuart Armstrong’s views on AI safety
- Nick Bostrom’s views on AI safety
- Miles Brundage’s views on AI safety
- Brian Tomasik’s views on AI safety
Here’s the master list of topics I want to cover (each of the pages above might have a smaller list because I made them before I expanded this master list):
Topic | Details |
---|---|
AI timelines | Also implications of short or long timelines (which should probably be covered in the “value of” rows instead). |
Kind of AGI we will have first (de novo, neuromorphic, WBE, etc.) | |
Preference ordering between kinds of AGI | e.g. some people prefer WBE because human values are more likely to be preserved, while neuromorphic AI seems harder to understand and therefore harder to control, and so forth |
Type of AI safety work most endorsed | |
Value of highly reliable agent design (e.g. decision theory, logical uncertainty) work | |
Value of machine learning safety work | |
Value of intelligence amplification work | |
Value of pushing for whole brain emulation | |
Value of thinking of esoteric failure modes | see e.g. this remark |
Difficulty of AI alignment | |
Shape of takeoff/discontinuities in progress | |
How “prosaic” AI will be | |
Difficulty of philosophy | |
How well we need to understand philosophy before building AGI | |
Cooperation vs values spreading/moral advocacy | |
How much alignment work is possible early on | |
Hardware/computing overhang | The extent to which AGI will be able to exploit existing hardware (e.g. how many copies of itself it can run and at what speed). A hardware overhang increases the chances of an intelligence explosion. I think the intuition behind the word “overhang” is that we currently have “too much” hardware lying around that we are unable to use efficiently, but once AGI comes around, it will make use of that “overhung” hardware much more efficiently. |
Relationship between the ability of the AI alignment team and the probability of good outcomes | |
Consensus
It might also be good to make a list of topics on which most people basically agree (orthogonality thesis? convergent instrumental goals? human-level AGI possible? starting on AI alignment early is a good idea? AI boxing methods don’t work?).
See the section “Background AI safety intuitions” in “My current take on the Paul-MIRI disagreement on alignability of messy AI”.
Also see “Five theses, two lemmas, and a couple of strategic implications” and “Four Background Claims”.
Up until around 2012, there was a lot of discussion about alternative approaches like whole brain emulation, intelligence enhancement, and so on. But nowadays it seems like everyone is pushing for de novo AI alignment. Why is this the case? Did everyone’s AI timelines suddenly get really short?
Here is an exception: http://lesswrong.com/r/discussion/lw/nun/superintelligence_via_whole_brain_emulation/
Carl Shulman might be working on human enhancement stuff; see here.
Actually, I now think several FHI people might be working more on this stuff, but it’s not very publicly visible. For example, from 2016: “Anders Sandberg has, along with visiting researcher Devi Borg, partially finished a project looking at the potential use of neural interfaces as a tool to facilitate AI safety.”
Discussions
In addition, there are discussion threads (on LessWrong and elsewhere) where some of these people go into long debates. Collecting these also seems like a good idea:
- List of discussions between Paul Christiano and Wei Dai
- List of discussions between Eliezer Yudkowsky and Wei Dai
- List of discussions between Eliezer Yudkowsky and Paul Christiano
- List of discussions between Vladimir Slepnev and Wei Dai
- List of discussions between Eliezer Yudkowsky and Robin Hanson (the AI FOOM debate might be the bulk of this?)