Views on AI safety
Views
The people who seem most knowledgeable about AI safety still disagree about many things within AI safety (and also about many things outside it, but that is less relevant here). It seems worth collecting some of these views in a single place.
- Carl Shulman’s views on AI safety
- Dario Amodei’s views on AI safety
- Eliezer Yudkowsky’s views on AI safety
- Paul Christiano’s views on AI safety
- Robin Hanson’s views on AI safety
- Wei Dai’s views on AI safety
- Daniel Dewey’s views on AI safety
- Luke Muehlhauser’s views on AI safety
- Holden Karnofsky’s views on AI safety
- Vladimir Nesov’s views on AI safety
- Katja Grace’s views on AI safety
- Owen Cotton-Barratt’s views on AI safety
- Andrew Critch’s views on AI safety
- Jacob Steinhardt’s views on AI safety
- Stuart Armstrong’s views on AI safety
- Nick Bostrom’s views on AI safety
- Miles Brundage’s views on AI safety
- Brian Tomasik’s views on AI safety
Here’s the master list of topics I want to cover (each of the pages above might have a smaller list because I made them before I expanded this master list):
Topic | Details |
---|---|
AI timelines | Also implications of short or long timelines (which should probably be covered in the “value of” rows instead). |
Kind of AGI we will have first (de novo, neuromorphic, WBE, etc.) | |
Preference ordering between kinds of AGI | e.g. some people prefer WBE because human values are more likely to be preserved, while neuromorphic AI seems harder to understand and therefore harder to control, and so forth |
Type of AI safety work most endorsed | |
Value of highly reliable agent design (e.g. decision theory, logical uncertainty) work | |
Value of machine learning safety work | |
Value of intelligence amplification work | |
Value of pushing for whole brain emulation | |
Value of thinking of esoteric failure modes | see e.g. this remark |
Difficulty of AI alignment | |
Shape of takeoff/discontinuities in progress | |
How “prosaic” AI will be | |
Difficulty of philosophy | |
How well we need to understand philosophy before building AGI | |
Cooperation vs values spreading/moral advocacy | |
How much alignment work is possible early on | |
Hardware/computing overhang | The extent to which AGI will be able to exploit existing hardware (e.g. how many copies of itself it can run and at what speed). A hardware overhang increases the chances of an intelligence explosion. I think the intuition behind the word “overhang” is that we currently have “too much” hardware lying around that we are unable to use efficiently, but once AGI comes around, it will make use of that “overhung” hardware much more efficiently. |
Relationship between the ability of the AI alignment team and the probability of good outcomes | |
Consensus
It might also be good to make a list of topics on which most people basically agree (orthogonality thesis? convergent instrumental goals? human-level AGI possible? starting on AI alignment early is a good idea? AI boxing methods don’t work?).
See the section “Background AI safety intuitions” in “My current take on the Paul-MIRI disagreement on alignability of messy AI”.
Also see “Five theses, two lemmas, and a couple of strategic implications” and “Four Background Claims”.
Up until around 2012, there was a lot of discussion about alternative approaches like whole brain emulation, intelligence enhancement, and so on. But nowadays it seems like everyone is pushing for de novo AI alignment. Why is this the case? Did everyone’s AI timelines suddenly get really short?
Here is an exception: http://lesswrong.com/r/discussion/lw/nun/superintelligence_via_whole_brain_emulation/
Carl Shulman might be working on human enhancement stuff; see here.
Actually, I now think several FHI people might be working more on this stuff, but it’s not very publicly visible. For example, from 2016: “Anders Sandberg has, along with visiting researcher Devi Borg, partially finished a project looking at the potential use of neural interfaces as a tool to facilitate AI safety.”
Discussions
In addition, there are discussion threads (on LessWrong and elsewhere) where some of these people go into long debates. Collecting these also seems like a good idea:
- List of discussions between Paul Christiano and Wei Dai
- List of discussions between Eliezer Yudkowsky and Wei Dai
- List of discussions between Eliezer Yudkowsky and Paul Christiano
- List of discussions between Vladimir Slepnev and Wei Dai
- List of discussions between Eliezer Yudkowsky and Robin Hanson (the AI FOOM debate might be the bulk of this?)