Views on AI safety


The people who seem most knowledgeable about AI safety still disagree about many things within the field (and also about many things outside it, but that is less relevant here). It seems worth collecting some of these views in a single place.

Here’s the master list of topics I want to cover (each of the pages above might have a smaller list because I made them before I expanded this master list):

| Topic | Details |
|---|---|
| AI timelines | Also implications of short or long timelines (which should probably be covered in the “value of” rows instead). |
| Kind of AGI we will have first (de novo, neuromorphic, WBE, etc.) | |
| Preference ordering between kinds of AGI | e.g. some people prefer WBE because human values are more likely to be preserved, while neuromorphic AI seems difficult to understand and therefore more difficult to control, and so forth |
| Type of AI safety work most endorsed | |
| Value of highly reliable agent design (e.g. decision theory, logical uncertainty) work | |
| Value of machine learning safety work | |
| Value of intelligence amplification work | |
| Value of pushing for whole brain emulation | |
| Value of thinking of esoteric failure modes | see e.g. this remark |
| Difficulty of AI alignment | |
| Shape of takeoff/discontinuities in progress | |
| How “prosaic” AI will be | |
| Difficulty of philosophy | |
| How well we need to understand philosophy before building AGI | |
| Cooperation vs values spreading/moral advocacy | |
| How much alignment work is possible early on | |
| Hardware/computing overhang | The extent to which AGI will be able to exploit existing hardware (e.g. how many copies of itself it can run and at what speed). A hardware overhang increases the chances of an intelligence explosion. I think the intuition behind the word “overhang” is that, if there is a hardware overhang, we currently have “too much” hardware lying around that we are unable to use efficiently, but once AGI arrives it will make much more efficient use of that “overhung” hardware. |
| Relationship between ability of AI alignment team and the probability of good outcomes | |


It might also be good to make a list of topics on which most people basically agree (orthogonality thesis? convergent instrumental goals? human-level AGI possible? starting on AI alignment early is a good idea? AI boxing methods don’t work?).

See the section “Background AI safety intuitions” in “My current take on the Paul-MIRI disagreement on alignability of messy AI”.

Also see “Five theses, two lemmas, and a couple of strategic implications” and “Four Background Claims”.

Up to around maybe 2012, there were a lot of discussions about alternative approaches like whole brain emulation, intelligence enhancement, and so on. But it seems like nowadays everyone is pushing for de novo AI alignment. Why is this the case? Did everyone’s AI timelines suddenly get really short or something?

Here is an exception:

Carl Shulman might be working on human enhancement stuff; see here.

Actually I now think several FHI people might be working on this stuff more, but it’s not so publicly visible. For example, there is stuff like “Anders Sandberg has, along with visiting researcher Devi Borg, partially finished a project looking at the potential use of neural interfaces as a tool to facilitate AI safety.” from 2016.


In addition, there are some discussion threads (on LessWrong and some other places) where some of these people go into long debates. Collecting some of these also seems like a good idea: