Template for views on AI safety

Paths to AGI

Path | Timeline for when we reach AGI | Probability that we first reach AGI using this path | Safety rating | Implications (e.g. emergence of singleton, takeoff shape, self-modification)
De novo | | | |
Whole brain emulation | | | |
Intelligence enhancement | | | |

Approaches to alignment

Approach | Time/resource cost to achieve alignment | Probability of this approach working in principle (i.e. ignoring AI timelines) | How competitive this approach would be with unaligned AGI | Number of serial discoveries needed/how parallelizable the approach is
Highly reliable agent design | | | |
Task-directed AGI | | | |
Paul Christiano’s approach (are there multiple?) | | | |
Inverse reinforcement learning | | | |
Learning from human preferences | | | |
Adversarial examples | | | |
Working on philosophical questions | | | |
Indirect normativity | | | |
Coherent extrapolated volition | | | |

https://vkrakovna.wordpress.com/2017/08/16/portfolio-approach-to-ai-safety-research/ also suggests various “properties” to group the different alignment approaches.

The role of philosophy

Eliezer has said something to the effect that “copy-pasting a strawberry hits 95% of the interesting alignment problems”, but he has also said, roughly, that we can’t make do with anything less than full human morality. Wei Dai pointed this out in a Facebook thread. I think this is related to, but probably distinct from, the “how much philosophy do we need to understand?” question.

Implicit in Wei Dai’s philosophical pessimism seems to be the idea that if we don’t do philosophy right, the expected value of the far future will be catastrophically bad or small, rather than merely “okay”, “pretty good”, or “very good, but still far from optimal”. Is the reasoning like the one given here?

Role of philosophy in alignment
How much philosophy do we need to understand? Do we need to specify “all of human morality”?
How benign does the environment need to be to get philosophy right?
Weird failure modes (e.g. siren/marketing worlds, malign prior)

Miscellaneous questions

Hardware overhang; possibly interesting search
Openness vs secrecy
Race dynamics
Differential development/stuff about desirability of slow technological development
Ability to reduce problems to learning problems; see here
Amount of hardware required for first AGI
State involvement
Ceiling for artificial intelligence (e.g. some people think AGI isn’t possible even in principle, and even those who believe AGI is possible have different views on how much smarter than a human an AGI could be)
Singleton/multipolar scenarios
Best-case scenario/mainline success scenario