Progress report 2018-03-08 to 2018-03-29
This page describes progress on this wiki from 2018-03-08 to 2018-03-29. All the work in this period was by Issa.
Plan
There was no plan written beforehand for this period.
What was accomplished
Most of the work in this period was done in AI safety. The basic motivation for this work is something like “AI safety seems like an important cause area, and there are a lot of smart people talking about it, but they all seem to disagree about a number of things. What’s up with that?”
Some other observations:
- There has been quite a bit of thinking done in AI safety strategy.
- Most of this thinking has been done by a small group of people we can identify.
- Much of this thinking is stored in the heads of this small group of people or in scattered threads on LessWrong and other discussion forums. The most basic insights have been written down in Superintelligence and a few other summary documents, but there is still a lot that is not summarized.
- One insightful way to organize information is to take a standardized list of subtopics and record what a bunch of experts think about those subtopics.
Currently, understanding people’s views on AI safety seems to require hunting down a bunch of discussion threads scattered across the internet.
I want to try to model people’s thinking. I think Ideological Turing Tests, and more generally being able to model others accurately (e.g. predicting what they would say on some topic they haven’t explicitly talked about), are important.
Having a convenient/canonical lookup of these people’s views seems useful. I don’t think I’m the only one who often wonders “what does X think about Y?”
I want to understand the key considerations behind people’s beliefs. This will help prioritize what to research.
Different people have different “starting points” or “talking points” for discussion. For example, when Wei Dai talks about the difficulty of alignment, he talks about the difficulty of philosophy, whereas when Eliezer Yudkowsky talks, he talks about task-directed AGI, context disaster, optimization daemons, etc. On the one hand, this gives one account of why these people disagree: they start from different places and use different thought patterns, so they reach different conclusions. On the other hand, this is pretty dissatisfying, because I can follow along with each person’s reasoning and find that it mostly makes sense, and yet they reach different conclusions!
So what I want to be able to do is grok each person’s model well enough that I can predict new things they would say that they haven’t actually said.
This sort of belief summarization seems more important in AI safety than in other fields, because less summarization has been done in AI safety so far.
Some difficulties I’ve encountered:
- Many of the people whose beliefs I want to track have not weighed in on many of the topics (or if they have, I have had difficulty locating where they’ve said it). This means there will be a lot of “holes” in the tables that can’t be filled (unless I ask them directly).
- Since the field is new, a lot of terminology keeps changing. For example, just recently Paul Christiano and Wei Dai had a discussion about what they even mean by “AI alignment”.