Order effects in value judgment

Order effects in value judgment refers to the possibility that the order in which we encounter moral arguments influences our conclusions about values. If our reasoning about values is sensitive to that order, then the causes and interventions we end up supporting will depend on it too.

I think some people also refer to this as “path dependence” of values or of moral judgments.

There seem to be several separate concerns (aside: why are these “concerns”? why do we have feelings about feelings?):

  • Our “values upon reflection” may be sensitive to initial conditions/butterfly effects
  • Our “values upon reflection” may be sensitive to the order in which we encounter arguments (in the extreme case of pre-theoretic intuitions this reduces to the previous point)
  • There might not even be such a thing as “values upon reflection”, if our values don’t converge
  • Our “values upon reflection” might actually exist under “ordinary” circumstances, but certain adversarial forces (e.g. unaligned superintelligences) can push us toward any moral conclusion, so we have to be careful
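
As a toy illustration of the second concern (sensitivity to the order of arguments), here is a minimal sketch. It is not a model of real moral reasoning. Everything in it is an assumption made for illustration: a "belief" is a single number between 0 and 1, an "argument" is a (target, strength) pair, and the update rule gives half weight to arguments that cut against the agent's current lean. The names `update`, `reflect`, `FOR` and `AGAINST` are hypothetical.

```python
def update(belief, argument):
    """Move belief toward the argument's target.

    Toy assumption: an argument that points the same way as the agent's
    current lean (relative to 0.5) gets full weight; one that points the
    other way gets half weight.
    """
    target, strength = argument
    agrees = (target - 0.5) * (belief - 0.5) >= 0
    weight = strength if agrees else strength * 0.5
    return belief + weight * (target - belief)


def reflect(initial, arguments):
    """Apply the arguments one at a time, in the order given."""
    belief = initial
    for argument in arguments:
        belief = update(belief, argument)
    return belief


# Two equally strong arguments pulling in opposite directions.
FOR = (1.0, 0.6)
AGAINST = (0.0, 0.6)

for_first = reflect(0.5, [FOR, AGAINST])
against_first = reflect(0.5, [AGAINST, FOR])
print(for_first, against_first)
```

Starting from the same neutral belief of 0.5 and hearing the same two arguments, the agent ends up on opposite sides of 0.5 depending only on which argument came first (about 0.56 versus 0.44). The particular numbers mean nothing. The point is only that a sequential update rule does not have to be commutative.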

https://foundational-research.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf has a section that talks about order effects and links to http://www.philosophyexperiments.com/sedan/Default5.aspx.

See also “Order Effects in Moral Judgment”.

See the exchange between Brian Tomasik and Paul Christiano here.

Wei Dai also talks about this in his LessWrong comments (or maybe a post?).

One of the “meta-ethical alternatives” Wei Dai gives is:[1]

None of the above facts exist, and reflecting on what one wants turns out to be a divergent process (e.g., it’s highly sensitive to initial conditions, like whether or not you drank a cup of coffee before you started, or to the order in which you happen to encounter philosophical arguments). There are still facts about rationality, so at least agents that are already rational can call their utility functions (or the equivalent of utility functions in whatever decision theory ends up being the right one) their real values.


I often wonder and ask others what non-trivial properties we can state about moral reasoning (i.e., besides that theoretically it must be some sort of an algorithm). One thing that I don’t think we know yet is that for any given human, their moral judgments/intuitions are guaranteed to converge to some stable and coherent set as time goes to infinity. It may well be the case that there are multiple eventual equilibria that depend on the order in which one considers arguments, or none if for example their conclusions keep wandering chaotically among several basins of attraction as they review previously considered arguments. So I think the singular term “reflective equilibrium” is currently unjustified when talking about someone’s eventual conclusions, and we should instead use “the possibly null set of eventual reflective equilibria”. (Unless someone can come up with a pithier term that has similar connotations and denotations.)


I’m envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won’t be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

And in passing in a couple of other comments.[4][5]

See also this comment.

And Vladimir Slepnev:[6]

This should’ve been obvious from the start, but your comment has forced me to realize it only now: if we understand reflective equilibrium as the end result of unrestricted iterated self-modification, then it’s very sensitive to starting conditions. You and I could end up having very different value systems because I’d begin my self-modification by strengthening my safeguards against simplification of values, while you’d begin by weakening yours. And a stupid person doing unrestricted iterated self-modification will just end up someplace stupid. So this interpretation of “reflective equilibrium” is almost useless, right?

More discussion here.

See also a comment by Marcello (and the follow-up comments in the same thread).[7]

Also see https://foundational-research.org/dealing-with-moral-multiplicity/#What_about_reflective_equilibrium

See also a discussion between Wei Dai and Paul Christiano, starting around here.

See also