This is my take on a classic probability paradox, the case of Sleeping Beauty. Along the way I’ll explain what I think is wrong with a classic rule of inference, Bayes’ Rule, and what needs to be done to fix it, but I hope you enjoy the paradox even if you don’t have a background in probability theory.
Here’s the scenario: Beauty volunteers for a lab experiment in the experimental philosophy department at her local community college. She goes into the lab on Sunday and the following procedure is explained to her:
1. Today we will give you a sleeping potion and flip a fair coin: Heads or Tails each come up half the time.
2. On Monday, we will awaken you and ask what probability you assign to the coin coming up Heads; then we’ll tell you the actual result of the coin flip. After that, we will give you another sleeping potion, which will also have the effect of erasing your memory of being awoken, asked, and told.
3. If the coin did indeed come up Heads, we will repeat step 2 on Tuesday and again awaken you, ask for the probability that the coin came up Heads, and put you back to sleep. (Of course, your answer would be “100%” if you could remember that you were previously awoken, but since you’ll have had your memory erased you will have no reason to give a different answer than on your first awakening.) If the coin came up Tails, we’ll let you sleep through Tuesday.
4. Finally, on Wednesday we will give you the antidote to the sleeping potion and you’ll wake up, having regained any missing memories.
The question is: Upon being awoken partway through the experiment, what answer should Beauty give for the probability of the coin flip being Heads?
The paradox is that there are two standard answers, and both seem to have watertight arguments for them:
- The classical position is that upon awakening, Beauty should assign a probability of 1/2 to the coin flip being Heads. Before downing that first sleeping potion, she knows the coin has equal chances of coming up Heads or Tails, and upon awakening, she has learned no new information that would change her estimate.
- But there’s a competing answer: that when she wakes, Beauty should assign a probability of 2/3 to the outcome of Heads. After all, if the experiment is repeated 100 times, then she’ll be awoken about 150 times (50 from the 50 Tails, and 100 from the 50 Heads), and two-thirds of those awakenings will occur in runs where the coin flip was Heads.
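The frequency argument in the thirder bullet can be checked with a quick Monte Carlo simulation. This is a minimal sketch assuming this post’s convention (Heads means two awakenings, Tails means one); the function name `simulate_awakenings` and the fixed seed are illustrative choices. It counts the same runs two ways: per experiment, about half of the flips are Heads, but per awakening, about two-thirds follow a Heads flip, which is precisely the gap between the halfer and thirder answers.

```python
import random


def simulate_awakenings(trials: int, seed: int = 0) -> tuple[int, int, int]:
    """Run the experiment `trials` times, using this post's variant
    (Heads: awoken Monday and Tuesday; Tails: awoken Monday only).

    Returns (number of Heads flips, awakenings after Heads, total awakenings).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads_flips = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5  # fair coin
        wakings = 2 if heads else 1
        total_awakenings += wakings
        if heads:
            heads_flips += 1
            heads_awakenings += wakings
    return heads_flips, heads_awakenings, total_awakenings


trials = 100_000
flips, heads_wakes, all_wakes = simulate_awakenings(trials)
print(f"fraction of experiments with Heads: {flips / trials:.3f}")
print(f"fraction of awakenings after Heads: {heads_wakes / all_wakes:.3f}")
```

Both printed fractions come from the same simulated coin flips; the only difference is whether each experiment or each awakening is counted once.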
Most of the debate between the “halfers” and the “thirders” is along the lines of “Here’s why my answer is right!” “No, you must be wrong, because here’s another reason why my answer’s right!” Well, I’m going to try to explain why I think the halfer argument is wrong, and how, once it is fixed, it agrees with the thirder answer.
First, let’s review ordinary probabilistic inference: how to update your probability estimates when you receive new information. For example, if I wake up and have no idea what day of the week it is, I should assign a probability of 5/7 to the proposition that it is a weekday, and 2/7 to it being a weekend day. But if I turn over and find I have woken up to our early alarm, which only rings Sunday through Friday, then I can eliminate Saturday as a possibility, and the probability that it is a weekday goes up to 5/6 (five weekdays out of the six remaining days).
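The weekday example is small enough to work out by listing every possibility. Here is a minimal sketch that treats each of the seven days as equally likely; the helper `prob_weekday` and the day lists are illustrative names, not anything from a library.

```python
from fractions import Fraction

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
WEEKEND = {"Saturday", "Sunday"}
ALARM_DAYS = set(DAYS) - {"Saturday"}  # the early alarm rings Sunday through Friday


def prob_weekday(possible_days: list[str]) -> Fraction:
    """Probability of a weekday when every day in possible_days is equally likely."""
    weekdays = [d for d in possible_days if d not in WEEKEND]
    return Fraction(len(weekdays), len(possible_days))


print(prob_weekday(DAYS))  # before hearing the alarm: 5/7
print(prob_weekday([d for d in DAYS if d in ALARM_DAYS]))  # after: 5/6
```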
In general, a theorem called Bayes’ Rule lets you update probabilities given new evidence without having to break the situation down into every possibility: for each hypothesis you have (is it a weekday? the weekend?), you multiply its initial probability by the chance of observing that evidence under that hypothesis, and then rescale all the results so that they sum to one. In our case, my early alarm ringing has a 100% probability of happening if it’s a weekday, but only a 50% probability of happening if it’s the weekend. So I update the probabilities as follows:

- Weekday: 5/7 × 100% = 5/7
- Weekend: 2/7 × 50% = 1/7

These sum to 6/7, so rescaling gives a probability of 5/6 that it is a weekday and 1/6 that it is the weekend, the same answer as counting the days directly.
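The same update can be done with Bayes’ Rule directly. This sketch uses Python’s `fractions` module to keep the arithmetic exact; `bayes_update` is a hypothetical helper that multiplies each prior by its likelihood and then rescales so the results sum to one.

```python
from fractions import Fraction


def bayes_update(priors: dict, likelihoods: dict) -> dict:
    """Multiply each prior by its likelihood, then rescale so the results sum to one."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


priors = {"weekday": Fraction(5, 7), "weekend": Fraction(2, 7)}
# Chance the early alarm rings: every weekday, but only one of the two weekend days.
likelihoods = {"weekday": Fraction(1), "weekend": Fraction(1, 2)}
posterior = bayes_update(priors, likelihoods)
print(posterior)  # {'weekday': Fraction(5, 6), 'weekend': Fraction(1, 6)}
```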
The classical “halfer” argument goes that since Beauty’s observations upon waking are the same regardless of the coin flip—in fact, she knows for certain what she will experience even before the coin is tossed—the probability she assigns to Heads must not have changed from its original 1/2.
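In the notation of Bayes’ Rule, writing E (a label introduced here) for Beauty’s evidence of being awoken and asked about the coin, the halfer calculation treats E as certain under both outcomes, so the likelihoods cancel and the posterior equals the prior:

$$
P(\text{Heads}\mid E)
= \frac{P(\text{Heads})\,P(E\mid\text{Heads})}{P(\text{Heads})\,P(E\mid\text{Heads}) + P(\text{Tails})\,P(E\mid\text{Tails})}
= \frac{\tfrac12\cdot 1}{\tfrac12\cdot 1 + \tfrac12\cdot 1}
= \frac12
$$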
I claim that when updating according to new evidence, it actually makes sense to multiply not just by the likelihoods of observing that evidence given your various hypotheses, but also by the number of people making that observation. Usually there’s the same number of observers under each hypothesis, so this isn’t a factor, but in our case, two of Beauty’s selves are around to experience each Heads coin flip but only one for each Tails flip:

- Heads: 1/2 × 100% × 2 observers = 1
- Tails: 1/2 × 100% × 1 observer = 1/2

These sum to 3/2, so rescaling gives Heads a probability of 2/3 and Tails a probability of 1/3: the thirder answer.
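Here is a sketch of the observer-weighted update described above, under the assumption that the extra factor is simply the number of awakenings each outcome produces. `observer_weighted_update` is a hypothetical name, and the probabilities are kept as exact fractions.

```python
from fractions import Fraction


def observer_weighted_update(priors: dict, likelihoods: dict, observers: dict) -> dict:
    """Weight each hypothesis by prior x likelihood x number of observers,
    then rescale so the weights sum to one."""
    weights = {h: priors[h] * likelihoods[h] * observers[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


priors = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}
# Beauty's waking experience is certain under either outcome.
likelihoods = {"Heads": Fraction(1), "Tails": Fraction(1)}
# Heads: awoken Monday and Tuesday; Tails: Monday only.
observers = {"Heads": 2, "Tails": 1}
print(observer_weighted_update(priors, likelihoods, observers))
# {'Heads': Fraction(2, 3), 'Tails': Fraction(1, 3)}
```

Setting both observer counts to 1 turns this back into the ordinary rule and returns the halfer’s 1/2, so the observer factor is the only thing separating the two calculations.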
This calculation agrees with what you get by breaking the situation down into individual, equally likely possibilities. If Beauty treats each waking as a day chosen at random from all the days of her life (say there are N of them), then finding herself awoken and asked the coin flip question happens twice as often under the Heads scenario as under Tails, so her probability of Heads should be twice that of Tails.
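This picture suggests the following conventional application of Bayes’ Rule, which gets the same result; again, the “observation” is choosing a random day of Beauty’s life and finding that she is having the coin flip question posed to her:

- Heads: 1/2 × 2/N = 1/N
- Tails: 1/2 × 1/N = 1/(2N)

These sum to 3/(2N), so rescaling again gives Heads a probability of 2/3 and Tails 1/3, whatever the value of N.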
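A sketch of this conventional random-day calculation, assuming Beauty’s life has `n_days` days (at least 2) and that exactly two of them are coin-question days under Heads and one under Tails. `heads_given_random_day` is a hypothetical helper; the loop checks that the answer does not depend on N.

```python
from fractions import Fraction


def heads_given_random_day(n_days: int) -> Fraction:
    """Conventional Bayes' Rule where the observation is: a day picked uniformly
    at random from Beauty's n_days-day life turns out to be a coin-question day."""
    prior_heads = prior_tails = Fraction(1, 2)
    like_heads = Fraction(2, n_days)  # Heads: two of her days are awakenings
    like_tails = Fraction(1, n_days)  # Tails: only one is
    joint_heads = prior_heads * like_heads
    joint_tails = prior_tails * like_tails
    return joint_heads / (joint_heads + joint_tails)


for n in (10, 1_000, 30_000):
    print(n, heads_given_random_day(n))  # prints 2/3 for every n: N cancels out
```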
So there’s my resolution to the Sleeping Beauty paradox: in order to apply Bayes’ Rule consistently, it has to include a factor not just for the likelihood of observing the evidence, but also for the number of people making that observation. We saw the same effect in the class size paradox: the average class size experienced by professors is not the same as the one experienced by students, because larger classes have more students in them, which makes students more likely than professors to be in large classes. In future blog posts I’ll share some more examples where this sort of probabilistic reasoning comes up in real life (in particular, in hearsay and advertising).
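The class size comparison follows the same pattern. As a sketch with made-up numbers, assume a hypothetical department with three classes of sizes 10, 10, and 100, each taught by one professor. Averaging over classes gives the professors’ view; weighting each class by the number of students in it, which plays the role of the observer factor, gives the students’ view.

```python
from fractions import Fraction

# Hypothetical department: three classes, each with one professor.
class_sizes = [10, 10, 100]

# Professors' view: each class is counted once.
professor_avg = Fraction(sum(class_sizes), len(class_sizes))

# Students' view: each class is counted once per student sitting in it.
student_avg = Fraction(sum(s * s for s in class_sizes), sum(class_sizes))

print(professor_avg, student_avg)  # 40 85
```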
As far as I know, this is not a mainstream variant of Bayes’ Rule, and I would love to know whether there’s an axiomatic system that can deduce it in the same way that Bayes’ Rule can be proven from the usual axioms of probability theory. In particular, in cases where the conventional Bayes’ Rule can also be used to get the same answer, I’d like to know how to make sure I don’t accidentally double-count the observer factor.
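To make the double-counting worry concrete, here is an illustration (not an answer to the question) of what it would look like. Take the random-day likelihoods, 2/N under Heads and 1/N under Tails, which already account for Heads producing two awakenings, and then multiply by the observer counts as well. The weights become 2/N against 1/(2N), pushing the probability of Heads to 4/5 instead of 2/3. The function name `heads_probability` and the choice N = 100 are hypothetical.

```python
from fractions import Fraction


def heads_probability(n_days: int, observer_factor: bool) -> Fraction:
    """Random-day likelihoods (2/N for Heads, 1/N for Tails), optionally
    multiplied again by the number of awakenings per outcome."""
    weights = {
        "Heads": Fraction(1, 2) * Fraction(2, n_days),
        "Tails": Fraction(1, 2) * Fraction(1, n_days),
    }
    if observer_factor:
        # Double-counting: the 2/N likelihood already reflects the two Heads awakenings.
        weights["Heads"] *= 2
        weights["Tails"] *= 1
    return weights["Heads"] / (weights["Heads"] + weights["Tails"])


print(heads_probability(100, observer_factor=False))  # 2/3
print(heads_probability(100, observer_factor=True))   # 4/5
```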
Edit: I’ve found the website where I originally came across the Sleeping Beauty paradox: Some “Sleeping Beauty” postings. It looks like I swapped the roles of “Heads” and “Tails” compared to the usual version, but otherwise it’s the same. It was in fact the Andy and Bob variant that got me to think about likelihood needing an extra factor for the number of observers. Does anyone have any thoughts about that or any other paradox from the list?