One More Drop In the Ocean

Renegades of The Ordinary

Statistics – not as boring as you may think

Posted by vineetgupta on July 8, 2008

I like counter-intuitive stuff.

I like things which appear to be painfully simple at first glance but are actually totally different. Careful analysis shows that the cognitive leap was defective at some point. Following past cognitive errors helps in ironing out future cognitive errors.

For example, consider the following.

In a classroom of 50 children, what is the probability that at least two kids share the same birthday?

The intuitive process goes thus: “A birthday is a pretty rare event (1 day in 365). The number 50 is also much less than 365. For two guys to have their birthday on the same day! That must be pretty improbable.”

The correct answer: More than 97%.

How did this happen?

The defective cognitive leap lay in this fact: The birthday problem asks whether any of the 50 children shares a birthday with any of the others, not with one child in particular.

In a list of 50 people, if you compare the birthday of the first person on the list to the others, you have 49 chances of success, but if you compare each person’s to all the others, you have 1,225 distinct pairs, and so a lot more chances.
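The 97% figure can be checked with a few lines of Python. This is a minimal sketch, assuming 365 equally likely birthdays, no leap years and independent children; the function name `p_shared_birthday` is just an illustrative choice. It multiplies out the chance that everyone has a different birthday and subtracts that from 1.

```python
from math import comb


def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    print(comb(50, 2))            # number of distinct pairs among 50 people
    print(p_shared_birthday(50))  # chance of at least one shared birthday
```

Under the same assumptions, `p_shared_birthday(23)` already comes out slightly above one half.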

The actual curve of probability against number of people goes thus.

[Figure: probability of a shared birthday plotted against the number of people.]

Here’s a much tougher question. Bear with me on this one. We’re going to play a game.

I’ve got two robots – I name them htt and hth. Yes, those are really their names.

I give one shiny new totally unbiased coin to each one.

Then I go to htt and say: Flip the coin and keep noting down the side which comes up each time. The moment you get the same pattern as your name (heads, tails, tails: HTT), stop and note down how many tosses it took you to achieve your goal.

htt, being an obedient robot, starts on this task. The combination he gets is:

HHTHTHTT

And reports the value of “number of tosses” as 8.

Then I go to hth and say the same thing to him, only his target pattern is HTH.

hth, being equally obedient, gets the combination:

HHTTHHTH

And he also reports the value of “number of tosses” as 8.

Satisfied that my robots know how the system works, I lock them in a time chamber and tell them to repeat this experiment one trillion times, take an average of all the values of “number of tosses” they get, and report it to me.
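For anyone who wants to pin down exactly what the robots are doing, here is a minimal sketch of the procedure in Python. The names `tosses_until`, `fair_coin` and `average_tosses` are made up for illustration, and it assumes each run starts from a blank sheet and stops the first time the last three tosses spell the target.

```python
import random


def tosses_until(pattern, flips):
    """Return how many tosses from `flips` it takes until `pattern` first appears."""
    window = ""
    for count, flip in enumerate(flips, start=1):
        # Keep only the most recent len(pattern) tosses.
        window = (window + flip)[-len(pattern):]
        if window == pattern:
            return count
    raise ValueError("pattern never appeared in the given flips")


def fair_coin(rng):
    """Endless stream of unbiased 'H'/'T' tosses drawn from `rng`."""
    while True:
        yield rng.choice("HT")


def average_tosses(pattern, trials, seed=0):
    """Average of `tosses_until` over `trials` independent runs."""
    rng = random.Random(seed)
    total = sum(tosses_until(pattern, fair_coin(rng)) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(tosses_until("HTT", "HHTHTHTT"))  # htt's sample run
    print(tosses_until("HTH", "HHTTHHTH"))  # hth's sample run
```

Both sample runs above come out as 8, matching what the robots reported. Calling `average_tosses("HTT", 100_000)` and `average_tosses("HTH", 100_000)` gives an empirical estimate in place of a trillion runs, though it is no substitute for an explanation.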

If I compare the two values (from hth and htt), what will occur?

A: Both values will be the same

B: hth will report a larger number

C: htt will report a larger number

Solve it if you can. No guesswork allowed. If you have a solution, post it with an explanation. If none of you can get it, I’ll provide an answer and explanation in the next post.

6 Responses to “Statistics – not as boring as you may think”

  1. Kim said

    Interesting. I think the answer is A but the topic of the post means it is probably a fourth option D! :-)

  2. Well, it’s not a trick question, the answer is A, B or C. I’ll grant you that.

  3. Hairy said

    Cool problem. It’s essentially one of ‘frameshift,’ I call it. The probabilities that either will get their name in the first three tosses are equal at one eighth. In order to see what happens after the first three tosses, you hafta multiply the probability of the first two letters of the name in the second and third positions of the permutation set times the probability of the last letter of the name being the first letter of the permutation set. Then you hafta look at the first single letter in the name in the last permutation position multiplied by the last two letters of the name in the first two permutation places. Hope i said that right…

    So running through that, if i did my math right, htt will report a number on the order of ten times bigger than hth because there are so many more ways to accidentally spell hth than htt with the eight permutations available.

    quick thought… would this hold for a four glyph name? htth and thht? out of time

    please post the answer!

  4. Hi Hairy! Thanks for commenting!

    I think you’re on the right track, but stumbled in the middle somewhere and reached a wrong place. You’ve caught the key to the solution: the problem IS one of frameshift.

    Time for another hint. I’ll post the final solution tomorrow. The expressions HTT and HTH are NOT similar (despite appearances). Similar terms would be HTH and THT, not HTH and HTT. If the question was about HTH and THT, the answer would be A.

    As for your quick question, If the terms were HTTH and THHT (similar 4-letter terms), the answer would be A as well.

  5. Hairy said

    *hee hee* 4/64 is not 1/4, last time i checked. Worse still, i made a grievous conceptual error in my haste… you know, i was innocently surfing the internet and i wind up spending half my friday working on a problem that someone else knows the answer to already… that’s my way of saying ‘kudos to you’

    i’m sticking by my answer, C, htt will have the higher average. But not by ten times.

    This answer hinges on the ‘semi-assumption’ that the robots do not ‘clear their memory’ after the coins do in fact concatenate their names. (If the robots did clear their memories and start a fresh list, it really looks like a statistical dead heat.)

    If, instead, the next value is tacked onto the two that preceded it, in essence allowing for a ‘frameshift,’ statistical deviation appears. (If this assumption is incorrect, i’m totally in over my head here! toss life preserver here)

    Instead of a ten-fold-longer average, i’m thinking that htt will have an average list length 2 times as long as hth’s list, as ‘tt’ must go through three subsequent coin flips to have a single chance in eight to be born again, whereas ‘th’, with its self-effacing symmetry, needs only two subsequent coin flips to regenerate itself!

    Something is bothering me, though:

    Intuitively, there just *can’t* be a real difference between 1) an all-at-once consideration of an existing random set of h’s and t’s and 2) the flip-by-flip generation of a set of h’s and t’s which results in an existing random set. Can there?

    It seems like there must be *some* difference between the two, or the frameshift idea would be exactly the same as searching an existing random series for strings of certain length and character, and such a search should be statistically flat, shouldn’t it?

    blah blah blah it’s Miller Time

    thanks for the puzzle, i just love it

  6. [...] Statistics – not as boring as you may think [...]
