## Monday 25 January 2010

### Different Question, Same Idea, Sort of

A couple of days ago I wrote about the probability of a 28 Year Snow and 100 year Rain. It reminded me of another problem that, seemingly completely different, is somewhat related.

Here is a version of the question... Suppose you were looking at a group of n (a large number of numbers) numbers presented one at a time. Your goal is to guess which is the largest number in the list. You get to look at one number, and say "That one." or pass, if you pass, you can never go back and the next number is presented.. How do you decide which number to pick. You could just guess one of the numbers, and when it comes along pick it.. the probability you would win is 1/n. Or you could guess that there will be approximately ln(n)+.58 numbers.. this is the limit for the harmonic series, $\sum_{k=1}^{n}\frac{1}{k}$= [1+1/2+1/3+1/4+...+1/n (for big n...and that doesn't mean huge...at n=20, the actual value is 3.597 and the approximation is 3.572) approaches ln(n) + .0.57721, which will estimate the number of record breaking events in N events, for example, record breaking floods in a 100 year period (look for about 5... ln(100)+.577 = 5.182).. and when that many record results occur, you could pick it... (this would have been my first choice)... unfortunately, the standard deviation of the number of records that occur is pretty wide, so you are not likely to win more than about 20% of the time... Still, in most cases that is better than 1/n.

But what if you decided to wait until a certain number, call it r, of events had passed, then pick the next number higher than any of the first r... But what r would you pick.

If there are three numbers, 1, 2, and 3, they can occur in six orders. If you decide to always pick the first, or second or third you would win 1/3 of the time.. Or you could wait and after the first, pick the next one that is higher... Well if 3 is first, you lose, but if the order is 1-3-2; 2-3-1; or 2-1-3 you would win, hey, that's a fifty percent chance of winning..

If there are four numbers, they can be in 24 different orders. Would it be best to let one number go, or two. If we pick the first number, we know the winning probability is only 1/4, and if we wait until the last number, it is again 1/4. But what if we let one go, then pick the next that exceeds that value... We would pick the right number in 11/24 of the orders.. If we let two go, then picked the next that exceeds those two, we would only win in 5/12 of the trials.

So how do you decide? How many numbers would you let go before you picked the next higher number. If there were ten numbers in all, for example, and you let r of them go by, the probability that the next one is the biggest (and hence you win) is 1/n. But you would also win if the next one was smaller than the max of the first r numbers [prob r/(r+1)] and the one after that WAS the largest in the set (prob 1/n again)... Or ...you could win if the next two after you stopped counting were not greater than the max of the first r numbers [probability r/(r+2)] and the following one (yep, prob 1/n) was the biggest value. Ok, you see a pattern here. So the probability of winning after waiting for any r-values is P(r) = 1/n[1 + r/r+1 + r/r+2 + ...+ r/n-1] but if we factor out the r on top.. this is just r/n [1/r+1/r+1...+ 1/n-1] and that last part is the harmonic series between 1/r and 1/n-1... If we make the assumption that r and n are pretty big then we can approximate the sum of the harmonic series out to 1/(n-1) as Ln(n-1)+ .577 and out to 1/r it would be Ln(r)+.577. Subtracting this last from the first, we see that it will be about Ln(n-1) - Ln(r) or, using the log properties you learned in Alg II, we can write that as the Ln(n-1/r)... so for any number of numbers n, if we let the first r go by and pick the next number greater than any of those, our probability of having that be a winning number is (r/n) Ln(n-1/r). All we have to do is find the maximum value of Ln(n-1/r) and we know how many r to let pass. For calculus students, n is a constant here, r is the variable, so we want find the maximum of (r/n)*Ln(n/r).... and for algebra students, graph it and find the max probability.

It turns out that the max happens when r=n/e... so for n= 100, we would wait for the first 1/e = 36.7 (ok, 37) numbers and then pick the next number that is bigger than any of those...and the probability you win.... 1/e or about 36.8%

And how is that related to the 100 year snow? Well, 1-.368 = .632... So the probability of picking the max value from a string of 100 numbers by this strategy is the same as the probability that you do NOT hava 100-year snow in the next hundred years... . Bundle up... the snow is more likely.... rule of the day for multiple choice.... Pick e.... ;-}