Monday, 6 October 2008

Standard Deviation Computation in One Pass


One more new (to me) item in my explorations of the standard deviation. I came across a one-pass method of computing the sample standard deviation which seems to be exact, not an approximation.

Normally the way we compute the sample standard deviation requires two passes through the data, one where we calculate the mean, then again to find the deviations from the mean to calculate the RMS of the deviations. This method is good for most cases where the data comes to you in a set and complete. But in cases where you have data streaming in constantly, say as from the Hubble telescope or something, having to go back and recalculate the total each time may be time consuming. What is needed is a way to take the old results and produce the new result with the one piece of data added.

I found just such a result at a page From John D Cook He writes, "This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth's Art of Computer Programming Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. ...It is not obvious that the method is correct even in exact arithmetic. It's even less obvious that the method has superior numerical properties, but it does. The algorithm is as follows.

Initialize M1 = x1 and S1 = 0.

For subsequent x's, use the recurrence formulas

Mk = Mk-1+ (xk - Mk-1)/k

Sk = Sk-1 + (xk - Mk-1)*(xk - Mk).

For 2 ≤ k ≤ n, σ2 = Sk/(n - 1)."
although I think the sigma-squared should be s2 from the values I get and the division by n-1. For those who use the Ti 84 and such calculator, this is an easy program to write, or you may download a program I quickly wrote here. It asks for each x in turn, and reports the running mean and sk after each entry.

Post a Comment