Monday, 6 October 2008

Standard Deviation Computation in One Pass


One more new (to me) item in my explorations of the standard deviation. I came across a one-pass method of computing the sample standard deviation which seems to be exact, not an approximation.

Normally the way we compute the sample standard deviation requires two passes through the data, one where we calculate the mean, then again to find the deviations from the mean to calculate the RMS of the deviations. This method is good for most cases where the data comes to you in a set and complete. But in cases where you have data streaming in constantly, say as from the Hubble telescope or something, having to go back and recalculate the total each time may be time consuming. What is needed is a way to take the old results and produce the new result with the one piece of data added.

I found just such a result at a page From John D Cook He writes, "This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth's Art of Computer Programming Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. ...It is not obvious that the method is correct even in exact arithmetic. It's even less obvious that the method has superior numerical properties, but it does. The algorithm is as follows.

Initialize M1 = x1 and S1 = 0.

For subsequent x's, use the recurrence formulas

Mk = Mk-1+ (xk - Mk-1)/k

Sk = Sk-1 + (xk - Mk-1)*(xk - Mk).

For 2 ≤ k ≤ n, σ2 = Sk/(n - 1)."
although I think the sigma-squared should be s2 from the values I get and the division by n-1. For those who use the Ti 84 and such calculator, this is an easy program to write, or you may download a program I quickly wrote here. It asks for each x in turn, and reports the running mean and sk after each entry.

5 comments:

Pat's Blog said...

Received the following in direct e-mail from Roger Johnson:

The method in question is sometimes referred to as Welford’s method and appears in a 1962 article of the journal Technometrics
(vol. 4, no. 3, pp. 419-420, “Note on a method for calculating corrected sums of squares and products”).

The method also appears on the cover of the current issue of Teaching Statistics should you subscribe to such!

Roger Johnson

Editor, Teaching Statistics

Department of Mathematics & Computer Science

South Dakota School of Mines & Technology

Anonymous said...
This comment has been removed by a blog administrator.
Paul said...

IANAM, but I agree with your observation that that's sample variance; standard variance is σ2 = Sk/n

Author said...

Standard Deviation

Standard Deviation Calculator

Roman Adam said...

Wow!!
Thanks for sharing this post and I agree with your observation!!
ladies handbags