As we were talking, I bemoaned the fact that few introductory textbooks seem to help kids develop any intuitive idea of what the standard deviation is or how it works. I mentioned that I thought a geometric approach to the standard deviation might help make it clearer. You be the judge.

I think the standard deviation is most easily approached as a distance (more specifically, a sort of average of distances). Most high school stats students can quickly find the distance between two points on the plane using the square root of the sum of the squares of the differences (deviations) in each direction (dimension). For those who have never been introduced to it, a few moments is enough to convince them that it generalizes to n dimensions. And in a few short minutes they can be finding the "distance" between points (vectors) in any number of dimensions, and many can quickly invent a shortcut to the calculation using the list functions of their calculators.
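
For readers without a calculator handy, here is a minimal Python sketch of that same list-based shortcut. The function name `distance` and the sample points are my own choices for illustration, not anything from a particular textbook or calculator.

```python
import math

def distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square the difference (deviation) in each dimension, add them up, take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance((0, 0), (3, 4)))              # 5.0, the familiar 3-4-5 triangle
print(distance((1, 2, 3, 4), (2, 4, 5, 8)))  # 5.0, since 1 + 4 + 4 + 16 = 25
```

Nothing in the function depends on the number of coordinates, which is the point: the two-dimensional rule carries over unchanged.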

So why does the standard deviation as a distance make sense? The standard deviation is a measure of how much the data items "disagree" with each other. Start with two measurements, and for the moment use the unconventional notation of calling one of them $x_1$ and the other $y_1$. Now if they agree perfectly, then the point $(x_1, y_1)$ lies on the line $y = x$. If they don't, then it will be off the line by some distance. We begin by finding that distance. The perpendicular from the point $(x_1, y_1)$ to the line $y = x$ meets the line at the point where the x and y values are both the average of $x_1$ and $y_1$, a point we call $(\bar{x}, \bar{x})$. That means the distance of the point $(x_1, y_1)$ from the line $y = x$ is just

$$\sqrt{(x_1 - \bar{x})^2 + (y_1 - \bar{x})^2}$$
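
The claim about the perpendicular can be checked numerically. The sketch below uses helper names of my own (`foot_on_diagonal`, `distance_to_diagonal`) and the two values 1 and 3 as an arbitrary example. It confirms that the segment from the point to (mean, mean) is perpendicular to the line y = x, and it computes the distance as the square root of the sum of the squared deviations from the mean.

```python
import math

def foot_on_diagonal(x1, y1):
    """Point on the line y = x closest to (x1, y1): both coordinates are the mean."""
    mean = (x1 + y1) / 2
    return (mean, mean)

def distance_to_diagonal(x1, y1):
    """Distance from (x1, y1) to the line y = x, measured to the foot of the perpendicular."""
    fx, fy = foot_on_diagonal(x1, y1)
    return math.hypot(x1 - fx, y1 - fy)

x1, y1 = 1.0, 3.0
foot = foot_on_diagonal(x1, y1)

# The line y = x runs in the direction (1, 1). A zero dot product means the
# segment from the foot to the point is perpendicular to the line.
dot = (x1 - foot[0]) * 1 + (y1 - foot[1]) * 1

print(foot)                          # (2.0, 2.0)
print(dot)                           # 0.0
print(distance_to_diagonal(x1, y1))  # about 1.4142, the square root of two
```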

Now if all our data sets had only two values (and statistics was REALLY EASY), then we could use this "distance" measure as a "standard measure". But one of the funny things about distance is that it grows with dimension, "sort of". Here is what I mean. In one dimension, the distance from (0) to (1) is one unit. In two dimensions, the distance from (0,0) to (1,1) is farther: the square root of two. In three dimensions, the distance from (0,0,0) to the point one away in each dimension is the square root of three. This would mean that the data set {1,3}, at a distance of the square root of two from its mean point, would seem to be "less spread out" than {1,1,3,3}, at a distance of two, which seems like a bad thing. To compensate, we simply divide this Pythagorean distance by the square root of the dimension.
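
A short Python sketch makes the "grows with dimension" effect concrete. The function names are mine, and the two data sets are the ones from the paragraph above. It prints the raw distance of each data set from its mean point, then the same distance divided by the square root of the number of values.

```python
import math

def distance_from_mean(data):
    """Raw Pythagorean distance from the data point to (mean, mean, ..., mean)."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data))

def scaled_distance(data):
    """The same distance divided by the square root of the dimension."""
    return distance_from_mean(data) / math.sqrt(len(data))

for data in ([1, 3], [1, 1, 3, 3]):
    print(data, distance_from_mean(data), scaled_distance(data))
```

The raw distances differ (about 1.414 versus 2), while the scaled values agree at about 1 for both sets, which is what we would hope for two sets with the same spread.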

In effect, then, the standard deviation of a population of values is the distance between the n-dimensional points $A = (x_1, x_2, x_3, \ldots, x_n)$ and $B = (\bar{x}, \bar{x}, \ldots, \bar{x})$, divided by the square root of $n$. In truth, there would seem to be no need to memorize a formula once the student understands it as a "mean distance".
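
As a sanity check on that description, the sketch below builds the two points A and B explicitly and compares the result with `statistics.pstdev`, the population standard deviation in Python's standard library. The function name `sd_as_distance` and the sample data are assumptions made for illustration.

```python
import math
import statistics

def sd_as_distance(data):
    """Population standard deviation computed as a distance divided by sqrt(n)."""
    n = len(data)
    mean = sum(data) / n
    a = list(data)   # the point A = (x_1, x_2, ..., x_n)
    b = [mean] * n   # the point B = (x-bar, x-bar, ..., x-bar)
    dist = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return dist / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_as_distance(sample))     # distance-based value
print(statistics.pstdev(sample))  # library value for comparison
```

For this sample the mean is 5 and the squared deviations sum to 32, so both lines should print a value of 2, give or take floating-point rounding.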

As a happy coincidence, John Cook at The Endeavour web site just posted a blog entry about the relationship between vector geometry and statistics when finding the standard deviation of a sum or difference of two distributions. A must-read for intro stats teachers who want to be able to explain what happens (and why) when the distributions are NOT independent.