Tuesday, 21 April 2009
Benford's Law: One is NOT the Loneliest Number
If you looked in lots of reference books and found the areas of all the lakes on Earth, about 30% of the numbers you found would start with a 1. It doesn't even matter if some of the books gave the area in square miles, others in hectares, and still others in square meters. This is one of the surprising results of Benford's Law. The same result would occur if you found the daily sales for all the McDonald's franchises in the world, and again, it doesn't matter that some are in dollars and others in yen. The law is named for US physicist Frank Benford, who published a description of the effect in 1938.
As you might have guessed, someone else got there earlier, a half century earlier. In 1881 a note in the American Journal of Mathematics by the great American astronomer Simon Newcomb described an unusual observation. He had noticed that in the tables of logarithms then in common use by astronomers, the pages for the lower numbers were always more dog-eared than the pages for the higher numbers. He suggested that natural observations tend to start with the digit one more often than with an eight or nine. For some reason, the observation went without much comment. Years later Benford published data from an assortment of different areas, and the mathematical quirk of nature now bears his name. No reason was given for the unusual distribution until 1996, when Theodore Hill of the Georgia Institute of Technology published, what else, Hill's Theorem.
There is even a formula for how the distribution works. According to Benford's Law, if you take a large sample from a collection of non-random but wide-ranging data, the proportion of numbers starting with the digit d will be log(1 + 1/d), where the base-ten log is intended. Since log(1 + 1/1) = log(2) ≈ 0.301, we estimate that about 30% of the numbers you will see on the internet in a day start with 1. Log(1 + 1/2), or approximately 17.6%, start with two, and each larger digit becomes a little less common, with a little less than 5% of all numbers starting with nine.
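To see the whole distribution at once, here is a small Python sketch that evaluates log(1 + 1/d) for each digit from 1 to 9. The names benford_probability and expected are just labels chosen for this illustration.

```python
import math

def benford_probability(d):
    """Expected share of numbers whose first digit is d (1 through 9)."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Table of expected shares for every possible leading digit.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine shares add up to 1, because the sum of log((d + 1)/d) for d from 1 to 9 telescopes to log(10) = 1.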
The applications of Benford's Law are just starting to emerge in the area of detecting fraud. Several cases have already been found. The New Scientist reported that over a million dollars in fraud was discovered using this process in a health care incident. The computer search showed an unusually large share of claims beginning with a six, and a large number of them turned out to be bogus. Bunko artists of the future will have to be better mathematicians, it seems.
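A first-digit screen of this kind might look roughly like the Python sketch below. It is only an illustration, not the method used in the reported case: the function names are made up for this post, and the "claims" list is invented data, built from powers of 2 (which are known to follow Benford's Law closely) padded with forty fake amounts that all start with a six.

```python
import math
from collections import Counter

def first_digit(x):
    """Return the first nonzero digit of a nonzero number."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")

def digit_excess(amounts):
    """Observed share minus Benford's expected share, for digits 1 to 9."""
    values = [a for a in amounts if a != 0]
    counts = Counter(first_digit(a) for a in values)
    n = len(values)
    return {d: counts[d] / n - math.log10(1 + 1 / d) for d in range(1, 10)}

# Invented data: a Benford-like baseline plus 40 suspicious claims near 600.
claims = [2 ** n for n in range(1, 101)] + [600 + i for i in range(40)]

excess = digit_excess(claims)
suspect = max(excess, key=excess.get)
print(f"Most over-represented leading digit: {suspect}")
```

A digit that shows up far more often than the law predicts is only a flag for a closer look; a real audit would presumably follow up with a proper statistical test and then examine the individual records.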
There does not seem to be a clear and definite rule about what kinds of data obey Benford's Law, but in general, data drawn from a wide range of statistical distributions seem to follow it; so if you recorded all the numbers you encountered on the internet today, the first digits should probably follow Benford's Law.
You can find more about the law in this article by Jonathan R. Bradley and David L. Farnsworth of the Rochester Institute of Technology.