Thursday 26 August 2010

Can Statistics Really Be 120 Years Ahead of Science?


Scientists have been able to use analysis of the 54 genes that are known to be associated with height to predict the height of a subject. Unfortunately, the new and scientific approach is far less accurate than the statistical model created by Francis Galton 120 years earlier.

The study appears in the European Journal of Human Genetics (2009) 17, 1070–1075: "Predicting human height by Victorian and genomic methods". Here is the abstract:

"In the Victorian era, Sir Francis Galton showed that ‘when dealing with the transmission of stature from parents to children, the average height of the two parents, … is all we need care to know about them’ (1886). One hundred and twenty-two years after Galton's work was published, 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people, as characterized by the area under the receiver-operating characteristic curve (AUC). In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. We have also explored how much variance a genomic profile should explain to reach certain AUC values. For highly heritable traits such as height, we conclude that in applications in which parental phenotypic information is available (eg, medicine), the Victorian Galton's method will long stay unsurpassed, in terms of both discriminative accuracy and costs. For less heritable traits, and in situations in which parental information is not available (eg, forensics), genomic methods may provide an alternative, given that the variants determining an essential proportion of the trait's variation can be identified. "


From the details of the work, "In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people, as characterized by the area under the receiver-operating characteristic curve (AUC). In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance".

Galton's approach was not "just" the average of the parents' heights, but involved the deviation of the midparent (the average of the parents' heights after the mother's height was scaled up by 1.08) from the average midparent. The offspring, Galton determined, would be only 2/3 as far away from the mean as their midparent, on average. It is this "regression toward mediocrity" that gave rise to our present term "regression". In his words, "We can define the law of regression very briefly. It is that the height-deviate of the offspring is, on the average, two thirds of the height-deviate of its mid-parentage." [from "Regression towards mediocrity" (1885), p. 252]
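To make the arithmetic concrete, here is a minimal Python sketch of the rule as described above. The function names are my own, and the population mean of 68.25 inches in the example is only an illustrative placeholder, not a figure taken from the study. The sketch also assumes that the offspring mean equals the mean midparent height, and it gives the prediction in male-equivalent units (Galton's 1.08 scaling), so it is a rough illustration and not a reconstruction of the paper's method.

```python
FEMALE_SCALE = 1.08        # Galton's factor for scaling up the mother's height
REGRESSION_RATIO = 2 / 3   # offspring deviate as a fraction of midparent deviate


def midparent_height(father, mother):
    """Average of the parents' heights after scaling the mother's by 1.08."""
    return (father + FEMALE_SCALE * mother) / 2


def predict_offspring_height(father, mother, mean_midparent):
    """Predicted height in male-equivalent units.

    Assumption (not from the source): the offspring population mean
    equals mean_midparent.
    """
    deviate = midparent_height(father, mother) - mean_midparent
    return mean_midparent + REGRESSION_RATIO * deviate


if __name__ == "__main__":
    # Illustrative numbers only: heights in inches, hypothetical mean of 68.25.
    mid = midparent_height(72, 66)
    pred = predict_offspring_height(72, 66, mean_midparent=68.25)
    print(f"midparent: {mid:.2f}, predicted offspring: {pred:.2f}")
```

With a 72-inch father and a 66-inch mother, the midparent is 3.39 inches above the assumed mean, so the predicted offspring is only two thirds of that (2.26 inches) above it. That shrinkage toward the mean is the "regression" in question.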

1 comment:

Anonymous said...

Thanks for pointing out this study. It looks like a nice explanation of the limits of current genome-wide association studies.