I've just become a fan of William M. Briggs, statistician and blogger. From global warming to current events, Dr. Briggs turns a painfully dull subject (for the rest of us) into entertaining lessons on the limits and pitfalls of statistics.
Take this for example.
September 6, 2008: Do not smooth time series, you hockey puck!
When we normal people holler about this, we look like presumptuous malcontents. It's so much funnier when a PhD statistician says it.
Dr. Briggs comments:
"The various black lines are the actual data! The red-line is a 10-year running mean smoother! I will call the black data the real data, and I will call the smoothed data the fictional data. Mann used a “low pass filter” different than the running mean to produce his fictional data, but a smoother is a smoother and what I’m about to say changes not one whit depending on what smoother you use.
Now I’m going to tell you the great truth of time series analysis. Ready? Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses! If the data is measured with error, you might attempt to model it (which means smooth it) in an attempt to estimate the measurement error, but even in these rare cases you have to have an outside (the learned word is “exogenous”) estimate of that error, that is, one not based on your current data."
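Briggs's warning about feeding smoothed data into "other analyses" is easy to check with a small simulation. The Python sketch below is my own illustration, not his code and not Mann's filter. It generates pairs of series that are unrelated by construction (pure random noise), then compares their correlation before and after a 10-point running mean. The series length, number of trials, and seed are arbitrary choices for the demonstration.

```python
import math
import random
import statistics

def running_mean(xs, window=10):
    """k-point running mean; returns len(xs) - window + 1 values."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)
TRIALS, LENGTH = 200, 100  # arbitrary simulation settings
raw_r, smooth_r = [], []
for _ in range(TRIALS):
    # Two hypothetical series of pure noise: no relationship at all.
    a = [random.gauss(0, 1) for _ in range(LENGTH)]
    b = [random.gauss(0, 1) for _ in range(LENGTH)]
    raw_r.append(abs(pearson(a, b)))
    smooth_r.append(abs(pearson(running_mean(a), running_mean(b))))

print(f"average |r|, real data:     {statistics.fmean(raw_r):.2f}")
print(f"average |r|, smoothed data: {statistics.fmean(smooth_r):.2f}")
print(f"trials with |r| > 0.3, real: {sum(r > 0.3 for r in raw_r)}, "
      f"smoothed: {sum(r > 0.3 for r in smooth_r)}")
```

Smoothing leaves far fewer effectively independent points, so chance correlations between unrelated series come out larger, while a naive significance test on the smoothed values would still treat every point as an independent observation.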
This ties into something that's been bugging me. In my search for the "elusive standard deviation" (of the global mean surface temperature), I kept coming across well-meaning folk telling me the standard deviation of the global mean temperature is a fraction of a degree Celsius, usually around 0.25 °C. They say there isn't that much variability when you compare the means across the years; the annual mean is always going to hover very close to the climatology.
Then it dawned on me: they are treating the means themselves as raw measurements, like readings from a thermometer. Instead of being seen as statistical artifacts with a huge amount of uncertainty, the means are treated as absolute numbers with no error attached to them. If you take averages of averages of averages, you are going to end up with nice, tidy numbers with almost no variance at all. Yes, you can average serially, but each step has to propagate the error from all previous averages. If you smooth the time series over and over again without propagating the error, you are going to end up with "fictional data" that has almost no variance and looks all but certain.
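The "smooth it over and over" scenario can be simulated in a few lines. The sketch below is a hypothetical illustration of my own, not anything taken from Briggs or the climate literature: it applies the same 10-point running mean three times to a trendless series of random noise and prints the variance after each pass. It also reports the spread of the raw values inside each smoothing window, which is the information a bare average throws away.

```python
import random
import statistics

def running_mean(xs, window=10):
    """k-point running mean; returns len(xs) - window + 1 values."""
    return [statistics.fmean(xs[i:i + window]) for i in range(len(xs) - window + 1)]

def window_spread(xs, window=10):
    """Standard deviation of the raw values inside each smoothing window."""
    return [statistics.stdev(xs[i:i + window]) for i in range(len(xs) - window + 1)]

random.seed(7)
raw = [random.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical trendless series

# Smooth the already-smoothed series again and again ("averages of averages").
passes = [raw]
for _ in range(3):
    passes.append(running_mean(passes[-1]))

variances = [statistics.variance(p) for p in passes]
for n, v in enumerate(variances):
    print(f"pass {n}: variance = {v:.3f}")

# What the first pass discarded: the spread inside each window stays near 1.
print(f"mean within-window sd: {statistics.fmean(window_spread(raw)):.2f}")
```

The variance of the smoothed series keeps shrinking with each pass even though the spread of the underlying data never changed; carrying something like the within-window spread alongside each average is the minimum bookkeeping needed to avoid presenting the smoothed numbers as near-certain.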
I don't know about you, but I'd rather have "an uncertain truth" than a likely fiction.
Here are some of my favorite Briggs blog posts.
November 12, 2008: Arcsine Climate Forecast
October 31, 2008: Breaking the Law of Averages: Probability and Statistics in Plain English
October 12, 2008: Peer Review Not Perfect: Shocking Finding
I'll close with my favorite quotation on statistics:
"There are three kinds of lies: lies, damned lies, and statistics."
-- Benjamin Disraeli, author, British statesman (1804-1881)