Here’s an interesting article from today’s New York Times, written by four professors at Wharton, highlighting the ‘selection bias’ in the recently released 18,000-word document from Roger Clemens’ camp, which aims to refute claims that “The Rocket” used steroids and human growth hormone: http://tinyurl.com/yvqgt4
Report Backing Clemens Chooses Its Facts Carefully
By ERIC BRADLOW, SHANE JENSEN, JUSTIN WOLFERS and ADI WYNER
There is no doubt that Clemens was a great pitcher, but the question is whether he was much better past 36 or 37 (when he is suspected of having taken performance-enhancing drugs) than would have been expected based on his early career.
A better approach to this problem involves comparing the career trajectories of all highly durable starting pitchers. We have analyzed the progress of Clemens as well as all 31 other pitchers since 1968 who started at least 10 games in at least 15 seasons, and pitched at least 3,000 innings. For each of two common pitching statistics, earned run average and walks plus hits per inning pitched, we fitted a smooth curve to all the data from these 31 pitchers and compared it with the curve for Clemens’s career.
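The article does not say what kind of smooth curve the authors fitted. As a rough sketch of the idea, assume the simplest option, a quadratic in age fitted by least squares. The function names and the reference age of 30 below are illustrative choices, not taken from the article.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(ages, values, ref_age=30.0):
    """Least-squares fit of value ~ a + b*x + c*x**2, where x = age - ref_age.

    Returns (a, b, c). Assumption: a quadratic stands in for the article's
    unspecified "smooth curve".
    """
    xs = [age - ref_age for age in ages]
    # Power sums for the normal equations of a degree-2 polynomial fit.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, values)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = _det3(m)
    if d == 0:
        raise ValueError("need at least three distinct ages to fit a quadratic")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]
        coeffs.append(_det3(mc) / d)
    return tuple(coeffs)


def turning_point_age(coeffs, ref_age=30.0):
    """Age at which the fitted quadratic reaches its minimum or maximum."""
    _, b, c = coeffs
    return ref_age - b / (2 * c)
```

For a statistic where lower is better, such as E.R.A., a positive curvature `c` means the fitted curve is U-shaped (best seasons in mid-career), while a negative `c` means it is inverted (worst seasons in mid-career). Fitting one curve to the pooled seasons of the comparison group and another to a single pitcher, then comparing the sign of `c` and the turning-point age, is one simple way to express the comparison described above.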
Relative to this larger comparison group, Clemens’s second act is unusual. The other pitchers in this durable group usually improve steadily early in their careers, peaking at around age 30. Then a slow decline sets in as they reach their mid-30s.
Clemens follows a far different path. The arc of Clemens’s career is upside down: his performance declines as he enters his late 20s and improves into his mid-30s and 40s.
The report correctly observes that he is not the only pitcher to excel at a comparatively old age, but it fails to note that he has taken an unusual path to that late-career success.
Another key shortcoming of the Clemens report is that it focuses almost exclusively on his E.R.A. But a pitcher’s E.R.A. is affected by factors, like defense, that have nothing to do with his pitching. It is also affected by other factors, like the order of events – a triple, for instance, can be hit with the bases empty, or the bases loaded. So a pitcher’s E.R.A. tends to bounce around a lot, and these ups and downs can help obscure patterns in career numbers.
Because E.R.A. can be so unreliable, analysts prefer to look at basic building blocks of talent like strikeout, walk, hit and home run rates. Clemens’s walks-plus-hits rate, for instance, follows an even more unusual trajectory late in his career, one that raises some suspicion.
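For reference, the two statistics can be written out using their standard definitions: E.R.A. is earned runs allowed per nine innings, and the walks-plus-hits rate is walks plus hits divided by innings pitched. In this small sketch the function names are illustrative, and innings are passed as outs recorded to avoid the ambiguity of box-score notation such as "6.1".

```python
def innings_from_outs(outs):
    """Convert outs recorded to innings pitched (3 outs per inning)."""
    return outs / 3


def era(earned_runs, outs):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_from_outs(outs)


def whip(walks, hits, outs):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_from_outs(outs)
```

With made-up numbers, two nine-inning outings that each allow 2 walks and 7 hits give the same walks-plus-hits rate, `whip(2, 7, 27)`, even if the hits were bunched together in one outing and scattered in the other, so that one allows 5 earned runs and the other 1. The E.R.A. values `era(5, 27)` and `era(1, 27)` differ widely, which is the order-of-events noise the authors describe.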
Other measures suggest Clemens performed similarly to his contemporaries. But these comparisons do not provide evidence of his innocence; they simply fail to provide evidence of his guilt.
Our reading is that the available data on Clemens’s career strongly hint that some unusual factors may have been at play in producing his excellent late-career statistics.
In any analysis of his career statistics, however, it is impossible to say whether those unusual factors included performance-enhancing drugs.
The Clemens report argues that his longevity “was due to his ability to adjust his style of pitching as he got older, incorporating his very effective split-finger fastball to offset the decrease in the speed of his regular fastball caused by aging.” While this may be true, it is also just speculation: there is not a single number in the report quantifying the evolution of Clemens’s pitch selection.
Statistics provide powerful tools for understanding the world around us, but the value of any analysis invariably comes down to choosing a useful statistic and an appropriate comparison group. Statisticians-for-hire have a tendency to choose comparison groups that support their clients. A careful analysis, and a better informed public, are the best defense against such smoke and mirrors.