So Much Promise, So Little Power
2001; Lippincott Williams & Wilkins; Volume: 13; Issue: 1; Language: English
10.1097/00008506-200101000-00001
ISSN 1537-1921
Topic(s): Cardiac, Anesthesia and Surgical Outcomes
We all know of the problem, but we seldom face such stark evidence as de Haan and coauthors have compiled in this issue of the Journal of Neurosurgical Anesthesiology (1). According to their Table 2 (page 5), only 6 out of 114 experiments that found pharmacological protection from spinal cord injury had acceptable statistical power to detect a 50% improvement in outcome. Put differently, only 1 in 20 had an a priori probability greater than 0.8 of finding a full-size car in a two-car garage (2). How did so many investigations with even less power, most of them with much less power, find so many needles in so many haystacks? Perhaps current methodology is so sensitive and specific that even minor beneficial effects are real and would be replicated by 19 out of 20 independent laboratories, as P < .05 implies. Or perhaps the world is awash in drugs that prevent or repair neurological damage, such that even samples of 6 can detect their striking effects. But leaving Never-Never Land for a moment, we need to consider the possibility that too many researchers believe too strongly in their hypotheses before they test them.
That last possibility gets my vote, not because I am particularly cynical or insightful, but because I have heard so many confessions. These admissions tend to be plain and up-front. In particular, when asked to give an overview of proposed research, believers start with background information that serves as a preamble to: “I want to show that . . .” or the more bullish, “I want to prove that . . .” followed by the statement of an hypothesis in which the confessor has more faith than he has in his putative religion. Research should start with a burning question, but too often it starts with a burning answer. This may be the underlying cause of underpowered studies, because it is easier for us to deceive ourselves about discrepancies between experimental results and preconceived truths if sample sizes are small.
More specifically, it is easier to misperceive or disregard small amounts of data than it is to misperceive or disregard large amounts of data, and missing, mistaken, or manipulated data on two rats can only have a statistically significant effect if N's are small.
The suggestion here is that scientists need to have a disease (call it Compulsive Empiricism), and to diagnose this qualifying disorder, we need to apply the Otero Test. Augustina (Caroline) Otero was, and may remain, the greatest courtesan who ever lived (3,4). She was not the sleazy kind of prostitute. She was not a lawyer, or even a politician. Otero's prostitution was the straightforward kind. She sold sex for money. Indeed, she sold about $20 million worth between 1885 and 1915, back when that kind of money could have bought Alaska. And it was not the number of her clients but rather their nature that led to both her success and her downfall. The beginning of the end came on an evening in 1898 when King Leopold II of Belgium, Prince Nicholas I of Montenegro, Prince Albert of Monaco, and King Edward VII of England rented a private dining room in Monte Carlo to help Otero celebrate her 30th birthday. The night appeared to be ending early when the local host invited everyone over to his place. Unfortunately, Otero had never been in a casino and Prince Albert did a terrible thing: he made sure she won. By 1926 Otero had lost all of her money, all of her bequeathed tracts of royal land, all of her jewelry, and most of her clothes. Nevertheless, she lived until 1965 in a small Paris flat that had been left to her in perpetuity, along with a modest weekly stipend, by an anonymous American industrialist. In 1941 a reporter for a French tabloid got lucky. Otero had become a recluse, which enhanced her legendary status, but this fellow discovered where the legend bought her daily bread.
After he got himself to the right place at the right time for 10 days running, Otero finally agreed to an interview. The reporter wanted something racy, a diversion for war-weary readers. You can imagine his disappointment when Otero answered the question “What was the greatest thrill of your life?” with “To win at gambling.” Then came Otero's now-famous punch line. In response to the witless follow-up question, “Well, what was the second greatest thrill of your life?” she replied without hesitation, “To lose at gambling.” So much for secrets about the sexual habits and private parts of robber barons and royals. Instead, Otero gave us the diagnostic criterion for a recognized compulsive disorder and an analogous disorder that we need to recognize and make contagious.
We need to ask prospective investigators, including ourselves, “What if you find . . . [insert a description of data that would contradict the hypothesis being tested] . . .?” If the reaction is that the conjectured result will not happen, or cannot happen, or that if it happened it would indicate some flaw in the methodology, or that a context could be imagined whereby no contradiction would be implied, then the Otero Test is negative. This researcher has Unshakable Conviction Syndrome (UCS), which excludes Compulsive Empiricism. Unfortunately, the will-do-good-science prognosis for UCS is equivalent to the clinical prognosis for pancreatic cancer. But if the reaction to the proposition of contrary data is an almost perverse delight at the prospect of showing the hypothesis to be wrong, the Otero Test is positive. For real gamblers, losing is second only to winning. For the Oteros of this world, the only real disaster is not being in the action. Likewise, for real scientists, showing oneself to be wrong must be second only to showing oneself to be right. When getting closer to the truth is the thrill, the only disaster is working hard to gather data that do not justify an inference.
Underpowered investigations produce inconclusive results that are properly interpreted as inconclusive, inconclusive results that are improperly interpreted as negative, inconclusive results that are improperly interpreted as positive, or most rarely, true positive results that are properly inferred to be positive. The one thing that underpowered studies cannot do, by definition, is give an hypothesis a good chance of being shown to be wrong. That last attribute strikes too many believers as the perfect solution, but as Otero knew, you cannot really win if you cannot really lose. Imagine a person showing up at a Gamblers Anonymous meeting to confess a compulsion for gambling with play money. He would be hooted out of the room. We should have the same disdain for research that is underpowered by design.
Eighteen years ago Drs. Cottrell, Griffin, and I suggested that scientific journals “require that an appendix containing all raw data be submitted with any article that contains a statistical inference–with the understanding that a copy of that appendix will be sent by the journal to any reader willing to cover copying and handling costs. The logic here is that raw data usually belie unjustified inferences. Authors would be more careful if they knew that anyone would be able to perform a truly independent analysis of their results” (2). In a private letter, Dr. Michenfelder, then Editor-in-Chief of Anesthesiology, agreed that this requirement would go a long way toward narrowing the credibility gap, but objected that the paperwork involved (filing, storing, retrieving, copying, mailing) would overwhelm editorial offices. Given a few facts and figures about submissions, the amount of paper that a file cabinet will hold, et cetera, we soon came to realize that Michenfelder was right, but that was before “internet” became a household word.
Now, as a backup to the Otero Test, we should strive toward a day when the first sentence of every Results section of every data-based paper will be a pro forma statement to the effect that “All raw data can be found at http://www.nobull.edu.”
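To put rough numbers on what "underpowered" means, here is a minimal sketch. It is not a calculation from de Haan and coauthors' paper. It uses a normal approximation to the power of a two-sided, two-sample comparison of means. The function names, the standardized effect size of 1.0, and the 0.05 significance level are illustrative assumptions, and a "50% improvement in outcome" does not map onto a standardized effect size without knowing the variability of the outcome measure.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means divided by the common
    standard deviation (a hypothetical value chosen by the user).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def n_for_power(effect_size, target=0.8, alpha=0.05, max_n=100_000):
    """Smallest per-group sample size whose approximate power reaches target."""
    for n in range(2, max_n + 1):
        if approx_power(effect_size, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within max_n")


if __name__ == "__main__":
    print(f"power with 6 per group: {approx_power(1.0, 6):.2f}")
    print(f"per-group n for 0.8 power: {n_for_power(1.0)}")
```

Under these assumptions, samples of 6 per group give a power of roughly 0.4, and about 16 per group are needed to reach 0.8. The normal approximation is somewhat optimistic at such small sample sizes, so an exact t-test calculation would show slightly lower power still.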
References