Polling to get the answer you want
This clip from the old British comedy Yes, Prime Minister illustrates polling methodology:
[H/t AmericaBlog.]
January 1st, 2010
I have come across an important article from the British Medical Journal that discusses one of the major gaps in what has come to be called Evidence-Based Medicine. Such a gap is a major scandal for the field of medicine:
Parachute use to prevent death and major trauma related to gravitational challenge: Systematic review of randomised controlled trials
Gordon C S Smith, professor1, Jill P Pell, consultant2
1 Department of Obstetrics and Gynaecology, Cambridge University, Cambridge CB2 2QQ, 2 Department of Public Health, Greater Glasgow NHS Board, Glasgow G3 8YU
Correspondence to: G C S Smith gcss2@cam.ac.uk
Abstract
Objectives To determine whether parachutes are effective in preventing major trauma related to gravitational challenge.
Design Systematic review of randomised controlled trials.
Data sources: Medline, Web of Science, Embase, and the Cochrane Library databases; appropriate internet sites and citation lists.
Study selection: Studies showing the effects of using a parachute during free fall.
Main outcome measure Death or major trauma, defined as an injury severity score > 15.
Results We were unable to identify any randomised controlled trials of parachute intervention.
Conclusions As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.
November 25th, 2009
Earlier today I posted an article on changes in publication patterns in response to new rules by the Journal of the American Medical Association requiring independent analysis of drug company data before publication in JAMA. That article also referred to problems with the national registry of clinical trials. A friend sent me the following article on these problems, which cast serious doubt on the quality of our knowledge on drug efficacy:
Quality and completeness of medical literature questioned in two new studies
By Michael O’Riordan
New York, NY – Two recently published studies cast some unfavorable light on the current quality and completeness of medical literature, with one showing that less than half of registered studies are published in medical journals [1], and the other showing questionable discrepancies between the registered and reported clinical outcomes [2].
Dr Joseph Ross (Mount Sinai School of Medicine, New York), the lead investigator of the study showing that just 46% of studies registered on the National Institutes of Health (NIH)-funded website ClinicalTrials.gov ever make it as a published paper, told heartwire that the findings are very alarming.
“The research should send shockwaves through the research community, as it shows us that while it’s all well and good to practice evidence-based medicine, we don’t have all the evidence,” said Ross. “In terms of following guidelines and understanding the right treatment approach, we actually don’t have all the evidence at hand to make those decisions. This is a really shaky foundation. On top of that, the stuff that is being published might not even reflect the studies as they were designed.”
Dr Harlan Krumholz (Yale University School of Medicine, New Haven, CT), the senior author of the paper with Ross, commented to heartwire that the studies, in tandem, reveal significant shortcomings of the data available in the national registry of randomized trials, with the implication being that drug assessment is “hampered, even undermined” by the incomplete picture.
The study by Ross, Krumholz, and colleagues is published in the September 9, 2009 issue of PLoS Medicine, while the second study comparing the registered and published outcomes in clinical trials, which is led by Dr Sylvain Mathieu (Hôpital Bichat-Claude Bernard, Paris, France), is published in the September 2, 2009 issue of the Journal of the American Medical Association.
What’s required and optional with ClinicalTrials.gov?
In 2005, the International Committee of Medical Journal Editors (ICMJE) began a policy stating that information about a clinical trial needed to be registered before patient enrollment as a precondition for publication. ClinicalTrials.gov, a registry that has seen an average of 220 trials registered each week since 2005, requires mandatory information from investigators, with other information considered optional.
Mandatory reporting includes study title, summary, design, phase, type, conditions or focus of study, intervention, eligibility criteria, gender, minimum/maximum age, recruitment status, sponsor, facility, study official or facility contact, central contact, oversight authorities, and institutional review board approval, as well as other administrative details.
Optional information includes the primary purpose for the study, start date, completion date, enrollment target number, primary and secondary outcomes, whether the trial accepts healthy volunteers, and FDA product status.
Trial completed but never published
In the study by Ross et al, the researchers examined the reporting of registration information among a cross-section of clinical trials registered at ClinicalTrials.gov after December 31, 1999 and updated as completed by June 2007. The study looked at the trials registered and the completeness of the mandatory and optional reporting fields on the NIH website and the extent to which these registered trials were published.
Excluding phase 1 trials, the group identified 7515 completed trials over the seven-year period, and of these, nearly 100% reported information mandated by ClinicalTrials.gov, including the study intervention and sponsorship. Ross noted, however, that some trials were more complete than others, with some providing only vague or nondescript information. Optional data, however, fared worse. Just 53% of studies reported the study’s end date, only 66% reported the primary outcome, and 87% reported the trial start date.
“About half to two-thirds of the time, the information that was supposed to be there was there, or could have been there,” said Ross. “What we discovered as we looked through the second step at publication was this optional information is really important when you’re trying to figure out what’s been published and what hasn’t.”
To assess how many of the registered trials were published, the group examined 10% of the completed trials as a subsample. Of these trials, just 46% were published, and of these, only 31% provided a citation in ClinicalTrials.gov of that publication. Trials sponsored by industry fared the worst, with just 40% of registered trials published, which was significantly less than the 56% publication rate for trials sponsored by neither government nor industry, such as those funded by a university or foundation. Of studies sponsored by government, 47% were published.
“For the most part, everybody’s publication rates were disappointingly low,” said Ross. “It’s really hard to know why. On the one hand, it seems like zillions of papers are published every day, and it should be easy to get something published. But once a paper has been rejected a few times, it’s sort of easy to lose steam. Investigators and companies are often chasing the next grant for the next study by the time the push comes to get something published.”
Ross said the results are in line with literature showing that studies sponsored by industry are more likely to be positive. Because their publication rates are low, it is likely that many negative trials aren’t being published. He noted that it is also possible that they are conducting trials for purposes other than research, such as trying to get people to adhere to their medication, and while it is registered as a trial, it is not considered for publication.
Differing clinical end points
In the second study, by Mathieu and colleagues, the group compared the primary outcomes specified in ClinicalTrials.gov with the primary outcomes reported in 10 high-impact medical journals, including Circulation and the Journal of the American College of Cardiology.
They obtained information on 323 randomized, controlled, clinical trials in cardiology, gastroenterology, and rheumatology. Of these trials, just 45.5% were adequately registered, meaning they were registered before the end of the trial and the primary outcome was clearly specified. Among these trials registered adequately, investigators say that there were discrepancies between outcomes registered and outcomes reported in 31% of the published papers. Almost 83% of the differences in the primary outcomes favored reporting statistically significant results, according to Mathieu and colleagues.
Overall, more than one-quarter of the published studies were unregistered, 14% were registered after the study was finished, and 11% were registered with no description or an unclear description of the primary outcome.
“A main goal of trial registration is to enhance transparency of research and accountability in the planning, conduct, and reporting of clinical trials, an objective achieved by making available details about the trial,” write Mathieu and colleagues. “Therefore, adequate registration should be a safeguard against publication bias. A major step has been achieved with the ICMJE initiative for trial registration, and the existence of all trials is now publicly available. However, after this first step, the quality and timing of registration still needs improvement.”
Speaking with heartwire, Ross said that he believes registries like ClinicalTrials.gov are excellent ideas and that the research profession is headed in the right direction by making the medical field more transparent. However, some changes are needed to make it more effective, including making more information mandatory, including specific details about primary and secondary end points, and better oversight.
“There aren’t a lot of resources being devoted to oversight right now,” he said. “Who’s going to be responsible for catching up with companies and investigators who are registering trials and not reporting or not filling in all the needed fields?”
************
Ross and Krumholz were previously consultants for the plaintiffs in litigation against Merck related to rofecoxib. Krumholz has research contracts with the American College of Cardiology and the Colorado Foundation for Medical Care, has served on the advisory boards of Amgen and UnitedHealthCare, and is the academic editor-in-chief of Circulation: Cardiovascular Quality and Outcomes, and Journal Watch Cardiology.
September 14th, 2009
The Chronicle of Higher Ed has an interesting article on changes in publication patterns when the Journal of the American Medical Association started requiring independent analysis of drug company data before publication in JAMA. Drug company submissions to JAMA dropped 21%, while their submissions to rival journals increased. The article also discusses problems with other approaches to drug company bias of research results, such as a clinical trials registry.
These and similar studies suggest that we face a major problem: these sorts of biases raise concerns that much medical “knowledge” is of questionable validity. Research is already plagued by many aspects of its social organization that bias results, for example, the tendency to be more willing to publish positive rather than negative results. As a researcher and as a frequent peer-reviewer of research, I am well aware of these problems. The various ways in which industry funding distorts research, while not the only problems in the research-publication process, significantly compound the problem.
Medical Journals See a Cost to Fighting Industry-Backed Research
By Paul Basken
The Journal of the American Medical Association saw a 21 percent drop in industry-financed research after it began requiring that data in company-sponsored medical trials be independently verified by university researchers, a study has concluded.
The study, by a team of medical researchers in England and Florida, found that two of JAMA’s competitors saw their proportions of industry-backed research grow after JAMA decided to impose the requirement in 2005 to deter companies from shading descriptions of medical-test results to favor their products.
The findings suggest JAMA could face significant financial pressure to abandon the policy, given the reliance of medical journals on corporate dollars, said one of the study’s authors, Benjamin Djulbegovic, a professor of medicine and oncology at the University of South Florida.
“Major medical journals face an inherent conflict of interest” when trying to ensure the integrity of their published findings, Dr. Djulbegovic said in presenting the findings at the International Congress on Peer Review and Biomedical Publication here, a quadrennial conference of medical-journal publishers organized by JAMA with support from several of the other journals.
The three-day conference was dominated by investigations of the ways that corporate money is believed to be misleading both the public and medical professionals who rely on the journals for impartial evaluations of the safety and effectiveness of drugs and medical procedures.
Such warnings have been common features of the JAMA-initiated conferences, which began in 1989, though the issue took center stage in Vancouver after another year of allegations of corporate distortions of medical-research findings.
In the past few months, lawyers suing drug makers alleging harmful effects of their medications have found evidence of concerted attempts by the companies to secretly influence the presentation of medical-journal articles that appeared to have been written by independent university scientists.
Dozens of universities have meanwhile seen the need to toughen requirements on their researchers to disclose details of their financial relationships with makers of pharmaceuticals and medical devices. The federal government has also formed a registry where researchers are encouraged to describe their studies in advance so that any published conclusions can be compared with the promised objective.
‘Lots of Warts’
Despite those efforts, the studies presented to the journal editors gathered here covered a range of ways in which articles in their pages may still contain inaccuracies, often resulting from a financial conflict involving a scientist or a reviewer.
Such concerns led some conference participants to question the journals’ financial models: They rely on unpaid volunteers to review article submissions and on revenue from companies that buy reprints of articles that depict their products favorably.
Conference topics included the failure of journals and their authors to disclose corporate connections, the reluctance of researchers to share their data, the use of misleading rhetoric in journal articles, and the almost uniform ability of authors rejected by one journal to get published in another.
“We still have lots of warts,” Catherine D. DeAngelis, editor of JAMA, said of her industry after listening to the presentations.
Even some areas of improvement were shown to have their limits. About 300 journals have now joined a commitment by JAMA and other leading journals to publish only research in which the authors registered their intended outcomes in advance. The system, using either the federal registry or a recognized alternative, is designed to guard against researchers’ using their studies to selectively identify data that support a drug or treatment rather than sticking to the criteria they initially promised to measure.
But studies presented in Vancouver showed that the registry system isn’t yet having a significant effect because too many researchers are making registry entries that are either vague or filled with too many measurement criteria. “Registration alone cannot improve research quality,” Deborah A. Zarin, director of the federal registry, told the conference.
Independent Data Reviews
JAMA also found its competitors still unwilling to join its commitment to publish industry-supported studies only if the data get an independent review.
Dr. Djulbegovic and his colleagues at the Committee on Publication Ethics, in Britain, compared JAMA’s experience under its 2005 policy with that of two other leading medical journals. They tallied the portions of research appearing in the journals that involve industry financing, comparing numbers from 2002 and 2008.
JAMA saw the percentage of industry-supported studies in its pages drop 21 percent, from more than 60 percent of its published trials to 47 percent. Lancet, however, saw a growth of 17 percent, and The New England Journal of Medicine had an increase of 11 percent, the group reported.
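A note on the arithmetic in the preceding paragraph: a drop from about 60 percent to 47 percent of published trials is 13 percentage points, which appears to be what the article reports as a 21 percent relative decline. The minimal Python sketch below shows the difference between the two measures. The starting value of 60.0 is an assumption, since the article says only "more than 60 percent", and the function names are illustrative, not taken from the study.

```python
def percentage_point_change(before_pct, after_pct):
    """Absolute change in a share, measured in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change expressed as a percent of the starting share."""
    return (after_pct - before_pct) / before_pct * 100


# Assumed starting share: the article says only "more than 60 percent".
jama_before = 60.0
jama_after = 47.0

points = percentage_point_change(jama_before, jama_after)
relative = relative_change(jama_before, jama_after)

print(f"percentage-point change: {points:.1f}")
print(f"relative change: {relative:.1f}%")
```

Under that assumed starting value, the relative decline comes out slightly above 21 percent, which is consistent with the reported figure only if the true starting share was close to 60. The growth figures quoted for Lancet and The New England Journal of Medicine cannot be checked the same way, because the article does not give their before and after shares.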
Dr. Djulbegovic called the finding “quite dramatic,” while acknowledging that his investigation had a number of limitations, including the fact that it did not demonstrate the degree to which the shift could be attributed to the 2005 policy.
The study also didn’t show whether JAMA’s policy is actually producing more reliable research, said Fiona Godlee, editor in chief of BMJ, another leading medical journal. Dr. Godlee told Dr. Djulbegovic that other journals “would be flocking to” join JAMA if anyone could show the policy produced better science.
‘Something to Hide’
Dr. DeAngelis, JAMA’s editor, followed Dr. Godlee with an impassioned pledge to stick with the policy, saying she was pushed into imposing the requirement after at least two instances in which a corporate sponsor refused to allow an outside review of its data before publication.
The policy, she said, means she will always have the ability to call a university dean and ask for an investigation any time she encounters a challenge to data published in JAMA. And while some companies may be boycotting JAMA, the journal hasn’t seen its ad revenue drop any more than its competitors have during the recession, and its “impact factor”—a measure of its authors’ influence—has grown since 2005, she said.
“Until somebody can prove that what we’re doing is wrong, we’re going to keep it,” Dr. DeAngelis told her fellow editors. “The cynic in me says, If you’re not submitting to JAMA because you have something to hide or you don’t want anybody else to look at it, so be it.”
JAMA, meanwhile, had its own issues with data accuracy at the conference. Dr. DeAngelis and other JAMA editors presented a survey of authors conducted last year that concluded, according to the summary given to conference participants, that the prevalence of “ghostwriting”—in which university scientists sign their names to research articles that secretly originated with writers paid by companies—has grown significantly since their previous survey in 1996.
But in the actual presentation to the conference, a JAMA researcher, Joseph S. Wislar, said his subsequent analysis of the data showed no significant increase in ghostwriting during that period. Nevertheless, the prevalence of “ghost” authors in top-ranked medical journals remains a concern, Mr. Wislar said, ranging last year from 2 percent at Nature Medicine to 11 percent at The New England Journal of Medicine.
Industry Silence
The conference participants included representatives of several of the drug companies, who largely sat silently through the repeated depiction of their industry as an obstacle to the unbiased pursuit of medical research.
Many of the concerns raised at the conference appeared to reflect industry tactics that may have been practiced by some companies a decade or more ago but aren’t common now, said Fran Young, director of science publications at Shire Pharmaceuticals. “Things have changed,” Ms. Young said after watching the presentations.
“Either I have worked at the three most rigorous companies out there,” said Ms. Young, formerly with AstraZeneca and GlaxoSmithKline, “or things are not as bad as being painted in this room.”
The companies and the journals appear to share concern over some of the problems identified at the conference, Ms. Young said. Those areas include the bias, as described by some conference presenters, in favor of publishing test results that show a specific result of a medication or procedure as opposed to “negative” results that show no significant effect of the treatment. Either positive or negative findings can contain critical information, yet data presented at the conference suggest that the positive result is more likely to get published.
And any resulting bias in journal articles may not be the fault of drug companies alone. During one discussion of the reluctance of medical journals to publish negative results, Faina Linkov, a research assistant professor at the University of Pittsburgh, said it was a well-known problem.
“All of my colleagues and I,” she said, “are very much tempted to massage the data until we find some positive results.”
September 14th, 2009
Wired has a fascinating article on a major problem bedeviling the pharmaceutical companies. Their drugs are faring less well than they used to against placebo. It is sad how little of the article is devoted to ways to therapeutically harness the placebo effect. These applications, alas, are not likely to make money for the drug companies:
Placebos Are Getting More Effective. Drugmakers Are Desperate to Know Why
By Steve Silberman
Merck was in trouble. In 2002, the pharmaceutical giant was falling behind its rivals in sales. Even worse, patents on five blockbuster drugs were about to expire, which would allow cheaper generics to flood the market. The company hadn’t introduced a truly new product in three years, and its stock price was plummeting.
In interviews with the press, Edward Scolnick, Merck’s research director, laid out his battle plan to restore the firm to preeminence. Key to his strategy was expanding the company’s reach into the antidepressant market, where Merck had lagged while competitors like Pfizer and GlaxoSmithKline created some of the best-selling drugs in the world. “To remain dominant in the future,” he told Forbes, “we need to dominate the central nervous system.”
His plan hinged on the success of an experimental antidepressant codenamed MK-869. Still in clinical trials, it looked like every pharma executive’s dream: a new kind of medication that exploited brain chemistry in innovative ways to promote feelings of well-being. The drug tested brilliantly early on, with minimal side effects, and Merck touted its game-changing potential at a meeting of 300 securities analysts.
Behind the scenes, however, MK-869 was starting to unravel. True, many test subjects treated with the medication felt their hopelessness and anxiety lift. But so did nearly the same number who took a placebo, a look-alike pill made of milk sugar or another inert substance given to groups of volunteers in clinical trials to gauge how much more effective the real drug is by comparison. The fact that taking a faux drug can powerfully improve some people’s health—the so-called placebo effect—has long been considered an embarrassment to the serious practice of pharmacology.
Ultimately, Merck’s foray into the antidepressant market failed. In subsequent tests, MK-869 turned out to be no more effective than a placebo. In the jargon of the industry, the trials crossed the futility boundary.
MK-869 wasn’t the only highly anticipated medical breakthrough to be undone in recent years by the placebo effect. From 2001 to 2006, the percentage of new products cut from development after Phase II clinical trials, when drugs are first tested against placebo, rose by 20 percent. The failure rate in more extensive Phase III trials increased by 11 percent, mainly due to surprisingly poor showings against placebo. Despite historic levels of industry investment in R&D, the US Food and Drug Administration approved only 19 first-of-their-kind remedies in 2007—the fewest since 1983—and just 24 in 2008. Half of all drugs that fail in late-stage trials drop out of the pipeline due to their inability to beat sugar pills.
The upshot is fewer new medicines available to ailing patients and more financial woes for the beleaguered pharmaceutical industry. Last November, a new type of gene therapy for Parkinson’s disease, championed by the Michael J. Fox Foundation, was abruptly withdrawn from Phase II trials after unexpectedly tanking against placebo. A stem-cell startup called Osiris Therapeutics got a drubbing on Wall Street in March, when it suspended trials of its pill for Crohn’s disease, an intestinal ailment, citing an “unusually high” response to placebo. Two days later, Eli Lilly broke off testing of a much-touted new drug for schizophrenia when volunteers showed double the expected level of placebo response.
It’s not only trials of new drugs that are crossing the futility boundary. Some products that have been on the market for decades, like Prozac, are faltering in more recent follow-up tests. In many cases, these are the compounds that, in the late ’90s, made Big Pharma more profitable than Big Oil. But if these same drugs were vetted now, the FDA might not approve some of them. Two comprehensive analyses of antidepressant trials have uncovered a dramatic increase in placebo response since the 1980s. One estimated that the so-called effect size (a measure of statistical significance) in placebo groups had nearly doubled over that time.
It’s not that the old meds are getting weaker, drug developers say. It’s as if the placebo effect is somehow getting stronger.
The fact that an increasing number of medications are unable to beat sugar pills has thrown the industry into crisis. The stakes could hardly be higher. In today’s economy, the fate of a long-established company can hang on the outcome of a handful of tests.
Why are inert pills suddenly overwhelming promising new drugs and established medicines alike? The reasons are only just beginning to be understood. A network of independent researchers is doggedly uncovering the inner workings—and potential therapeutic applications—of the placebo effect. At the same time, drugmakers are realizing they need to fully understand the mechanisms behind it so they can design trials that differentiate more clearly between the beneficial effects of their products and the body’s innate ability to heal itself. A special task force of the Foundation for the National Institutes of Health is seeking to stem the crisis by quietly undertaking one of the most ambitious data-sharing efforts in the history of the drug industry. After decades in the jungles of fringe science, the placebo effect has become the elephant in the boardroom.
The roots of the placebo problem can be traced to a lie told by an Army nurse during World War II as Allied forces stormed the beaches of southern Italy. The nurse was assisting an anesthetist named Henry Beecher, who was tending to US troops under heavy German bombardment. When the morphine supply ran low, the nurse assured a wounded soldier that he was getting a shot of potent painkiller, though her syringe contained only salt water. Amazingly, the bogus injection relieved the soldier’s agony and prevented the onset of shock.
Returning to his post at Harvard after the war, Beecher became one of the nation’s leading medical reformers. Inspired by the nurse’s healing act of deception, he launched a crusade to promote a method of testing new medicines to find out whether they were truly effective. At the time, the process for vetting drugs was sloppy at best: Pharmaceutical companies would simply dose volunteers with an experimental agent until the side effects swamped the presumed benefits. Beecher proposed that if test subjects could be compared to a group that received a placebo, health officials would finally have an impartial way to determine whether a medicine was actually responsible for making a patient better.
In a 1955 paper titled “The Powerful Placebo,” published in The Journal of the American Medical Association, Beecher described how the placebo effect had undermined the results of more than a dozen trials by causing improvement that was mistakenly attributed to the drugs being tested. He demonstrated that trial volunteers who got real medication were also subject to placebo effects; the act of taking a pill was itself somehow therapeutic, boosting the curative power of the medicine. Only by subtracting the improvement in a placebo control group could the actual value of the drug be calculated.
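The subtraction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Beecher's actual method or any regulator's procedure: the patient scores are hypothetical, invented for the example, and a real trial analysis would also need a test of statistical significance.

```python
from statistics import mean


def drug_specific_effect(drug_group_changes, placebo_group_changes):
    """Mean improvement in the drug arm minus mean improvement in the placebo arm.

    Each list holds one improvement score per volunteer (higher is better).
    """
    return mean(drug_group_changes) - mean(placebo_group_changes)


# Hypothetical improvement scores, invented purely for illustration.
drug_arm = [10, 12, 8, 14]      # mean improvement: 11
placebo_arm = [7, 9, 6, 10]     # mean improvement: 8

effect = drug_specific_effect(drug_arm, placebo_arm)
print(f"improvement on drug:    {mean(drug_arm)}")
print(f"improvement on placebo: {mean(placebo_arm)}")
print(f"attributable to drug:   {effect}")
```

In this made-up example most of the improvement seen in the drug arm also shows up in the placebo arm, so only the remaining difference would be credited to the medication itself.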
The article caused a sensation. By 1962, reeling from news of birth defects caused by a drug called thalidomide, Congress amended the Food, Drug, and Cosmetic Act, requiring trials to include enhanced safety testing and placebo control groups. Volunteers would be assigned randomly to receive either medicine or a sugar pill, and neither doctor nor patient would know the difference until the trial was over. Beecher’s double-blind, placebo-controlled, randomized clinical trial—or RCT—was enshrined as the gold standard of the emerging pharmaceutical industry. Today, to win FDA approval, a new medication must beat placebo in at least two authenticated trials.
Beecher’s prescription helped cure the medical establishment of outright quackery, but it had an insidious side effect. By casting placebo as the villain in RCTs, he ended up stigmatizing one of his most important discoveries. The fact that even dummy capsules can kick-start the body’s recovery engine became a problem for drug developers to overcome, rather than a phenomenon that could guide doctors toward a better understanding of the healing process and how to drive it most effectively.
In his eagerness to promote his template for clinical trials, Beecher also overreached by seeing the placebo effect at work in curing ailments like the common cold, which wane with no intervention at all. But the triumph of Beecher’s gold standard was a generation of safer medications that worked for nearly everyone. Anthracyclines don’t require an oncologist with a genial bedside manner to slow the growth of tumors.
What Beecher didn’t foresee, however, was the explosive growth of the pharmaceutical industry. The blockbuster success of mood drugs in the ’80s and ’90s emboldened Big Pharma to promote remedies for a growing panoply of disorders that are intimately related to higher brain function. By attempting to dominate the central nervous system, Big Pharma gambled its future on treating ailments that have turned out to be particularly susceptible to the placebo effect.
The tall, rusty-haired son of a country doctor, William Potter, 64, has spent most of his life treating mental illness—first as a psychiatrist at the National Institute of Mental Health and then as a drug developer. A decade ago, he took a job at Lilly’s neuroscience labs. There, working on new antidepressants and antianxiety meds, he became one of the first researchers to glimpse the approaching storm.
To test products internally, pharmaceutical companies routinely run trials in which a long-established medication and an experimental one compete against each other as well as against a placebo. As head of Lilly’s early-stage psychiatric drug development in the late ’90s, Potter saw that even durable warhorses like Prozac, which had been on the market for years, were being overtaken by dummy pills in more recent tests. The company’s next-generation antidepressants were faring badly, too, doing no better than placebo in seven out of 10 trials.
As a psychiatrist, Potter knew that some patients really do seem to get healthier for reasons that have more to do with a doctor’s empathy than with the contents of a pill. But it baffled him that drugs he’d been prescribing for years seemed to be struggling to prove their effectiveness. Thinking that something crucial may have been overlooked, Potter tapped an IT geek named David DeBrota to help him comb through the Lilly database of published and unpublished trials—including those that the company had kept secret because of high placebo response. They aggregated the findings from decades of antidepressant trials, looking for patterns and trying to see what was changing over time. What they found challenged some of the industry’s basic assumptions about its drug-vetting process.
Assumption number one was that if a trial were managed correctly, a medication would perform as well or badly in a Phoenix hospital as in a Bangalore clinic. Potter discovered, however, that geographic location alone could determine whether a drug bested placebo or crossed the futility boundary. By the late ’90s, for example, the classic antianxiety drug diazepam (also known as Valium) was still beating placebo in France and Belgium. But when the drug was tested in the US, it was likely to fail. Conversely, Prozac performed better in America than it did in western Europe and South Africa. It was an unsettling prospect: FDA approval could hinge on where the company chose to conduct a trial.
Mistaken assumption number two was that the standard tests used to gauge volunteers’ improvement in trials yielded consistent results. Potter and his colleagues discovered that ratings by trial observers varied significantly from one testing site to another. It was like finding out that the judges in a tight race each had a different idea about the placement of the finish line.
Potter and DeBrota’s data-mining also revealed that even superbly managed trials were subject to runaway placebo effects. But exactly why any of this was happening remained elusive. “We were able to identify many of the core issues in play,” Potter says. “But there was no clear answer to the problem.” Convinced that what Lilly was facing was too complex for any one pharmaceutical house to unravel on its own, he came up with a plan to break down the firewalls between researchers across the industry, enabling them to share data in “pre-competitive space.”
After prodding by Potter and others, the NIH focused on the issue in 2000, hosting a three-day conference in Washington. For the first time in medical history, more than 500 drug developers, doctors, academics, and trial designers put their heads together to examine the role of the placebo effect in clinical trials and healing in general.
Potter’s ambitious plan for a collaborative approach to the problem eventually ran into its own futility boundary: No one would pay for it. And drug companies don’t share data, they hoard it. But the NIH conference launched a new wave of placebo research in academic labs in the US and Italy that would make significant progress toward solving the mystery of what was happening in clinical trials.
Visitors to Fabrizio Benedetti’s clinic at the University of Turin are asked never to say the P-word around the med students who sign up for his experiments. For all the volunteers know, the trim, soft-spoken neuroscientist is hard at work concocting analgesic skin creams and methods for enhancing athletic performance.
One recent afternoon in his lab, a young soccer player grimaced with exertion while doing leg curls on a weight machine. Benedetti and his colleagues were exploring the potential of using Pavlovian conditioning to give athletes a competitive edge undetectable by anti-doping authorities. A player would receive doses of a performance-enhancing drug for weeks and then a jolt of placebo just before competition.
Benedetti, 53, first became interested in placebos in the mid-’90s, while researching pain. He was surprised that some of the test subjects in his placebo groups seemed to suffer less than those on active drugs. But scientific interest in this phenomenon, and the money to research it, were hard to come by. “The placebo effect was considered little more than a nuisance,” he recalls. “Drug companies, physicians, and clinicians were not interested in understanding its mechanisms. They were concerned only with figuring out whether their drugs worked better.”
Part of the problem was that response to placebo was considered a psychological trait related to neurosis and gullibility rather than a physiological phenomenon that could be scrutinized in the lab and manipulated for therapeutic benefit. But then Benedetti came across a study, done years earlier, that suggested the placebo effect had a neurological foundation. US scientists had found that a drug called naloxone blocks the pain-relieving power of placebo treatments. The brain produces its own analgesic compounds called opioids, released under conditions of stress, and naloxone blocks the action of these natural painkillers and their synthetic analogs. The study gave Benedetti the lead he needed to pursue his own research while running small clinical trials for drug companies.
Now, after 15 years of experimentation, he has succeeded in mapping many of the biochemical reactions responsible for the placebo effect, uncovering a broad repertoire of self-healing responses. Placebo-activated opioids, for example, not only relieve pain; they also modulate heart rate and respiration. The neurotransmitter dopamine, when released by placebo treatment, helps improve motor function in Parkinson’s patients. Mechanisms like these can elevate mood, sharpen cognitive ability, alleviate digestive disorders, relieve insomnia, and limit the secretion of stress-related hormones like insulin and cortisol.
In one study, Benedetti found that Alzheimer’s patients with impaired cognitive function get less pain relief from analgesic drugs than normal volunteers do. Using advanced methods of EEG analysis, he discovered that the connections between the patients’ prefrontal lobes and their opioid systems had been damaged. Healthy volunteers feel the benefit of medication plus a placebo boost. Patients who are unable to formulate ideas about the future because of cortical deficits, however, feel only the effect of the drug itself. The experiment suggests that because Alzheimer’s patients don’t get the benefits of anticipating the treatment, they require higher doses of painkillers to experience normal levels of relief.
Benedetti often uses the phrase “placebo response” instead of placebo effect. By definition, inert pills have no effect, but under the right conditions they can act as a catalyst for what he calls the body’s “endogenous health care system.” Like any other internal network, the placebo response has limits. It can ease the discomfort of chemotherapy, but it won’t stop the growth of tumors. It also works in reverse to produce the placebo’s evil twin, the nocebo effect. For example, men taking a commonly prescribed prostate drug who were informed that the medication may cause sexual dysfunction were twice as likely to become impotent.
Further research by Benedetti and others showed that the promise of treatment activates areas of the brain involved in weighing the significance of events and the seriousness of threats. “If a fire alarm goes off and you see smoke, you know something bad is going to happen and you get ready to escape,” explains Tor Wager, a neuroscientist at Columbia University. “Expectations about pain and pain relief work in a similar way. Placebo treatments tap into this system and orchestrate the responses in your brain and body accordingly.”
In other words, one way that placebo aids recovery is by hacking the mind’s ability to predict the future. We are constantly parsing the reactions of those around us—such as the tone a doctor uses to deliver a diagnosis—to generate more-accurate estimations of our fate. One of the most powerful placebogenic triggers is watching someone else experience the benefits of an alleged drug. Researchers call these social aspects of medicine the therapeutic ritual.
In a study last year, Harvard Medical School researcher Ted Kaptchuk devised a clever strategy for testing his volunteers’ response to varying levels of therapeutic ritual. The study focused on irritable bowel syndrome, a painful disorder that costs more than $40 billion a year worldwide to treat. First the volunteers were placed randomly in one of three groups. One group was simply put on a waiting list; researchers know that some patients get better just because they sign up for a trial. Another group received placebo treatment from a clinician who declined to engage in small talk. Volunteers in the third group got the same sham treatment from a clinician who asked them questions about symptoms, outlined the causes of IBS, and displayed optimism about their condition.
Not surprisingly, the health of those in the third group improved most. In fact, just by participating in the trial, volunteers in this high-interaction group got as much relief as did people taking the two leading prescription drugs for IBS. And the benefits of their bogus treatment persisted for weeks afterward, contrary to the belief—widespread in the pharmaceutical industry—that the placebo response is short-lived.
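The random three-way allocation described above can be sketched in a few lines of Python. This is only an illustration of simple randomization into equal-sized arms, not the procedure Kaptchuk's team actually used; the function name `assign_to_arms`, the arm labels, the 90 volunteer IDs, and the seed are all hypothetical.

```python
import random


def assign_to_arms(volunteer_ids, arms, seed=None):
    """Shuffle volunteers, then deal them round-robin into the arms.

    Hypothetical helper: it illustrates simple randomization only and
    is not taken from the study described in the article.
    """
    rng = random.Random(seed)          # private generator, reproducible with a seed
    shuffled = list(volunteer_ids)
    rng.shuffle(shuffled)              # random order removes any selection pattern
    allocation = {arm: [] for arm in arms}
    for position, volunteer in enumerate(shuffled):
        arm = arms[position % len(arms)]   # deal out like cards: arm 0, 1, 2, 0, ...
        allocation[arm].append(volunteer)
    return allocation


# Arm labels are assumptions that mirror the three groups in the text.
ARMS = [
    "waiting_list",
    "placebo_limited_interaction",
    "placebo_high_interaction",
]

# 90 made-up volunteer IDs, purely for illustration.
allocation = assign_to_arms(range(1, 91), ARMS, seed=0)
group_sizes = {arm: len(members) for arm, members in allocation.items()}
print(group_sizes)
```

Dealing from a shuffled list keeps the arms the same size, which a coin flip per volunteer would not guarantee. Real trials typically use more elaborate schemes, such as blocked or stratified randomization.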
Studies like this open the door to hybrid treatment strategies that exploit the placebo effect to make real drugs safer and more effective. Cancer patients undergoing rounds of chemotherapy often suffer from debilitating nocebo effects—such as anticipatory nausea—conditioned by their past experiences with the drugs. A team of German researchers has shown that these associations can be unlearned through the administration of placebo, making chemo easier to bear.
Meanwhile, the classic use of placebos in medicine—to boost the confidence of anxious patients—has been employed tacitly for ages. Nearly half of the doctors polled in a 2007 survey in Chicago admitted to prescribing medications they knew were ineffective for a patient’s condition—or prescribing effective drugs in doses too low to produce actual benefit—in order to provoke a placebo response.
The main objections to more widespread placebo use in clinical practice are ethical, but the solutions to these conundrums can be surprisingly simple. Investigators told volunteers in one placebo study that the pills they were taking were “known to significantly reduce pain in some patients.” The researchers weren’t lying.
These new findings tell us that the body’s response to certain types of medication is in constant flux, affected by expectations of treatment, conditioning, beliefs, and social cues.
For instance, the geographic variations in trial outcome that Potter uncovered begin to make sense in light of discoveries that the placebo response is highly sensitive to cultural differences. Anthropologist Daniel Moerman found that Germans are high placebo reactors in trials of ulcer drugs but low in trials of drugs for hypertension—an undertreated condition in Germany, where many people pop pills for herzinsuffizienz, or low blood pressure. Moreover, a pill’s shape, size, branding, and price all influence its effects on the body. Soothing blue capsules make more effective tranquilizers than angry red ones, except among Italian men, for whom the color blue is associated with their national soccer team—Forza Azzurri!
But why would the placebo effect seem to be getting stronger worldwide? Part of the answer may be found in the drug industry’s own success in marketing its products.
Potential trial volunteers in the US have been deluged with ads for prescription medications since 1997, when the FDA amended its policy on direct-to-consumer advertising. The secret of running an effective campaign, Saatchi & Saatchi’s Jim Joseph told a trade journal last year, is associating a particular brand-name medication with other aspects of life that promote peace of mind: “Is it time with your children? Is it a good book curled up on the couch? Is it your favorite television show? Is it a little purple pill that helps you get rid of acid reflux?” By evoking such uplifting associations, researchers say, the ads set up the kind of expectations that induce a formidable placebo response.
The success of those ads in selling blockbuster drugs like antidepressants and statins also pushed trials offshore as therapeutic virgins—potential volunteers who were not already medicated with one or another drug—became harder to find. The contractors that manage trials for Big Pharma have moved aggressively into Africa, India, China, and the former Soviet Union. In these places, however, cultural dynamics can boost the placebo response in other ways. Doctors in these countries are paid to fill up trial rosters quickly, which may motivate them to recruit patients with milder forms of illness that yield more readily to placebo treatment. Furthermore, a patient’s hope of getting better and expectation of expert care—the primary placebo triggers in the brain—are particularly acute in societies where volunteers are clamoring to gain access to the most basic forms of medicine. “The quality of care that placebo patients get in trials is far superior to the best insurance you get in America,” says psychiatrist Arif Khan, principal investigator in hundreds of trials for companies like Pfizer and Bristol-Myers Squibb. “It’s basically luxury care.”
Big Pharma faces additional problems in beating placebo when it comes to psychiatric drugs. One is to accurately define the nature of mental illness. The litmus test of drug efficacy in antidepressant trials is a questionnaire called the Hamilton Depression Rating Scale. The HAM-D was created nearly 50 years ago based on a study of major depressive disorder in patients confined to asylums. Few trial volunteers now suffer from that level of illness. In fact, many experts are starting to wonder if what drug companies now call depression is even the same disease that the HAM-D was designed to diagnose.
Existing tests also may not be appropriate for diagnosing disorders like social anxiety and premenstrual dysphoria—the very types of chronic, fuzzily defined conditions that the drug industry started targeting in the ’90s, when the placebo problem began escalating. The neurological foundation of these illnesses is still being debated, making it even harder for drug companies to come up with effective treatments.
What all of these disorders have in common, however, is that they engage the higher cortical centers that generate beliefs and expectations, interpret social cues, and anticipate rewards. So do chronic pain, sexual dysfunction, Parkinson’s, and many other ailments that respond robustly to placebo treatment. To avoid investing in failure, researchers say, pharmaceutical companies will need to adopt new ways of vetting drugs that route around the brain’s own centralized network for healing.
Ten years and billions of R&D dollars after William Potter first sounded the alarm about the placebo effect, his message has finally gotten through. In the spring, Potter, who is now a VP at Merck, helped rev up a massive data-gathering effort called the Placebo Response Drug Trials Survey.
Under the auspices of the NIH, Potter and his colleagues are acquiring decades of trial data—including blood and DNA samples—to determine which variables are responsible for the apparent rise in the placebo effect. Merck, Lilly, Pfizer, AstraZeneca, GlaxoSmithKline, Sanofi-Aventis, Johnson & Johnson, and other major firms are funding the study, and the process of scrubbing volunteers’ names and other personal information from the database is about to begin.
In typically secretive industry fashion, the existence of the project itself is being kept under wraps. NIH staffers are willing to talk about it only anonymously, concerned about offending the companies paying for it.
For Potter, who used to ride along with his father on house calls in Indiana, the significance of the survey goes beyond Big Pharma’s finally admitting it has a placebo problem. It also marks the twilight of an era when the drug industry was confident that its products were strong enough to cure illness by themselves.
“Before I routinely prescribed antidepressants, I would do more psychotherapy for mildly depressed patients,” says the veteran of hundreds of drug trials. “Today we would say I was trying to engage components of the placebo response—and those patients got better. To really do the best for your patients, you want the best placebo response plus the best drug response.”
The pharma crisis has also finally brought together the two parallel streams of placebo research—academic and industrial. Pfizer has asked Fabrizio Benedetti to help the company figure out why two of its pain drugs keep failing. Ted Kaptchuk is developing ways to distinguish drug response more clearly from placebo response for another pharma house that he declines to name. Both are exploring innovative trial models that treat the placebo effect as more than just statistical noise competing with the active drug.
Benedetti has helped design a protocol for minimizing volunteers’ expectations that he calls “open/hidden.” In standard trials, the act of taking a pill or receiving an injection activates the placebo response. In open/hidden trials, drugs and placebos are given to some test subjects in the usual way and to others at random intervals through an IV line controlled by a concealed computer. Drugs that work only when the patient knows they’re being administered are placebos themselves.
Ironically, Big Pharma’s attempt to dominate the central nervous system has ended up revealing how powerful the brain really is. The placebo response doesn’t care if the catalyst for healing is a triumph of pharmacology, a compassionate therapist, or a syringe of salt water. All it requires is a reasonable expectation of getting better. That’s potent medicine.
Contributing editor Steve Silberman (steve@stevesilberman.com) wrote about the hunt for Jim Gray in issue 15.08.
September 9th, 2009
As a health researcher, I am a strong advocate of increasing the research base guiding our clinical efforts. Among other things, I help develop systems to assess the outcomes of psychosocial interventions in order to use the resultant knowledge to improve the quality of treatments that are delivered to our clients. Yet, as a clinician, I am a skeptic regarding the quality of our current knowledge and its ability to appropriately guide our practice. Do we really know enough? And what about the large element of clinical expertise that cannot, with our current tools anyway, be quantified?
This is a tension I have lived with and explored throughout my professional career. I even co-edited a book — Reconciling Empirical Knowledge and Clinical Experience: The Art and Science of Psychotherapy — on the interface between research and clinical practice.
I have just been sent this new Wall Street Journal op-ed by Jerome Groopman and Pamela Hartzband that makes the case that rigid guidelines can be wrong, and even dangerous. Groopman and Hartzband argue that these guidelines, based as they are on what are believed to be best practices, can, in the current state of our knowledge, easily turn out to be suboptimal or even harmful. As one of the examples they give illustrates:
One key quality measure in the ICU became the level of blood sugar in critically ill patients. Expert panels reviewed data on whether ICU patients should have insulin therapy adjusted to tightly control their blood sugar, keeping it within the normal range, or whether a more flexible approach, allowing some elevation of sugar, was permissible. Expert consensus endorsed tight control, and this approach was embedded in guidelines from the American Diabetes Association. The Joint Commission on Accreditation of Healthcare Organizations, which generates report cards on hospitals, and governmental and private insurers that pay for care, adopted as a suggested quality metric this tight control of blood sugar.
A colleague who works in an ICU in a medical center in our state told us how his care of the critically ill is closely monitored. If his patients have blood sugars that rise above the metric, he must attend what he calls “re-education sessions” where he is pointedly lectured on the need to adhere to the rule. If he does not strictly comply, his hospital will be downgraded on its quality rating and risks financial loss. His status on the faculty is also at risk should he be seen as delivering low-quality care.
But this coercive approach was turned on its head last month when the New England Journal of Medicine published a randomized study, by the Australian and New Zealand Intensive Care Society Clinical Trials Group and the Canadian Critical Care Trials Group, of more than 6,000 critically ill patients in the ICU. Half of the patients received insulin to tightly maintain their sugar in the normal range, and the other half were on a more flexible protocol, allowing higher sugar levels. More patients died in the tightly regulated group than those cared for with the flexible protocol.
This example illustrates both the difficulty with rigid guidelines and the need to be able to use judgment and flexibility in treating patients.
I concur with their concerns. We are far from knowing with any degree of certainty the correctness of most of our clinical guidelines. Yet I also believe that clinical care independent of research is increasingly problematic. While Groopman and Hartzband are right about the need for clinical flexibility, they ignore the opposite problem whereby the treatment a patient receives depends arbitrarily on which doctor or hospital they go to, or where they live. Thus, enormous geographic variability has been found for certain surgical procedures with no evidence that the variability is based upon anything but custom.
Thus, I believe that health care systems need to measure their outcomes and use the resultant data to improve care. Yet, they also need to avoid the rigid guideline problem.
One way of reconciling these conflicting impulses is to implement outcomes monitoring in a quality improvement framework. That is, the goal is to identify practices and health providers who have superior outcomes and find how to tap their knowledge and expertise and communicate it to those whose outcomes are inferior. A successful quality improvement framework is based upon the assumption that the vast majority of healthcare workers want to deliver quality care. Thus, they will be open to data-driven quality improvement efforts, as long as these are conducted in a collaborative and respectful manner, fully valuing the expertise of healthcare workers while providing them with the information and tools to improve their efforts.
One important aspect of such a quality improvement perspective is that healthcare workers, doctors and others, should be part of the team that selects outcome measures and quality improvement implementation procedures. Especially in “fuzzy” areas like mental health, the ability to control the outcomes that are measured is a powerful influence on the nature of treatment that is delivered.
To take one example with which I am intimately familiar, if substance abuse treatment outcomes included measures of such lifestyle factors as having housing and jobs, as well as improved mental health, then these life domains are likely to be included in treatment planning. However, if substance abuse outcomes only include measurements of substance use, then large aspects of substance abusing clients’ lives will ultimately be given short shrift when planning and conducting treatment.
By the way, similar issues arise in a number of areas other than healthcare. Thus, many of the current efforts to measure “outcomes” in education could similarly benefit from a quality improvement perspective. If teachers and parents, not to mention students, were more integrated into the vast apparatus now assessing educational outcomes through standardized testing, there would likely be less grousing among teachers, with its accompanying drop in morale and loss of experienced teachers to early retirement.
Here is the complete Groopman and Hartzband article:
Why ‘Quality’ Care Is Dangerous
The growing number of rigid protocols meant to guide doctors have perverse consequences
By Jerome Groopman and Pamela Hartzband
The Obama administration is working with Congress to mandate that all Medicare payments be tied to “quality metrics.” But an analysis of this drive for better health care reveals a fundamental flaw in how quality is defined and metrics applied. In too many cases, the quality measures have been hastily adopted, only to be proven wrong and even potentially dangerous to patients.
Health-policy planners define quality as clinical practice that conforms to consensus guidelines written by experts. The guidelines present specific metrics for physicians to meet, thus “quality metrics.” Since 2003, the federal government has piloted Medicare projects at more than 260 hospitals to reward physicians and institutions that meet quality metrics. The program is called “pay-for-performance.” Many private insurers are following suit with similar incentive programs.
In Massachusetts, there are not only carrots but also sticks; physicians who fail to comply with quality guidelines from certain state-based insurers are publicly discredited and their patients required to pay up to three times as much out of pocket to see them. Unfortunately, many states are considering the Massachusetts model for their local insurance.
How did we get here? Initially, the quality improvement initiatives focused on patient safety and public-health measures. The hospital was seen as a large factory where systems needed to be standardized to prevent avoidable errors. A shocking degree of sloppiness existed with respect to hand washing, for example, and this largely has been remedied with implementation of standardized protocols. Similarly, the risk of infection when inserting an intravenous catheter has fallen sharply since doctors and nurses now abide by guidelines. Buoyed by these successes, governmental and private insurance regulators now have overreached. They’ve turned clinical guidelines for complex diseases into iron-clad rules, to deleterious effect.
One key quality measure in the ICU became the level of blood sugar in critically ill patients. Expert panels reviewed data on whether ICU patients should have insulin therapy adjusted to tightly control their blood sugar, keeping it within the normal range, or whether a more flexible approach, allowing some elevation of sugar, was permissible. Expert consensus endorsed tight control, and this approach was embedded in guidelines from the American Diabetes Association. The Joint Commission on Accreditation of Healthcare Organizations, which generates report cards on hospitals, and governmental and private insurers that pay for care, adopted as a suggested quality metric this tight control of blood sugar.
A colleague who works in an ICU in a medical center in our state told us how his care of the critically ill is closely monitored. If his patients have blood sugars that rise above the metric, he must attend what he calls “re-education sessions” where he is pointedly lectured on the need to adhere to the rule. If he does not strictly comply, his hospital will be downgraded on its quality rating and risks financial loss. His status on the faculty is also at risk should he be seen as delivering low-quality care.
But this coercive approach was turned on its head last month when the New England Journal of Medicine published a randomized study, by the Australian and New Zealand Intensive Care Society Clinical Trials Group and the Canadian Critical Care Trials Group, of more than 6,000 critically ill patients in the ICU. Half of the patients received insulin to tightly maintain their sugar in the normal range, and the other half were on a more flexible protocol, allowing higher sugar levels. More patients died in the tightly regulated group than those cared for with the flexible protocol.
Similarly, maintaining normal blood sugar in ambulatory diabetics with vascular problems has been a key quality metric in assessing physician performance. Yet largely due to two extensive studies published in the June 2008 issue of the New England Journal of Medicine, this is now in serious doubt. Indeed, in one study of more than 10,000 ambulatory diabetics with cardiovascular diseases conducted by a group of Canadian and American researchers (the “ACCORD” study) so many diabetics died in the group where sugar was tightly regulated that the researchers discontinued the trial 17 months before its scheduled end.
And just last month, another clinical trial contradicted the expert consensus guidelines that patients with kidney failure on dialysis should be given statin drugs to prevent heart attack and stroke.
These and other recent examples show why rigid and punitive rules to broadly standardize care for all patients often break down. Human beings are not uniform in their biology. A disease with many effects on multiple organs, like diabetes, acts differently in different people. Medicine is an imperfect science, and its study is also imperfect. Information evolves and changes. Rather than rigidity, flexibility is appropriate in applying evidence from clinical trials. To that end, a good doctor exercises sound clinical judgment by consulting expert guidelines and assessing ongoing research, but then decides what is quality care for the individual patient. And what is best sometimes deviates from the norms.
Yet too often quality metrics coerce doctors into rigid and ill-advised procedures. Orwell could have written about how the word “quality” became zealously defined by regulators, and then redefined with each change in consensus guidelines. And Kafka could detail the recent experience of a pediatrician featured in Vital Signs, the member publication of the Massachusetts Medical Society. Out of the blue, according to the article, Dr. Ann T. Nutt received a letter in February from the Massachusetts Group Insurance Commission on Clinical Performance Improvement informing her that she was no longer ranked as Tier 1 but had fallen to Tier 3. (Massachusetts and some private insurers use a three-tier ranking system to incentivize high-quality care.) She contacted the regulators and insisted that she be given details to explain her fall in rating.
After much effort, she discovered that in 127 opportunities to comply with quality metrics, she had met the standards 115 times. But the regulators refused to provide the names of patients who allegedly had received low quality care, so she had no way to assess their judgment for herself. The pediatrician fought back and ultimately learned which guidelines she had failed to follow. Despite her cogent rebuttal, the regulator denied the appeal and the doctor is still ranked as Tier 3. She continues to battle the state.
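To put those figures in proportion, the arithmetic can be worked out directly. The sketch below uses only the two numbers quoted in the article (115 standards met out of 127 opportunities); the function name `compliance_rate` is a hypothetical label, and nothing here reflects how the Massachusetts tiers are actually computed, which the article does not describe.

```python
def compliance_rate(met, opportunities):
    """Return the fraction of opportunities in which the standard was met.

    Hypothetical helper for illustration; the article does not say how
    the regulators turn such counts into a tier ranking.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return met / opportunities


# The only inputs are the two figures quoted in the article.
met, opportunities = 115, 127
rate = compliance_rate(met, opportunities)
missed = opportunities - met

print(f"{rate:.1%} of standards met, {missed} missed")
```

In other words, she met the standards in roughly nine out of ten opportunities, and the 12 remaining cases were the ones she could not examine without the patients' names.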
Doubts about the relevance of quality metrics to clinical reality are even emerging from the federal pilot programs launched in 2003. An analysis of Medicare pay-for-performance for hip and knee replacement by orthopedic surgeons at 260 hospitals in 38 states published in the most recent March/April issue of Health Affairs showed that conforming to or deviating from expert quality metrics had no relationship to the actual complications or clinical outcomes of the patients. Similarly, a study led by UCLA researchers of over 5,000 patients at 91 hospitals published in 2007 in the Journal of the American Medical Association found that the application of most federal quality process measures did not change mortality from heart failure.
State pay-for-performance programs also provide disturbing data on the unintended consequences of coercive regulation. Another report in the most recent Health Affairs evaluating some 35,000 physicians caring for 6.2 million patients in California revealed that doctors dropped noncompliant patients, or refused to treat people with complicated illnesses involving many organs, since their outcomes would make their statistics look bad. And research by the Brigham and Women’s Hospital published last month in the Journal of the American College of Cardiology indicates that report cards may be pushing Massachusetts cardiologists to deny lifesaving procedures on very sick heart patients out of fear of receiving a low grade if the outcome is poor.
Dr. David Sackett, a pioneer of “evidence-based medicine,” where results from clinical trials rather than anecdotes are used to guide physician practice, famously said, “Half of what you’ll learn in medical school will be shown to be either dead wrong or out of date within five years of your graduation; the trouble is that nobody can tell you which half — so the most important thing to learn is how to learn on your own.” Science depends upon such a sentiment, and honors the doubter and iconoclast who overturns false paradigms.
Before a surgeon begins an operation, he must stop and call a “time-out” to verify that he has all the correct information and instruments to safely proceed. We need a national time-out in the rush to mandate what policy makers term quality care to prevent doing more harm than good.
**********
Dr. Groopman, a staff writer for the New Yorker, and Dr. Hartzband are on the staff of Beth Israel Deaconess Medical Center in Boston and on the faculty of Harvard Medical School.
April 8th, 2009
Several readers of my post earlier today, “New Doubts Regarding the Lancet Iraq Mortality Study,” have raised the question as to why the lapse committed by Burnham et al. in this study warrants dismissing the entire study. After all, they argue, the lapse of recording names was an ethical lapse, perhaps, but recording extra information should not affect the results. Let me take this opportunity to clarify my reasoning.
The faith one has in the results of any study depends largely on the quality of the research design and on how carefully that design is followed. In the case of a population-based epidemiological survey like the 2006 Lancet study (Lancet II), even minor deviations from the survey design can have large effects on the results. (Survey research depends crucially on every person in the population having an equal chance of being selected.) As one example, if interviewers used discretion – beyond that mandated by safety considerations – in selecting households, it could introduce (probably unintentional and unconscious) bias that would make the findings unreliable. For this reason, survey researchers attempt to maintain strict control over the procedures actually used by those collecting data in the field.
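To make the point about equal selection chances concrete, here is a minimal Python sketch. The ten-household population and the selection weights are hypothetical, invented purely for illustration; they are not drawn from the Lancet study, and the function name is my own. The sketch computes the share of sampled households expected to report a death under equal-probability selection and under a selection scheme in which interviewers are twice as likely to choose affected households.

```python
def expected_reporting_share(households, weights):
    """Expected share of selected households that report a death.

    households: 1 if the household had a death, 0 otherwise.
    weights: relative chance of each household being selected.
    """
    total_weight = sum(weights)
    weighted_deaths = sum(w * h for w, h in zip(weights, households))
    return weighted_deaths / total_weight


# Hypothetical population: 2 of 10 households experienced a death.
households = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Design as specified: every household is equally likely to be chosen.
equal_weights = [1] * 10

# Hypothetical deviation: affected households are twice as likely to be chosen.
biased_weights = [2, 2, 1, 1, 1, 1, 1, 1, 1, 1]

print(expected_reporting_share(households, equal_weights))
print(expected_reporting_share(households, biased_weights))
```

Under equal weights the expected share matches the true share of two in ten; under the skewed weights it rises to four in twelve, even though nothing about the population has changed. Bias of this kind could just as easily run in the other direction.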
We have been assured for years that the design of Lancet II was carefully followed. Now we hear that the specified design was not followed in a crucial way that may have put participants at risk. Furthermore, the Lancet researchers have for years pointed to those very risks as reasons to deny access to raw data and to withhold crucial methodological information when questioned. The fact that the protocol wasn’t followed in a central aspect severely reduces the confidence we can have that the study procedures were carefully monitored.
The Baltimore Sun reported:
“Because of the difficulty of carrying out research in Iraq during the war, Burnham and his team partnered with Iraqi doctors at a university in Iraq. Burnham, working out of Jordan, said he made it clear to the doctors that they could collect the first names of children and adults, to help keep the information straight, but that last names could not be collected.
“When the surveys came back to him in Jordan, it appeared that some had last names. Many were in Arabic. Burnham said he asked his Iraqi partners and was told that the names were not complete, which he accepted. But Hopkins, in its investigation, found that the data form used in the surveys was different from what was originally proposed, and included space for names of respondents. Hopkins found that full names were collected.”
This description, if true, supports the assumption that Burnham was in no position to carefully monitor the details of data collection for the study. Further, at its most charitable, it indicates severe communication difficulties with the Iraqi staff that may easily have left him unaware of other possible deviations in procedures. If one is not so charitable, one may wonder why Burnham was told a falsehood, that the names were only first names, and thus what else was distorted. In any case, in the absence of this confidence in the study procedures, we cannot maintain confidence in the study’s results.
There is yet another troubling aspect of this incident. The lapse that occurred, recording of full names of respondents reporting deaths from violence in a country undergoing civil war after the Johns Hopkins ethics committee and the respondents were told no names or unique identifiers would be collected, is no trifling error. As Johns Hopkins Magazine reported in its February 2007 issue:
“Concern for the safety of interviewers and respondents alike produced two more decisions. First, they would not record identifiers like the names and addresses of people interviewed. Burnham feared retribution if a hostile militia at a checkpoint found a record of households visited by the Iraqi survey teams.”
Thus, the researchers were well aware that collecting names of respondents could put them at grave risk. Burnham owed it to the people in his study to have enquired further when he noticed names on the forms and not so easily accepted false reassurances. That he did not suggests that he may have (perhaps unconsciously) looked the other way at other possible deviations from protocol.
Since the study was released over two years ago, it has been subjected to severe criticism. While much of this criticism was likely motivated by concern for the political implications of the study, and some of the criticism was clearly unwarranted, that does not give the study a free pass on criticism. And we shouldn’t look the other way to its potential problems just because its findings support our antiwar position.
In response to the criticism, the Lancet study authors have been less than forthcoming with key details, such as their exact sampling procedure for selecting streets, which, under criticism, they admitted was not accurately described in the published paper. That we now know that another crucial detail, the collection of identifiable information, deviated from the published record, and that the authors failed to correct the public record on the matter until forced to, raises questions about what other aspects of the study may not have been conducted as described. As long as these questions remain, the study cannot be considered reliable.
March 16th, 2009
SEE UPDATE BELOW:
Since the Iraq war began, an important question for those closely following the conflict has been the number of excess Iraqi casualties resulting from the war and occupation. Various researchers have attempted to estimate this number. Iraq Body Count has kept a running tab of civilian deaths reported in the Western media and, more recently, by certain Iraqi government sources, but their figure, now at around 95,000, is undoubtedly low due to its reliance on media reports and Iraqi government figures. During times of intense conflict, many deaths likely go unreported in the media, while there have been numerous inconsistencies in, and reports of political manipulation of, government figures, as it may not be in the government’s interest to admit the extent of deaths from the conflict.
An alternate way to estimate conflict-associated mortality is through the conduct of carefully sampled household surveys counting the number of deaths in selected households and using statistical techniques to extrapolate to the overall population. Much attention, from myself and others, has focused especially on the Lancet mortality studies of 2004 and 2006. The first of these studies estimated that there had been approximately 100,000 excess deaths from the war by September 2004. The second study estimated that there were around 650,000 excess deaths through summer 2006. They further found that the vast majority of these excess casualties — around 600,000 — were from violence, a stark contrast with most other such conflicts studied, where large numbers die from poor health and the breakdown of social organization associated with conflict. “Excess casualties” here means the number who died above the number that would have been expected to die had prewar trends continued and the war and occupation not occurred.
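The definition of excess deaths given above comes down to simple arithmetic, which a minimal Python sketch can make concrete. The mortality rates, population, and time period below are hypothetical round numbers chosen only for illustration; they are not figures from either Lancet study, and the function name is my own. Real surveys also attach wide confidence intervals to such estimates, which this sketch ignores.

```python
def excess_deaths(prewar_rate, postwar_rate, population, years):
    """Estimate deaths above what prewar trends would have produced.

    Rates are crude mortality rates in deaths per 1,000 people per year.
    """
    # Deaths expected had the prewar rate simply continued.
    expected = prewar_rate / 1000 * population * years
    # Deaths implied by the rate measured in the household survey.
    observed = postwar_rate / 1000 * population * years
    return observed - expected


# Hypothetical inputs, for illustration only.
estimate = excess_deaths(prewar_rate=5.0, postwar_rate=9.0,
                         population=1_000_000, years=2)
print(round(estimate))
```

Note how sensitive the result is to the prewar rate: in this sketch, every change of one death per 1,000 per year in the assumed baseline shifts the estimate by 2,000 deaths.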
We have recently learned that Gilbert Burnham, the lead author of the second Lancet study, has been sanctioned by Johns Hopkins for deviating from the approved IRB protocol and collecting the names of many survey respondents, a fact that was implicitly denied in numerous public pronouncements. The school does assert that, as far as they can determine, no one was harmed by this ethical lapse. As a result of this sanction, Burnham has been barred by Johns Hopkins from serving as the principal investigator (lead researcher) on studies involving “human subjects” (live people) for five years. He was also ordered to publish a correction in the Lancet, which has now appeared:
“The Methods section of this Article (Oct 21, 2006) stated that ‘Participants were assured that no unique identifiers would be gathered.’ Upon review, it was determined that a significant number of the surveys contained names of respondents and household inhabitants. This was a lapse in the authors’ obligations to protect participants. However, to the authors’ knowledge, the completed surveys remained in possession of the research team at all times and there were no known breaches in confidentiality.”
This error, and its possible coverup in subsequent public statements, means that, in my opinion, we can no longer rely upon the Lancet II mortality estimates. If one major methodological detail was distorted, we simply cannot know whether other aspects of the study were carried out as stated. Until and unless there is far greater detail on these methods, I do not feel that their estimate of 650,000 post-invasion surplus deaths can be trusted.
Burnham had early last month been censured by the American Association for Public Opinion Research for refusing to reveal details of the study methodology. I must say I find this censure highly unusual at best as Burnham is not a member of AAPOR. I have never previously heard of a professional association investigating, much less censuring, a non-member. However, as the Hopkins investigation shows, the non-cooperation may have been to cover up the methodological discrepancy, rather than for more understandable reasons.
I find this episode deeply disturbing. The issue of the magnitude of civilian deaths in Iraq is a profoundly important one. Given the known political sensitivity of the issue, the researchers should have been especially careful in the controllable aspects of their methodology. They were not. Rather, they gave ammunition to those who would inevitably attack their conclusions for political or ideological reasons. The result is that we are less knowledgeable about this important question than many of us believed as an important data source is no longer reliable.
While I find David Kane’s self-satisfied tone to be disturbing, I must admit that he was more right than I had believed regarding the weaknesses in the Lancet II study. As Kane points out, Burnham’s public statements were, in spirit if not in legalistic wording, not accurate.
We are left with several other studies estimating Iraqi casualties. The British ORB polling company estimated as of August 2007 “that over 1,000,000 Iraqi citizens have died as a result of the conflict which started in 2003.”
While ORB is a reputable polling company, the faith we can place in these results is weakened due to their failure to publish a detailed methodology; such information is typically included in papers published in peer-reviewed journals, which is one reason researchers typically place greater credence on studies published in such journals. When the Lancet II findings were credible, the ORB study appeared to be a replication of the general order of magnitude of casualties found in that study. With the increased doubts about the Lancet II study, the ORB stands as an outlier. I wish the firm would publish a detailed methodology that would allow better evaluation of their findings.
At the low end, a study conducted by the Iraq Ministry of Health and other Iraq government entities in collaboration with the World Health Organization estimated 151,000 violent deaths between January 2002 and June 2006. While the authors did not estimate the total number of excess deaths — nonviolent as well as violent — presumably because these estimates would be less precise, dependent as they would be on estimates of prewar mortality rates, those estimates would be considerably higher by several hundred thousand. Critiques of this study have questioned whether many Iraqi citizens might be reluctant to admit to Iraqi government-associated researchers that a family member was killed by violence. Thus, it is not implausible to assume that this study is an undercount and constitutes a lower bound. As the Ministry of Health study period ended while some of the most severe violence was still occurring, there have likely been many more violent deaths since then.
Thus, the best guess we can make at present is that at least 200,000 people died through violence since the US-led invasion, and that the true figure may be far higher. Moreover, an additional number that could be in the hundreds of thousands may have died from nonviolent causes — e.g., lack of clean water and healthcare — associated with the conflict, but this figure is uncertain. No matter what the correct figures turn out to be, it is clear that far too many have died as a result of this war of choice and subsequent occupation which may have deposed a dictator but which also disrupted an entire society.
UPDATE:
Postscript:
Several readers have raised the question as to why the lapse committed by Burnham et al. in this study warrants dismissing the entire study. After all, they argue, the lapse of recording names was an ethical lapse, perhaps, but recording extra information should not affect the results. Let me take this opportunity to clarify my reasoning.
The faith one has in the results of any study depends largely on the quality of the research design and on how carefully that design is followed. In the case of a population-based epidemiological survey like the 2006 Lancet study (Lancet II), even minor deviations from the survey design can have large effects on the results. (Survey research depends crucially on every person in the population having an equal chance of being selected.) As one example, if interviewers used discretion – beyond that mandated by safety considerations – in selecting households, it could introduce (probably unintentional and unconscious) bias that would make the findings unreliable. For this reason, survey researchers attempt to maintain strict control over the procedures actually used by those collecting data in the field.
We have been assured for years that the design of Lancet II was carefully followed. Now we hear that the specified design was not followed in a crucial way that may have put participants at risk. Furthermore, the Lancet researchers have for years pointed to those very risks as reasons to deny access to raw data and to withhold crucial methodological information when questioned. The fact that the protocol wasn’t followed in a central aspect severely reduces the confidence we can have that the study procedures were carefully monitored.
The Baltimore Sun reported:
“Because of the difficulty of carrying out research in Iraq during the war, Burnham and his team partnered with Iraqi doctors at a university in Iraq. Burnham, working out of Jordan, said he made it clear to the doctors that they could collect the first names of children and adults, to help keep the information straight, but that last names could not be collected.
“When the surveys came back to him in Jordan, it appeared that some had last names. Many were in Arabic. Burnham said he asked his Iraqi partners and was told that the names were not complete, which he accepted. But Hopkins, in its investigation, found that the data form used in the surveys was different from what was originally proposed, and included space for names of respondents. Hopkins found that full names were collected.”
This description, if true, supports the assumption that Burnham was in no position to carefully monitor the details of data collection for the study. Further, at its most charitable, it indicates severe communication difficulties with the Iraqi staff that may easily have left him unaware of other possible deviations in procedures. If one is not so charitable, one may wonder why Burnham was told a falsehood, that the names were only first names, and thus what else was distorted. In any case, in the absence of this confidence in the study procedures, we cannot maintain confidence in the study’s results.
There is yet another troubling aspect of this incident. The lapse that occurred, recording of full names of respondents reporting deaths from violence in a country undergoing civil war after the Johns Hopkins ethics committee and the respondents were told no names or unique identifiers would be collected, is no trifling error. As Johns Hopkins Magazine reported in its February 2007 issue:
“Concern for the safety of interviewers and respondents alike produced two more decisions. First, they would not record identifiers like the names and addresses of people interviewed. Burnham feared retribution if a hostile militia at a checkpoint found a record of households visited by the Iraqi survey teams.”
Thus, the researchers were well aware that collecting names of respondents could put them at grave risk. Burnham owed it to the people in his study to have enquired further when he noticed names on the forms and not so easily accepted false reassurances. That he did not suggests that he may have (perhaps unconsciously) looked the other way at other possible deviations from protocol.
Since the study was released over two years ago, it has been subjected to severe criticism. While much of this criticism was likely motivated by concern for the political implications of the study, and some of the criticism was clearly unwarranted, that does not give the study a free pass on criticism. And we shouldn’t look the other way to its potential problems just because its findings support our antiwar position.
In response to the criticism, the Lancet study authors have been less than forthcoming with key details, such as their exact sampling procedure for selecting streets, which, under criticism, they admitted was not accurately described in the published paper. That we now know that another crucial detail, the collection of identifiable information, deviated from the published record, and that the authors failed to correct the public record on the matter until forced to, raises questions about what other aspects of the study may not have been conducted as described. As long as these questions remain, the study cannot be considered reliable.
2 comments March 15th, 2009
To many academics and researchers the American Psychological Association is largely known as the publisher of many high-quality, and high-status, journals. The National Institutes of Health recently required that publications from all research funded by the NIH be deposited in an open-access repository. APA evidently tried to make money off this process, as the Chronicle of Higher Education reported yesterday:
July 15, 2008
Psychological Association Will Charge Authors for Open-Access Archiving
By Lila Guterman
In what appears to be a new policy, the American Psychological Association will require authors who publish in its journals to let it deposit their papers in open-access repositories — and it will charge them $2,500 to do so.
Researchers who have grants from the National Institutes of Health must deposit their published articles in the institutes’ online archive, PubMed Central. Last week the journal Nature and many of its offshoots announced that they would deposit their authors’ articles for them. Free.
Now the psychological association says that its authors “should NOT deposit” their own manuscripts, and instead should allow the group to do so. “The deposit fee of $2,500 per manuscript for 2008 will be billed to the author’s university,” the policy says.
Because the NIH does not charge a fee, that money is apparently going to the psychological association.
Open-access advocates like Peter Suber, a research professor of philosophy at Earlham College, expressed outrage. “It’s as bad as it looks,” he told The Chronicle. “This is not a good use of anybody’s money.” Depositing an article in PubMed Central, he said, is a “clerical job that can be done by a machine.”
The psychological association did not immediately respond to a request for comment from The Chronicle.
This report, or other negative reaction, appears to have led the APA to back down. When one follows the link to their web site, one now sees:
Document Deposit Policy and Procedures for APA Journals
A new document deposit policy of the American Psychological Association (APA) requiring a publication fee to deposit manuscripts in PubMed Central based on research funded by the National Institutes of Health (NIH) is currently being re-examined and will not be implemented at this time. This policy had recently been announced on APA’s Web site. APA will soon be releasing more detailed information about the complex issues involved in the implementation of the new NIH Public Access Policy.
APA will continue to deposit NIH-funded manuscripts on behalf of authors in compliance with the NIH Public Access Policy.
To continue with these charges could lead many more academics and researchers to leave the APA, as any such fees would be taken, directly or indirectly, by their universities out of the grants, reducing already limited research funds. And non-APA journals would instantly become more attractive publishing venues.
1 comment July 16th, 2008
Most researchers recognize that research is inherently affected by social beliefs, norms, and practices. But the Census Bureau is about to give us an enormous demonstration. They have decided to remove all gay marriages from their marriage data for the 2010 census, despite laws in Massachusetts and California legalizing those marriages. Perhaps next they’ll remove “Islam” from the list of reportable religions.
U.S. Census Bureau won’t count same-sex marriages
By Mike Swift
Mercury News
Tens of thousands of same-sex couples are expected to marry legally in California by 2010, if a constitutional ban on gay marriage doesn’t pass at the polls in November.
But no matter what the voters decide, the official government count of the number of married same-sex couples in California is not in doubt. It will be zero.
The U.S. Census Bureau, reacting to the federal Defense of Marriage Act and other mandates, plans to edit the 2010 census responses of same-sex couples who marry legally in California, Massachusetts or any other state. They will be reported as “unmarried partners,” rather than married spouses, in census tabulations – a policy that will likely draw the ire of gay rights groups.
The Census Bureau followed the same procedure for the 2000 census, and it does not plan to change in 2010 even though courts in Massachusetts and now California have ruled gay men and lesbians can marry lawfully.
“This has been a question we’ve been looking at for quite a long time,” said Martin O’Connell, chief of the Census Bureau’s Fertility and Family Statistics Branch. “It’s not something the bureau could arbitrarily or casually decide to change on a whim, because our data is used by virtually every federal agency.”
The Census Bureau is not falsifying people’s responses, O’Connell said, because the bureau will retain people’s original census responses.
“We’re not destroying data; we are keeping that data,” O’Connell said. “We are just showing the data published in a way that is consistent with the way every other agency publishes their data.”
The Census Bureau does not ask about sexual orientation, but it does ask people to describe their relationships to others in their household. If a respondent refers to a person of the same gender as their “husband/wife” on the 2010 census form, the Census Bureau will automatically assign them to the “unmarried partner” category. Legally married same-sex couples will be indistinguishable in census data from those who chose “unmarried partner” to describe their relationship.
Researcher’s view
Critics say the census plan will mask the records of legal, same-sex, married couples and therefore degrade the quality of the government’s demographic data.
“I just think it’s bad form for the census to change a legal response to an incorrect response,” said Gary Gates of the Williams Institute, a think tank at the University of California-Los Angeles law school that studies gay-related public policy issues. “That goes against everything the census stands for.”
Gates, a prominent demographer who was consulted by Census Bureau officials about counting legally married same-sex couples, said one result is that the census will undercount marriages in states with gay marriage. And because the bureau defines a “family” as two or more people related by birth, adoption or marriage, it also will remove many same-sex married couples from being counted as families.
“It’s a systematic hiding not only of married gay couples, but gay couples as families, which I would argue is a fundamentally political decision,” Gates said.
One recently married couple called the policy “frustrating.”
“It’s just another layer of the hurdles we have to jump, as far as our relationship being recognized,” said Jim Winstead of Hollister, who recently married his partner, Rodney Naccarato-Winstead. The couple have an 18-month-old son.
Gay rights groups, learning of the policy this week, were also critical.
“To have the federal government disappear your marriage I’m sure will be painful and upsetting,” said Shannon Minter, legal director for the National Center for Lesbian Rights. “It really is something out of Orwell. It’s shameful.”
A spokeswoman for ProtectMarriage.com, campaigning in favor of the constitutional ban, declined to discuss the census issue in detail, but said it illuminates how the legalization of gay marriage potentially could dictate policy changes on government.
“One of our campaign cornerstones will be the fact that if the initiative doesn’t pass that public schools will be forced to teach the difference between gay marriage and traditional marriage,” said Jennifer Kerns.
Bureau’s reasoning
A census technical note that explains the bureau’s rationale on counting same-sex partners for the 2000 census notes that the 1996 Defense of Marriage Act “instructs all federal agencies only to recognize opposite-sex marriages for the purposes of enacting any agency programs.”
O’Connell said the Census Bureau has been unable to find any federal agency that collects data on same-sex married couples. Changing the policy before the 2010 census also would be a huge and difficult logistical issue.
“The last thing anyone wants is to use the 2010 census as a trial run,” O’Connell said.
Gates said, however, that the limitations on access to people’s original responses will make it very difficult for private researchers to analyze raw data and back out the number of same-sex spouses in California or other states.
“It’s an official closet,” Gates said, “that the government has built.”
1 comment July 12th, 2008