Before making the first incision, confirm the patient’s identity. Mark the surgical site. Ask about allergies. Discuss any anticipated blood loss. Introduce yourself by name.

These are some of the 19 tasks on the World Health Organization (WHO) Surgical Safety Checklist, a simple list of actions to be completed before an operation in order to cut errors and save lives.

In 2007 and 2008, surgical staff at eight hospitals around the world tested the checklist in a pilot study¹. The results were remarkable. Complications such as infections after surgery fell by more than one-third, and death rates dropped by almost half. The WHO recommended that all hospitals adopt its checklist or something similar, and many did. The UK National Health Service (NHS) immediately required all of its treatment centres to put the checklist into daily practice; by 2012, nearly 2,000 institutions worldwide had tried it.

The idea of checklists as a simple and cheap way to save lives has taken hold throughout the clinical community. It has some dynamic champions, including Atul Gawande, a surgeon at Brigham and Women’s Hospital in Boston, Massachusetts, who led the pilot study and has spread the word through talks, magazine articles and a best-selling book, The Checklist Manifesto (Metropolitan, 2009).

But this success story is beginning to look more complicated: some hospitals have been unable to replicate the impressive results of initial trials. An analysis of more than 200,000 procedures at 101 hospitals in Ontario, Canada, for example, found no significant reductions in complications or deaths after surgical-safety checklists were introduced². “We see this all the time,” says David Urbach, a surgeon at the University of Toronto who led the Ontario analysis. “A lot of studies that should be a slam dunk don’t seem to work in practice.” The stakes are high, because poor use of checklists means that people may be dying unnecessarily.

A cadre of researchers is working to make sense of the discrepancies. They are finding a variety of factors that can influence a checklist’s success or failure, ranging from the attitudes of staff to the ways that administrators introduce the tool. The research is part of the growing field of implementation science, which examines why some innovations that work wonderfully in experimental trials tend to fall flat in the real world. The results could help to improve the introduction of other evidence-based programmes, in medicine and beyond.

“We need to learn the lessons from programmes and interventions like the checklist so we don’t make the same mistakes again,” says Nick Sevdalis, an implementation scientist at King’s College London.

Replication frustration

One of the first to demonstrate the potential of checklists in health care was Peter Pronovost, an anaesthesiologist and critical-care physician at Johns Hopkins University School of Medicine in Baltimore, Maryland. In 2001, Pronovost introduced a short checklist for health-care workers who insert central venous catheters, or central lines, which are often used in an intensive care unit (ICU) to draw blood for tests or to administer drugs. The trial showed that asking practitioners to confirm that they had performed certain simple actions, such as washing their hands and sterilizing the insertion site, contributed to a dramatic reduction in the risk of life-threatening infections³. The list got a larger test in a now-famous trial⁴ known as the Keystone ICU project, launched in Michigan in October 2003. Within 18 months, the rate of catheter-related bloodstream infections fell by 66%.

Checklists were not completely new to medicine, but Pronovost’s work attracted attention because it suggested that they could save lives. Gawande penned an inspiring feature in The New Yorker⁵, asking: “If something so simple can transform intensive care, what else can it do?” Checklists began to proliferate. Now there are checklists for procedures involving anaesthesia, mechanical ventilation, childbirth and swine flu. Many studies have generated promising results, showing that the lists improve patient outcomes in hospitals from Norway to Iran.

But there have also been some failures. This January, less than a year after the report from Ontario, a different team of scientists reported⁶ that a surgical checklist modelled on Pronovost’s list did not improve outcomes at Michigan hospitals. And although the central-line checklist for ICUs has provided lasting benefits in Michigan, a British initiative called Matching Michigan, which aimed to replicate the Keystone programme, seemed to make no difference to infection rates⁷.

Some experts suspect that the failure to replicate could be a matter of how the initial trials or the follow-up studies were designed. Gawande’s pilot study of the WHO surgical checklist, for example, was not randomized and had no control group. Instead, it compared complication and death rates before and after the checklist was introduced. Critics say that this makes it difficult to determine what other factors might have influenced outcomes.

Gawande acknowledges the limitation, which was due to cost restrictions, but he points out that many subsequent trials, including ones that were randomized, have also demonstrated large reductions in complications and mortality following the introduction of the checklist. The list works, he says, as long as it is implemented well. “It turns out to be much more complex than just having the checklist in hand.”