The emergence of comparative effectiveness research (CER) has led to renewed interest in the role of observational studies in assessing the benefits and harms of alternative interventions. Observational studies compare outcomes between patients who receive different interventions through some process other than investigator randomization. Most commonly, this process is the natural variation in clinical care, although observational studies can also take advantage of natural experiments, in which higher-level changes in care delivery (eg, changes in state policy or in hospital unit structure) lead to differences in intervention exposure between groups. Observational studies can enroll patients by exposure (eg, type of intervention), using a cohort design, or by outcome, using a case-control design. Cohort studies can be performed prospectively, with participants recruited at the time of exposure, or retrospectively, with the exposure having occurred before participants are identified.

The strengths and limitations of observational studies for clinical effectiveness research have been debated for decades.7,17 Because the incremental cost of including an additional participant is generally low, observational studies often have relatively large numbers of participants who are more representative of the general population than RCT participants are. Large, diverse study populations make the results more generalizable to real-world practice and enable examination of variation in effect across patient subgroups. This advantage is particularly important for understanding effectiveness among vulnerable populations, such as racial minorities, who are often underrepresented in RCTs. Observational studies that use existing data sets can provide results quickly and efficiently, a critical need for most CER. Observational data already play an important role in guidelines in many areas of oncology, particularly prevention (eg, nutritional guidelines, management of BRCA1/2 mutation carriers)18,19 and the use of diagnostic tests (eg, gene expression profiling in women with node-negative, estrogen receptor–positive breast cancer).20

However, observational studies also have important limitations. They are feasible only if the intervention of interest is already used in clinical practice; they cannot be used to evaluate new drugs or devices that are not yet in use. Observational studies are subject to bias, including performance bias, detection bias, and selection bias.17,21 Performance bias occurs when delivery of one intervention is associated with generally higher performance by the health care unit (ie, higher health care quality) than delivery of another, making it difficult to determine whether better outcomes result from the intervention or from the accompanying higher-quality care. Detection bias occurs when the outcomes of interest are more easily detected in one group than in another, generally because of differential contact with the health care system. Selection bias is the most important threat to the validity of observational studies; it occurs when intervention groups differ in characteristics that are associated with the outcome of interest.

These differences can arise because a characteristic is part of the decision about which treatment to recommend (eg, disease severity), which is often termed confounding by indication, or because it is correlated with both intervention and outcome for another reason. A particular concern for CER of therapies is that some new agents may be used preferentially in patients for whom established therapies have failed and who are less likely to respond to any therapy.

There are two main approaches for addressing bias in observational studies. First, important potential confounders must be identified and included in the data collection. Measured confounders can then be addressed through multivariable regression and propensity score analysis. A telling example of the importance of adequately assessing potential confounders comes from the observational studies of hormone replacement therapy (HRT) and coronary heart disease (CHD). Meta-analyses of observational studies had long estimated a substantial reduction in CHD risk with postmenopausal HRT. However, the WHI (Women’s Health Initiative) trial, a large, double-blind RCT of postmenopausal HRT, found no difference in CHD risk between women assigned to HRT and those assigned to placebo. Although this apparent contradiction is often cited as general evidence against the validity of observational studies, a re-examination of the observational studies showed that those adjusting for measures of socioeconomic status (a clear confounder of the association between HRT use and better health outcomes) had results similar to those of the WHI, whereas those that did not adjust for socioeconomic status found a protective effect of HRT22 (Fig 2).
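To make the adjustment step more concrete, the following Python sketch applies inverse probability weighting, one common form of propensity score analysis, to a small hypothetical cohort loosely modeled on the HRT example. The counts are invented for illustration and are not taken from the WHI or from any observational study: socioeconomic status (SES) raises both the likelihood of HRT use and the likelihood of avoiding CHD, while HRT itself is assumed to have no effect. Because SES is the only (binary) confounder here, the propensity score P(HRT | SES) is simply the proportion of HRT users within each SES level; a real analysis would typically estimate it with a regression model over many covariates.

```python
from collections import defaultdict


def make_cohort():
    """Build a hypothetical cohort of (high_ses, hrt, chd) records.

    HRT use is more common among high-SES women (60% vs 20%), and CHD risk
    depends only on SES (5% high vs 10% low), so HRT has no true effect.
    """
    cells = [
        # (high_ses, hrt, chd, count)
        (True, True, True, 30), (True, True, False, 570),
        (True, False, True, 20), (True, False, False, 380),
        (False, True, True, 20), (False, True, False, 180),
        (False, False, True, 80), (False, False, False, 720),
    ]
    return [(ses, hrt, chd) for ses, hrt, chd, n in cells for _ in range(n)]


def crude_risk_ratio(cohort):
    """CHD risk among HRT users divided by risk among nonusers, unadjusted."""
    def risk(group):
        return sum(chd for _, _, chd in group) / len(group)
    users = [r for r in cohort if r[1]]
    nonusers = [r for r in cohort if not r[1]]
    return risk(users) / risk(nonusers)


def propensity_scores(cohort):
    """P(HRT | SES), estimated as the proportion of HRT users per SES level."""
    counts = defaultdict(lambda: [0, 0])  # high_ses -> [n_hrt, n_total]
    for ses, hrt, _ in cohort:
        counts[ses][0] += hrt
        counts[ses][1] += 1
    return {ses: n_hrt / n for ses, (n_hrt, n) in counts.items()}


def ipw_risk_ratio(cohort):
    """Risk ratio (HRT vs none) using normalized inverse probability weights."""
    ps = propensity_scores(cohort)
    weighted_events = {True: 0.0, False: 0.0}
    weight_total = {True: 0.0, False: 0.0}
    for ses, hrt, chd in cohort:
        # Users are weighted by 1/e(SES), nonusers by 1/(1 - e(SES)).
        w = 1 / ps[ses] if hrt else 1 / (1 - ps[ses])
        weighted_events[hrt] += w * chd
        weight_total[hrt] += w
    return (weighted_events[True] / weight_total[True]) / (
        weighted_events[False] / weight_total[False]
    )


cohort = make_cohort()
print(f"crude RR: {crude_risk_ratio(cohort):.2f}")  # crude RR: 0.75
print(f"IPW RR:   {ipw_risk_ratio(cohort):.2f}")    # IPW RR:   1.00
```

In this toy cohort the crude risk ratio of 0.75 suggests that HRT is protective, whereas weighting by the inverse propensity score recovers the assumed null (risk ratio 1.00). The weighting works only because SES is recorded; it cannot correct for a confounder that is absent from the data.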

The use of administrative data sets for observational studies of comparative effectiveness is likely to become increasingly common as health information technology spreads and data become more accessible; however, these data sets may be particularly limited in their ability to capture potential confounders. In some cases, the characteristics that influence the treatment decision (eg, performance status, tumor gene expression) may not be available in the data, making concerns about confounding by indication too serious to proceed without modifying data collection or considering a different question.
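The consequence of an unrecorded confounder can be illustrated with a small Python sketch on invented counts. In this hypothetical cohort, performance status (PS) drives both the choice between two hypothetical agents, A and B, and the risk of death, while the agents themselves are assumed to have identical effects. Age is assumed to be recorded in the administrative data and PS is not; age is constructed to be unrelated to PS or treatment. Adjustment here is direct standardization of the risk difference over strata, and passing a single stratum yields the crude estimate.

```python
from collections import defaultdict

# Hypothetical cells: (age_70_plus, poor_ps, agent_a, deaths, n).
# Poor performance status (PS) raises death risk (40% vs 10%) and makes
# agent A less likely to be chosen; neither agent changes the risk.
# Age is recorded in the (hypothetical) data set, PS is not, and the
# same PS/treatment mix appears in both age groups.
CELLS = [
    (age_70_plus, poor_ps, agent_a, deaths, n)
    for age_70_plus in (False, True)
    for poor_ps, agent_a, deaths, n in [
        (False, True, 40, 400),   # good PS, agent A: 10% die
        (False, False, 10, 100),  # good PS, agent B: 10% die
        (True, True, 40, 100),    # poor PS, agent A: 40% die
        (True, False, 160, 400),  # poor PS, agent B: 40% die
    ]
]


def standardized_rd(cells, stratum_of):
    """Risk(A) - Risk(B), directly standardized to the cohort's stratum mix.

    `stratum_of` maps a cell to its stratum label; returning a constant
    collapses everything into one stratum and yields the crude difference.
    """
    # stratum label -> [deaths_a, n_a, deaths_b, n_b]
    tally = defaultdict(lambda: [0, 0, 0, 0])
    for cell in cells:
        agent_a, deaths, n = cell[2:]
        t = tally[stratum_of(cell)]
        offset = 0 if agent_a else 2
        t[offset] += deaths
        t[offset + 1] += n
    total = sum(t[1] + t[3] for t in tally.values())
    return sum(
        (t[1] + t[3]) / total * (t[0] / t[1] - t[2] / t[3])
        for t in tally.values()
    )


crude = standardized_rd(CELLS, lambda c: None)
age_adjusted = standardized_rd(CELLS, lambda c: c[0])  # recorded covariate
ps_adjusted = standardized_rd(CELLS, lambda c: c[1])   # unrecorded confounder

print(f"crude RD:        {crude:+.2f}")         # crude RD:        -0.18
print(f"age-adjusted RD: {age_adjusted:+.2f}")  # age-adjusted RD: -0.18
print(f"PS-adjusted RD:  {ps_adjusted:+.2f}")   # PS-adjusted RD:  +0.00
```

Adjusting for age, the only covariate this hypothetical data set records, leaves the spurious 18-percentage-point survival advantage for agent A untouched; only stratifying on the unrecorded PS removes it.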
