When the search for genes that predispose to cardiovascular diseases (CVD) started >20 years ago, it was anticipated that genetic polymorphisms might be analogous to the already known CVD risk factors and could be incorporated into a risk model such as the Framingham score1 to assess the risk of an individual and adopt preventive or therapeutic measures accordingly. However, despite years of intensive research, not a single genetic risk factor is currently used for risk assessment. The new strategy of genome-wide association (GWA) studies (for example, see http://www.wtccc.org.uk/), coupled with the availability of very large cohorts of patients,2 is starting to reveal novel genetic factors that contribute to disease risk. Whether these variants will be clinically more useful than those derived from the study of candidate genes remains to be demonstrated. As time passes, interest in genetic research on common CVD is progressively shifting from the direct expectation of risk stratification to a more fundamental understanding of disease origins and pathophysiology and their indirect diagnostic and therapeutic implications.
The objective of the present review is not to provide an exhaustive account of the numerous studies conducted on the genetics of CVD (eg, Arnett et al3) but to introduce a few basic notions required to understand the language of genetics and genomics (see Appendix) and to illustrate, with a limited number of examples, the important insights that genetic research has provided into the causes and mechanisms of CVD. We will also discuss the new GWA strategy and why this approach is likely to have a considerable impact on biomedicine and on the understanding of human disease. Finally, we will try to explain why the search for genetic markers of risk has so far been unsuccessful and why phenotypic biomarkers are likely to be clinically more useful.
The Basis of Genetic Variation
During the past decade, considerable progress has been achieved in the knowledge of the human genome and the characterization of its natural variability.4,5 The 20 000 to 25 000 protein-coding genes of the human genome represent only 30% of its sequence; the remainder consists of intergenic sequences that may contain important elements for the regulation of gene expression. In a typical human gene, 5% of the sequence is composed of coding exons, which are in part translated into protein; the remainder consists of introns and of regulatory regions located upstream (5′) and downstream (3′) of the coding sequence.
The most common type of human sequence variation consists of differences in individual base pairs termed single nucleotide polymorphisms (SNPs). Other sequence variations comprise variable numbers of short or long repetitions of the same motif in tandem such as mini- and microsatellites,6 insertions or deletions of various lengths, and structural variants that affect large chromosomal regions.7 The vast majority of these sequence variations are located in nonfunctional regions of the genome and have no phenotypic impact; these are said to be neutral and are usually termed markers. However, when variations occur within coding sequences or regulatory regions, they may affect the protein sequence or the level of gene expression and translate into observable phenotypic effects.
Mendelian Versus Complex Inheritance
The spectrum of the genetic variants that predispose to CVD spans from rare, highly deleterious mutations responsible for Mendelian diseases to common polymorphisms with weak effects that, alone or in combination, modulate the risk of common diseases (the “common variant–weak effect–common disease” model). In this latter case, the term “complex disease” is often used to denote the fact that the pattern of familial aggregation differs from that of Mendelian inheritance of a single genetic defect.
From an epidemiological perspective, rare deleterious mutations (eg, those that cause familial hypercholesterolemia [FH]) confer an important risk of coronary heart disease (CHD) in mutation carriers, but their impact at the population level is low. Conversely, polymorphisms such as the apolipoprotein E (APOE) polymorphism, because they are frequent, may have a population impact that is far from negligible despite a weak effect at the individual level. This duality, which relates to the epidemiological notions of absolute, relative, and attributable risks, has important medical and public health implications but is less crucial when the interest lies in the identification of pathophysiological pathways.
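The contrast between individual and population impact can be made concrete with Levin's formula for the population attributable fraction, PAF = p(RR - 1) / [1 + p(RR - 1)], where p is the carrier frequency and RR the relative risk. The following is a minimal Python illustration. The carrier frequencies and relative risks are hypothetical round numbers chosen only to mimic an FH-like mutation and an APOE-like polymorphism; they are not estimates from the literature.

```python
def attributable_fraction(carrier_freq: float, relative_risk: float) -> float:
    """Population attributable fraction (Levin's formula).

    carrier_freq: proportion of the population carrying the risk genotype (p).
    relative_risk: risk in carriers divided by risk in noncarriers (RR).
    """
    excess = carrier_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical values for illustration only (not study estimates).
rare = attributable_fraction(carrier_freq=0.002, relative_risk=10.0)   # FH-like
common = attributable_fraction(carrier_freq=0.25, relative_risk=1.4)   # APOE-like

print(f"rare, strong-effect mutation:     PAF = {rare:.3f}")
print(f"common, weak-effect polymorphism: PAF = {common:.3f}")
```

With these assumed inputs, the rare mutation accounts for about 1.8% of cases in the population and the common polymorphism for about 9.1%. This holds even though the individual relative risk of the former is roughly 7 times larger.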
Mutations Responsible for Mendelian Diseases
Mutations are usually identified by linkage analysis conducted in families with several affected members over different generations. Regions that potentially harbor a disease-causing gene are identified by testing the cosegregation of the disease with genetic markers that tag specific regions of the genome. This strategy uses genetic markers (ie, panels of microsatellites or large sets of SNPs regularly spaced throughout the genome) and tests whether particular alleles are cotransmitted with the disease more often than expected by chance. The success of linkage studies depends on the availability of phenotypically well-characterized families that include a sufficiently large number of informative affected individuals. Once a disease-linked region of the genome has been mapped by linkage analysis, finding the responsible gene and sequence variant is not trivial, because the region may encompass tens or hundreds of genes. However, thanks to the improved annotation of the human genome sequence and the possibility of designing dense SNP arrays that target the regions of interest, the discovery of the responsible mutation may be accelerated by linkage disequilibrium (LD) mapping.8 Although exceptions exist (eg, within isolated populations derived from a small number of founders), mutations associated with Mendelian diseases are rare (frequency far below 1%) and of recent origin. This explains why their presence may be restricted to certain groups of individuals (population isolates, families); in that case, they are said to be “private mutations.”
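Linkage evidence is conventionally summarized by a LOD score. This is the log10 ratio of the likelihood of the family data at a recombination fraction θ to its likelihood under no linkage (θ = 0.5). A LOD of 3 or more is the classic threshold for declaring linkage.

The following Python code is a deliberately simplified two-point calculation. It assumes phase-known, fully informative meioses, so that each meiosis can be scored as recombinant or nonrecombinant between the marker and the disease. Real pedigree analyses use dedicated software that handles unknown phase, penetrance, and missing genotypes. The counts in the example are hypothetical.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score, assuming phase-known, fully informative meioses."""
    n = recombinants + nonrecombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant excludes theta = 0
    # log10 likelihood under linkage at recombination fraction theta
    ll_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        ll_linked += recombinants * math.log10(theta)
    # log10 likelihood under no linkage (theta = 0.5)
    ll_null = n * math.log10(0.5)
    return ll_linked - ll_null


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01):
    """Grid search over theta in [0, 0.5) for the maximum LOD score."""
    grid = [i * step for i in range(int(0.5 / step))]
    best = max(grid, key=lambda t: lod(recombinants, nonrecombinants, t))
    return best, lod(recombinants, nonrecombinants, best)


# Hypothetical family data: 2 recombinants among 20 informative meioses.
theta_hat, score = max_lod(recombinants=2, nonrecombinants=18)
print(f"theta = {theta_hat:.2f}, LOD = {score:.2f}")
```

In this toy example the maximum is reached at θ = 0.1 with a LOD of about 3.2, just above the conventional threshold.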
Polymorphisms Involved in Complex Diseases
At the other end of the frequency spectrum of genetic variants, common polymorphisms (minor allele frequency >1%) are the focus of most contemporary genetic studies of complex diseases. The number of common SNPs in the human genome is estimated at >10 million.9 Because polymorphisms have common alleles, numerous combinations of susceptibility alleles at several loci are possible in a given individual, and some of them may affect the risk of CVD in a way that cannot be predicted from the separate effect of each variant. This is the major obstacle to the characterization of the genetics of complex traits and the rationale for proposing to explore systems of genes rather than single genes.10,11 An important feature of polymorphisms, compared with rare deleterious mutations, is their ancient origin. This explains why they are usually found in most human populations, albeit often with different allele frequencies.
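Two points in the paragraph above can be checked with a few lines of Python:

- The number of multilocus genotypes grows as 3^k for k biallelic loci.
- Under interaction, the apparent effect of one locus depends on the allele frequency at another.

The two-locus risk model used here is entirely hypothetical: risk is tripled only when both loci carry at least one risk allele. Hardy-Weinberg equilibrium and independence between the loci are also assumed.

```python
# Genotype at a biallelic locus = number of risk alleles carried (0, 1 or 2).
GENOTYPES = (0, 1, 2)


def n_multilocus_genotypes(n_loci: int) -> int:
    """Number of distinct genotype combinations across n biallelic loci."""
    return len(GENOTYPES) ** n_loci


def hwe_frequencies(p: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg equilibrium for risk-allele frequency p."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def toy_relative_risk(g1: int, g2: int) -> float:
    """Hypothetical interaction model: risk tripled only if both loci carry a risk allele."""
    return 3.0 if g1 > 0 and g2 > 0 else 1.0


def marginal_rr_locus1(p2: float) -> float:
    """Relative risk of carriers vs noncarriers at locus 1, averaged over locus 2 genotypes."""
    f2 = hwe_frequencies(p2)
    risk_carriers = sum(f2[g2] * toy_relative_risk(1, g2) for g2 in GENOTYPES)
    risk_noncarriers = sum(f2[g2] * toy_relative_risk(0, g2) for g2 in GENOTYPES)
    return risk_carriers / risk_noncarriers


print(n_multilocus_genotypes(10))
for p2 in (0.05, 0.1, 0.5):
    print(p2, round(marginal_rr_locus1(p2), 3))
```

Ten loci already give 59 049 genotype combinations. Under this toy model, the apparent relative risk at locus 1 ranges from about 1.2 to 2.5 depending only on the allele frequency at locus 2.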
Because complex diseases do not follow a clear pattern of Mendelian inheritance, the strategy used to identify their genes of predisposition is usually not based on family studies but on a radically different approach called “genetic association” analysis. This approach relies on the existence of LD among physically close polymorphic sites in the genome, which implies that even if a polymorphism causally involved in the disease process is not directly observed, its association may be captured by a measured proxy polymorphism in LD with it. This is the basis of association studies that test the statistical association between genetic markers (the term “marker” denotes that no a priori causal role is assumed) and the disease in the population. The principle of genetic association studies is described in Figure 1. Initially, association studies focused on markers of candidate genes. Thanks to various initiatives, in particular the “HapMap” Project,13 increasingly dense genome-wide panels of common SNPs are now available that provide a powerful resource of markers (or tag SNPs) (Figure 1) for association studies. Contemporary association studies often encompass sets of genes that encode components of biological systems, chromosome regions, or even the whole genome.
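In its simplest case-control form, the association test compares allele counts at one marker between cases and controls in a 2 × 2 table. The following Python code computes the Pearson chi-square statistic (1 degree of freedom), its P value, and the allelic odds ratio using only the standard library. It assumes allele counts (2 per individual), no empty cells, and implicitly Hardy-Weinberg equilibrium. The counts in the example are hypothetical.

```python
import math


def allelic_association(case_alleles, control_alleles):
    """Allelic case-control test on a 2 x 2 table of allele counts.

    case_alleles, control_alleles: (count of allele 1, count of allele 2),
    counted over chromosomes (2 per individual). Assumes no empty cells.
    Returns (chi-square statistic with 1 df, P value, allelic odds ratio).
    """
    a, b = case_alleles
    c, d = control_alleles
    table = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1000 cases and 1000 controls (2000 chromosomes each).
chi2, p, odds = allelic_association(case_alleles=(600, 1400), control_alleles=(500, 1500))
print(f"chi2 = {chi2:.2f}, P = {p:.1e}, OR = {odds:.2f}")
```

Here the marker allele is seen on 30% of case chromosomes versus 25% of control chromosomes. This gives an odds ratio of about 1.29 and a P value of roughly 4 × 10^-4. Whether the marker is causal or merely in LD with a causal variant cannot be decided from this test alone.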
Figure 1. Principle of genetic association studies. The schema represents a genomic region that contains 12 SNPs. The 3 SNPs in black are genotyped directly (these are the tag SNPs). The 6 SNPs in gray are captured through linkage disequilibrium (LD) with the tag SNPs (as denoted by arrows). The 3 SNPs in white are neither genotyped nor captured by tag SNPs (uncaptured SNPs), and so disease association with any of these uncaptured SNPs would be missed. The gray star represents a SNP causally associated with disease. It has 2 alleles (S1 and S2) and is in LD with a tag SNP that has 2 alleles (M1 and M2). The LD is reflected by the fact that the 2 SNPs generate only 3 haplotypes instead of the 4 possible because the haplotype M1S2 is never observed. As a consequence of this LD, the association of the causal SNP with disease could be detected through an indirect association with the tag SNP. Adapted from Kruglyak,12 with permission from the publisher. Copyright © 2005, Nature Publishing Group.
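The LD in Figure 1 can be quantified from the four haplotype frequencies of the tag SNP (alleles M1, M2) and the causal SNP (alleles S1, S2). The following Python code computes the standard pairwise measures D, |D'| and r². The haplotype frequencies in the example are hypothetical, with M1S2 set to zero as in the figure.

```python
def ld_measures(h11: float, h12: float, h21: float, h22: float):
    """Pairwise LD between a tag SNP (M1/M2) and a causal SNP (S1/S2).

    Arguments are haplotype frequencies: h11 = M1S1, h12 = M1S2,
    h21 = M2S1, h22 = M2S2. Returns (D, |D'|, r^2).
    """
    total = h11 + h12 + h21 + h22
    h11, h12, h21, h22 = (h / total for h in (h11, h12, h21, h22))
    p_m1, p_s1 = h11 + h12, h11 + h21          # allele frequencies
    p_m2, p_s2 = 1.0 - p_m1, 1.0 - p_s1
    d = h11 * h22 - h12 * h21                  # equals h11 - p_m1 * p_s1
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_m1 * p_s2, p_m2 * p_s1)
    else:
        d_max = min(p_m1 * p_s1, p_m2 * p_s2)
    d_prime = abs(d) / d_max
    r2 = d * d / (p_m1 * p_m2 * p_s1 * p_s2)
    return d, d_prime, r2


# Hypothetical haplotype frequencies; M1S2 is never observed, as in Figure 1.
d, d_prime, r2 = ld_measures(h11=0.3, h12=0.0, h21=0.2, h22=0.5)
print(f"D = {d:.3f}, |D'| = {d_prime:.2f}, r^2 = {r2:.2f}")
```

Because one haplotype is absent, |D'| equals 1 (complete LD), yet r² is only about 0.43. The power of an indirect association test through a tag SNP is usually discussed in terms of r², so complete LD in the D' sense does not by itself make the tag a perfect proxy.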
The primary goal of the International HapMap Project13 (http://www.hapmap.org/) was to create a public resource of common SNPs to capture most of the common human genome sequence variability. A second objective was to characterize the LD structure of the genome on the basis of the analysis of these SNPs. Because of the strong LD displayed by most regions of the genome, the combinations of alleles at neighboring SNPs, called haplotypes, generate much less diversity than would be expected if the SNPs were uncorrelated. Recent studies have shown that the human genome is organized into a succession of distinct haplotype blocks that are ancestrally conserved.14–17 By genotyping common SNPs in 270 individuals from populations of African, Asian, and European ancestry, the HapMap Project has identified a set of SNPs that tag most of the common haplotypes in the human genome.18,19 This resource is used to search for polymorphisms associated with susceptibility to common diseases. For this purpose, genotyping arrays built with tag SNPs that encompass the whole genome or specific regions of interest are used; Figure 1 explains the principle.
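Choosing tag SNPs from HapMap-type data is often framed as a covering problem. The aim is to pick as few SNPs as possible so that every common SNP has r² above a threshold with at least one tag. The following Python code implements a simple greedy pairwise heuristic under that framing. The r² threshold of 0.8 is a commonly used convention, and the 5-SNP r² matrix is hypothetical. This is not necessarily the procedure used to design any particular genotyping array.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy pairwise tag-SNP selection.

    r2: symmetric matrix (list of lists) of pairwise r^2, with r2[i][i] == 1.
    Returns indices of tag SNPs such that every SNP has r^2 >= threshold
    with at least one selected tag.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Pick the SNP that captures the most SNPs not yet tagged (ties: lowest index).
        # An untagged SNP always captures itself, so each round makes progress.
        best = max(range(n), key=lambda i: sum(r2[i][j] >= threshold for j in untagged))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Hypothetical r^2 matrix for 5 SNPs forming two LD blocks: {0, 1, 2} and {3, 4}.
r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.95, 0.10, 0.00],
    [0.85, 0.95, 1.00, 0.05, 0.10],
    [0.10, 0.10, 0.05, 1.00, 0.90],
    [0.00, 0.00, 0.10, 0.90, 1.00],
]
print(greedy_tag_snps(r2))
```

In this toy matrix, two tags (one per LD block) suffice to capture all 5 SNPs. Raising the threshold toward 1 forces every SNP to be genotyped directly. The greedy choice is fast but not guaranteed to find the smallest possible tag set.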
Variants of “Intermediate” to Low Frequency Associated With Non-Mendelian Traits
Between the rare mutations responsible for Mendelian diseases, identified by family studies, and the common polymorphisms targeted in current association studies, genetic variants that have a low frequency (<1%) but a sizeable individual effect (eg, relative risk >3) probably exist in significant numbers. These variants are presently difficult to characterize because they do not generate evident familial patterns of disease that would make them identifiable by linkage studies. They are also missed by current candidate gene or genome-wide sequencing strategies, which screen only a limited number of individuals for polymorphisms. Rare functional variants are also difficult to tag with common markers such as SNPs. Their systematic characterization is therefore beyond the scope of studies that rely on LD, such as GWA studies, and will depend on the availability of new high-throughput sequencing technologies and large DNA banks of patients and controls. Rare variants associated with non-Mendelian traits may prove to be clinically important because they may confer a significant increase in risk and therefore constitute potential diagnostic and prognostic tools. Interest in these variants has recently grown after the discovery of a number of them in the PCSK9 and ABCA1 genes.
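A simple sampling calculation shows why small screening panels miss low-frequency variants. If an allele has frequency f and n unrelated diploid individuals are sequenced (2n chromosomes), the probability of observing it at least once is 1 - (1 - f)^(2n). The following Python code evaluates this and searches for the panel size needed to reach 95% detection. It assumes random sampling from a single population and error-free sequencing. The panel size of 24 individuals in the example is hypothetical.

```python
def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """P(seeing at least one copy of an allele among 2n sampled chromosomes)."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


def individuals_needed(allele_freq: float, target: float = 0.95) -> int:
    """Smallest number of diploid individuals giving detection probability >= target."""
    n = 1
    while detection_probability(allele_freq, n) < target:
        n += 1
    return n


for f in (0.05, 0.01, 0.001):
    print(f"f = {f}: P(detect | 24 individuals) = {detection_probability(f, 24):.2f}, "
          f"n for 95% = {individuals_needed(f)}")
```

Under these assumptions, a variant with 1% frequency is seen in a 24-person panel only about 38% of the time. Roughly 1500 individuals are needed to detect a 0.1% allele with 95% probability.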