1.Introduction to Epidemiological Studies
- Definition of Epidemiology
Epidemiology is defined as “the study of the occurrence and distribution of health-related events, states, and processes in specified populations, including the study of the determinants influencing such processes, and the application of this knowledge to control relevant health problems”. The scope of Epidemiology is therefore very wide: it mainly includes the study of the incidence and prevalence of health conditions and traits, the study of their determinants (risk and protective factors), and the design of potential strategies for disease prevention.
Many subfields of Epidemiology have been developed, including environmental epidemiology, genetic epidemiology, and nutritional epidemiology. An early definition described Genetic Epidemiology as “the field that addresses the etiology, distribution, and control of disease in groups of related individuals and the inherited causes of diseases in population”.
-Cross-Sectional Studies
The defining characteristic of cross-sectional studies is that both exposure and outcome are ascertained at the same time. Because exposure and outcome are identified at a single time point, the temporal sequence between them is often impossible to work out. However, cross-sectional studies are useful in Genetic Epidemiology, because germline genetic exposures do not change over time and unquestionably precede the outcome.
-Cohort Studies
A cohort study is “an observational epidemiological study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed to a factor or factors hypothesized to influence the occurrence of a given outcome”.
2.Key Concepts in Genetic Epidemiology
-Introduction to Genetic Epidemiology
Genetic epidemiology is the scientific discipline that aims to unravel the role of genetic determinants in health and disease and their complex interplay with environmental factors. In the past, genetic epidemiology has been particularly successful in mapping genes with large effect sizes at the individual level, for example in monogenic disorders where familial recurrence follows the laws of Mendelian inheritance.
With the advent of high-throughput genotyping technologies and the development of more sophisticated bioinformatics and statistical genetics methodologies, the field of genetic epidemiology has recently focused its attention on dissecting the genetic architecture of common complex diseases. Unlike monogenic diseases, common complex diseases are caused by a large number of genes with small to modest effect sizes and by their complex interplay with environmental factors. Large-scale genome-wide association studies (GWAS) and whole-genome sequencing (WGS) studies have catalogued a large number of genetic variants that are implicated in complex traits and diseases.
-Molecular Genetics and Variation
Genetics is the study of genes and heredity, the process by which characteristics are passed on from one generation to another. The carrier molecule of an organism’s genetic information is called deoxyribonucleic acid (DNA).
1-From DNA to RNA to Proteins
DNA is a large molecule consisting of two single strands, and each strand is composed of smaller molecules called nucleotides. Each nucleotide is composed of a sugar residue (deoxyribose), a phosphate group, and a nitrogenous base, which can be any of four types: adenine (A), cytosine (C), guanine (G), and thymine (T). Alternating sugar residues and phosphate groups form the DNA backbone, and a covalent bond attaches each base to a sugar residue within a single strand (a sugar with its attached base is called a nucleoside). Weaker hydrogen bonds specifically pair A with T and G with C (known as complementary bases) between the two single strands, resulting in a twisted double-stranded DNA (dsDNA) molecule, also known as the DNA double helix. The two ends of each single strand differ and are termed the five prime (5') and three prime (3') ends; the two strands of the helix are oriented in opposite directions.
Fig. 1 Structure of a DNA molecule. Basic representation of an unwound DNA double helix segment depicting the phosphate group (purple circle), the sugar residue (blue pentagon), and the four different chemical bases (differentially colored squares). Complementary base pairing occurs between guanine (G) and cytosine (C) and between adenine (A) and thymine (T).
The DNA sequence is essentially the order of the four bases across the genome and it is written down as letters for one strand only in the 5' to 3' direction, in this example GACC. This linear sequence of DNA is also known as its primary structure. The complementary strand in this case, written in the 3' to 5' direction, would be CTGG (Fig. 1). The length of the DNA is measured in base pairs (bp) so the DNA fragment in the example shown is 4 bp long.
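The base-pairing rule and the strand orientation described above can be illustrated with a short Python sketch. The function names `complement` and `reverse_complement` are chosen here for illustration only; the example fragment GACC is the one used in the text.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement; the result reads 3' to 5' if the input reads 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """The complementary strand rewritten in the conventional 5' to 3' direction."""
    return complement(seq)[::-1]


fragment = "GACC"                     # written 5' to 3', as in the example
print(complement(fragment))           # CTGG (complementary strand, read 3' to 5')
print(reverse_complement(fragment))   # GGTC (same complementary strand, read 5' to 3')
print(len(fragment), "bp")            # 4 bp
```

Because sequences are conventionally reported 5' to 3', the reverse complement (GGTC) is the form in which the second strand would normally be written down.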
The process of protein synthesis can be summarized in two steps: transcription of a DNA sequence into ribonucleic acid (RNA) and translation of RNA into a chain of amino acids, which forms a protein. During transcription the DNA double helix is unzipped into single strands. A single DNA strand acts as a template for the synthesis of a complementary strand of RNA in the 5' to 3' direction, a reaction catalyzed by the RNA polymerase enzyme. The structure of RNA is similar to that of single-stranded DNA, except that its backbone contains a sugar residue called ribose and the chemical base uracil (U) is present instead of T. RNA transcription that leads to proteins occurs in certain regions of the DNA, which are transcribed to messenger RNA (mRNA). These regions are known as genes and typically contain alternating segments of sequence called exons (the protein-coding sequences), separated by segments of noncoding DNA called introns. The mRNA is further edited to make mature mRNA: introns are cut out and exons are spliced together. Differential or alternative splicing of exons gives rise to different gene transcripts, so that multiple proteins can be coded by one gene.
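Transcription and splicing as described above can be sketched in a few lines of Python. Everything in the example is hypothetical: the toy gene, its exon and intron sequences, and the function names `transcribe` and `splice` are invented for illustration. As a simplification, the RNA is derived from the coding (non-template) strand by replacing T with U, which gives the same sequence that RNA polymerase produces from the template strand.

```python
def transcribe(coding_strand: str) -> str:
    """RNA with the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list) -> str:
    """Join the given exon intervals (0-based, end-exclusive); introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy gene: three exons separated by two introns.
segments = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTCCAG"),
    ("exon", "GAGAAA"),
    ("intron", "GTGAGTTTAG"),
    ("exon", "CTGTAA"),
]
gene = "".join(seq for _, seq in segments)

# Work out the exon coordinates from the segment lengths.
exons = []
position = 0
for kind, seq in segments:
    if kind == "exon":
        exons.append((position, position + len(seq)))
    position += len(seq)

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)                    # all three exons
alternative = splice(pre_mrna, [exons[0], exons[2]])     # middle exon skipped

print(mature_mrna)   # AUGGCCGAGAAACUGUAA
print(alternative)   # AUGGCCCUGUAA
```

The two printed transcripts come from the same gene, which is the sense in which alternative splicing lets one gene code for more than one protein.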
2-Human Genome and Variation
Nuclear DNA (nDNA) is found in the nucleus of almost every human cell (mature red blood cells are an exception), tightly packed in structures called chromosomes. Mitochondrial DNA (mtDNA) is found in the mitochondria, the cell structures responsible for providing the energy that the cell needs to function. nDNA makes up the majority of the genome in eukaryotes; in humans it is 3.3 billion bp long and contains approximately 20,000 genes. nDNA is distributed in 22 pairs of autosomes and one pair of sex chromosomes, which is XY in males and XX in females. One chromosome of each pair is derived from the mother and one from the father. Human cells other than gametes contain two copies of each chromosome and are thus called diploid.
3-The Impact of DNA Variation in Health and Disease
DNA sequence variations are the result of genetic mutations that may be introduced during DNA replication or due to DNA exposure to damaging agents. Hereditary mutations are passed on from parent to offspring. Mutations are essential for our evolution and our long-term survival. However, a very small percentage of all mutations can also lead to medical conditions of various severities.
For variants that fall in protein-coding genes it is easier to make predictions about their effect on gene function. A wide range of databases, such as Ensembl and UCSC, describe these functional consequences. For example, some non-synonymous variants (those that change the encoded amino acid sequence) introduce a premature stop codon, leading to a truncated protein; small insertions/deletions (indels) can change the translational reading frame. These belong to the category of loss-of-function (LoF) variants, which includes highly deleterious variants responsible for severe diseases.
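The effect of a premature stop codon and of a reading-frame shift can be made concrete with a minimal Python sketch. The coding sequence below is a hypothetical toy example, not a real gene, and `translate` is an illustrative helper that uses the standard genetic code and stops at the first stop codon.

```python
# Standard genetic code, built from the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(cds: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


reference = "ATGGCCGAGAAACTGTAA"                  # hypothetical toy coding sequence

# Nonsense variant: A>T at index 9 turns the codon AAA (Lys) into TAA (stop).
nonsense = reference[:9] + "T" + reference[10:]

# Frameshift variant: deleting the single base at index 3 shifts every later codon.
frameshift = reference[:3] + reference[4:]

print(translate(reference))    # MAEKL
print(translate(nonsense))     # MAE
print(translate(frameshift))   # MPRNC
```

The nonsense variant shortens the product, whereas the one-base deletion changes every amino acid downstream of the deletion, which is why both are usually treated as candidate loss-of-function variants.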
Non-synonymous missense variants, in which the length of the protein is preserved, can sometimes, but not always, affect the structure or function of the protein. A very well-known example is sickle-cell anaemia, caused by a missense mutation, A to T, in the gene coding for the beta-globin chain of hemoglobin. This mutation results in the substitution of glutamic acid by valine (the GAG codon changes to GTG); the disease is manifested in homozygous individuals and is caused by aggregation and precipitation of hemoglobin. In heterozygous individuals (known as carriers), about half of the hemoglobin produced is still normal, so the symptoms are far less severe. Interestingly, the mutation is thought to have become common because it provides protection against malaria.
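The sickle-cell codon change can be checked against the standard genetic code with a small Python sketch. The helper `classify_substitution` and its three output labels are illustrative names chosen here; only the GAG to GTG change comes from the text, and the other two substitutions are added for contrast.

```python
# Standard genetic code, built from the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Sickle-cell change from the text: GAG (glutamic acid, E) to GTG (valine, V).
print(CODON_TABLE["GAG"], CODON_TABLE["GTG"])   # E V
print(classify_substitution("GAG", "GTG"))      # missense

# Two other single-base changes of the same codon, for contrast.
print(classify_substitution("GAG", "GAA"))      # synonymous (both glutamic acid)
print(classify_substitution("GAG", "TAG"))      # nonsense (premature stop)
```

The same codon can therefore give a silent, an amino-acid-changing, or a truncating variant depending on which base changes and to what.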
Transcription and translation are complex processes regulated by many factors. Briefly, the initiation of transcription is controlled by promoters, which are DNA elements upstream of the gene where different forms of RNA polymerase and other associated transcription factors bind.
-DNA Transmission
The first step in the process by which genetic information is transmitted from generation to generation is called meiosis. During this process a single cell divides to produce four cells, each containing half the original amount of genetic information.
Meiosis and Recombination
Meiosis is the process of cell division that leads to gametes (sperm and ova). A simplified description of this process is depicted in Fig. 2 for one pair of homologous chromosomes.
Fig. 2 An overview of meiosis. (1) A pair of homologous chromosomes in a diploid cell, consisting of the maternally derived and the paternally derived double-stranded DNA (dsDNA). (2) DNA replication to produce two identical dsDNA molecules, the sister chromatids. (3) Pairing up of homologous chromosomes. (4) Crossing over and exchange of DNA segments between homologous chromosomes. (5) First meiotic division: separation of the homologous chromosomes (non-sister chromatids) into two daughter cells. (6) Second meiotic division: separation of sister chromatids into four haploid gametes.
In a diploid cell, the maternally derived and paternally derived dsDNA of a chromosome each undergo DNA replication (they are duplicated) to produce two identical dsDNA molecules, the sister chromatids, held together at the centromere. The resulting homologous chromosomes pair up. At this stage, segments of genetic material can be exchanged between homologous chromosomes, leading to the formation of recombinant chromosomes. In the first meiotic division that follows, the homologous chromosomes (and with them the non-sister chromatids) are separated and distributed into two daughter cells. In the second meiotic division, the sister chromatids are separated and distributed into four haploid gametes. During reproduction, two gametes (a sperm and an ovum) fuse to form a diploid zygote.
An important aspect of meiosis is that homologous chromosomes are distributed randomly and independently to the gametes. For each of the 23 chromosome pairs there is therefore a 50% probability that a gamete will receive the maternally derived rather than the paternally derived chromosome, so each parent can produce 2^23 (about 8.4 million) distinct gametes, even before recombination is taken into account.
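The count follows from 23 independent two-way choices, one per chromosome pair, and can be checked with a short Python sketch. The simulation is a deliberately simplified model: `make_gamete` is a hypothetical helper, recombination is ignored, and each chromosome is represented only by its parental origin ("M" for maternally derived, "P" for paternally derived).

```python
import random

N_PAIRS = 23   # 22 pairs of autosomes plus one pair of sex chromosomes

# Two equally likely choices per pair, made independently for all 23 pairs.
n_distinct_gametes = 2 ** N_PAIRS
print(n_distinct_gametes)   # 8388608


def make_gamete(rng: random.Random) -> str:
    """Pick the maternal (M) or paternal (P) homolog independently for each pair."""
    return "".join(rng.choice("MP") for _ in range(N_PAIRS))


rng = random.Random(2024)   # fixed seed so the run is reproducible
gametes = [make_gamete(rng) for _ in range(10_000)]

# The average share of maternally derived chromosomes should be close to 0.5.
maternal_share = sum(g.count("M") for g in gametes) / (len(gametes) * N_PAIRS)
print(round(maternal_share, 2))
```

Crossing over adds further variation on top of this figure, because recombinant chromosomes are mosaics of the maternal and paternal homologs.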