DNA 101 is an attempt to
take the extremely complex and confusing subject of genetics and DNA and
simplify it into layman's terms. This page addresses DNA only as it
applies to Y-Chromosome testing and genealogy. Technical terms are
defined in this same context.
This page is broken down into the
following sections:
DNA
Chromosomes
The Y-Chromosome
Testing the Y-Chromosome
Reading the Test Results
What Does it Mean
Putting It All Together
Definitions
Links
DNA
Deoxyribonucleic acid (DNA) is the
chemical inside the nucleus of all cells that carries the genetic
instructions for making living organisms. A DNA
molecule consists of two strands that wrap around each other to resemble
a twisted ladder. The sides are made of sugar and phosphate molecules.
The “rungs” are made of nitrogen-containing chemicals called bases.
Each strand is a chain of units called nucleotides, each composed of one sugar molecule, one phosphate molecule, and a base. Four different bases are present in DNA: adenine
(A), thymine (T), cytosine (C), and guanine (G). The particular order of
the bases arranged along the sugar-phosphate backbone is called the DNA
sequence; the sequence specifies the exact genetic instructions required
to create a particular organism with its own unique traits. The two
strands of the DNA
molecule are held together by weak bonds between their bases. The four bases
pair in a set manner: Adenine (A) pairs with thymine (T), while cytosine (C) pairs with
guanine (G). These pairs of bases are known as Base Pairs (bp).
These Base Pairs (bp) are the basis of Y-chromosome testing.
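The pairing rule can be written out as a small lookup. This is only an illustrative sketch: the names `PAIRS` and `partner_strand` are made up for this example, and it pairs bases position by position without modeling the direction in which each strand is read.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the bases that sit opposite each base of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())


def base_pairs(strand: str) -> list[tuple[str, str]]:
    """List each rung of the ladder as a (base, partner) tuple."""
    return [(base, PAIRS[base]) for base in strand.upper()]


print(partner_strand("GATA"))   # CTAT
print(len(base_pairs("GATA")))  # 4 base pairs (bp)
```

A four-letter sequence therefore spans four base pairs, which is the unit of length used when markers are described later on this page.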
Chromosomes
Chromosomes are paired
threadlike "packages" of long segments of DNA contained
within the nucleus of each cell. In humans there are 23 pairs of
chromosomes. In 22 pairs, both members are essentially identical, one
deriving from the individual's mother, the other from the father. The 23rd pair is different. In females this pair has two like
chromosomes called "X". In males it comprises one
"X" and one "Y," two very dissimilar chromosomes. It
is these chromosome differences which determine sex.
The Y-Chromosome
Human sex is determined by the X and Y chromosomes. A female has two X-Chromosomes and a male has an X and a Y-Chromosome. When a child is conceived it gets one sex chromosome from its mother and one from its father. The chromosome from the mother will always be an X, but the chromosome from the father may be either X or Y. If the child gets the X she will
be a girl; if the child gets the Y he will be a boy.
This Y-Chromosome has certain unique
features:
- The presence of a Y-Chromosome causes maleness. This little chromosome, about 2% of a father's genetic contribution to his sons, programs the early embryo to develop as a male.
- It is transmitted from fathers only to their sons.
- Most of the Y-Chromosome is inherited as an integral unit passed without alteration from father to sons, and to their sons, and so on, unaffected by exchange or any other influence of the X-Chromosome that came from the mother. It is the only nuclear chromosome that escapes the continual reshuffling of parental genes during the process of sex cell production.
It is these unique features that make
the Y-Chromosome useful to genealogists.
Testing the Y-Chromosome
The Y-Chromosome has definable segments of DNA with
known genetic characteristics. These segments are known as Markers.
These markers occur at an identifiable physical location on a chromosome
known as a Locus. Each marker is designated by a number (known as
DYS#), according to international conventions. You will often
find the terms Marker and Locus used interchangeably, but
technically the Marker is what is tested and the Locus is
where the marker is located on the chromosome.
Although there are several types of markers used in DNA
studies, the Y-Chromosome test uses only one type. The marker used is
called a Short Tandem Repeat (STR). STRs are short sequences of
DNA (usually 2, 3, 4, or 5 base pairs long) that are repeated numerous
times in a head-to-tail manner. For example, the 16 base pair sequence
"gatagatagatagata" represents 4 repeats of the sequence
"gata". The number of repeats at a marker is referred to as its Allele. The
variation in the number of repeats at each marker enables discrimination
between individuals.
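The "gata" example can be checked with a few lines of code. This is a simplified sketch for illustration only (the function name `count_repeats` is made up, and a testing laboratory does not read markers this way): it looks for the longest run of back-to-back copies of a short sequence.

```python
def count_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of head-to-tail copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    sequence, motif = sequence.lower(), motif.lower()
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        # Keep stepping forward one motif length while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


print(count_repeats("gatagatagatagata", "gata"))  # 4
```

The value returned, 4, is what the test results table reports for a marker: the count of repeats, not the letters themselves.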
Reading the Test Results
The table below is a shortened version of the actual
table used to show our DNA test results. It shows 12 of the 25
markers that most of the participants had tested.
| Marker | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Part ID# / DYS# | 393 | 390 | 19* | 391 | 385a | 385b | 426 | 388 | 439 | 389i | 392 | 389ii | Ancestor # |
| 3947 | 13 | 26 | 14 | 11 | 12 | 14 | 12 | 12 | 11 | 13 | 13 | 29 | 0001 |
The numbers (1-12) across the top of the table
are the marker numbers. They have no significance other than as an
easy way to refer to the marker. Note: FamilyTree DNA refers to these
numbers as Locus. The second set of numbers across the top of
the table are the DYS# (the actual marker names).
The numbers down the left side of the table identify
the participant in the DNA project.
The numbers down the right side of the table identify the participant's
oldest known ancestor.
The rest of the numbers are the Alleles (the number of
repeats) for each participant at the specified marker.
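One way to picture a row of the table is as a lookup from marker name (DYS#) to allele. The sketch below uses the values shown for participant 3947; the variable and function names are made up for illustration, and the marker labeled "19*" is kept exactly as it appears in the table.

```python
# DYS# labels in table order (marker numbers 1-12), as printed in the table.
DYS_MARKERS = ["393", "390", "19*", "391", "385a", "385b",
               "426", "388", "439", "389i", "392", "389ii"]

# Alleles (repeat counts) for participant 3947, in the same order.
ALLELES_3947 = [13, 26, 14, 11, 12, 14, 12, 12, 11, 13, 13, 29]

participant = {
    "part_id": "3947",       # left-hand column: the participant
    "ancestor_id": "0001",   # right-hand column: oldest known ancestor
    "alleles": dict(zip(DYS_MARKERS, ALLELES_3947)),
}


def allele_at(person: dict, marker_number: int) -> int:
    """Look up an allele by marker number (1-12) instead of DYS#."""
    return person["alleles"][DYS_MARKERS[marker_number - 1]]


print(participant["alleles"]["390"])  # 26
print(allele_at(participant, 12))     # 29
```

Marker number 2 and DYS# 390 refer to the same column, which is why the marker numbers carry no meaning of their own.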
What Does it Mean
An individual's test results have little meaning on
their own. You cannot take these numbers, plug them into some
formula and find out who your ancestors are. The value of the test
results depends on how your results compare to other test results. And
even when you match someone else, it will only indicate that you and the
person you match share a common ancestor. Depending on the number of
markers tested and the number of matches, it will indicate, with a certain
degree of probability, how long ago this common ancestor lived. It will
not show exactly who this ancestor is.
As discussed above, the Y-Chromosome is passed from
father to son. The vast majority of the time the father passes an exact
copy of his Y-Chromosome to his son. This means that the markers of the
son are identical to those of his father. However, on rare occasions there
is a mutation or change in one of the markers. The change is
either an insertion or a deletion. An insertion is when an
additional repeat is added to a marker. A deletion is when one of the
repeats is deleted.
Mutations occur at random. This means it is possible
for two distant cousins to match exactly on all markers while two
brothers might not match exactly. Because of the random nature of
mutations we must use statistics and probability to estimate the Time
to the Most Recent Common Ancestor (TMRCA).
The actual calculations of TMRCA are mathematically complex and depend
on knowing the rate of mutation and the true number of mutations. At
this time there is not enough data to accurately determine either of
these factors so certain assumptions have to be made. The discussion of
these assumptions and the actual calculations are beyond the scope of
this webpage. For those wishing to read more about the various models
used, I recommend Time
to Most Recent Common Ancestry Calculator by Bruce Walsh. The
simplest and one of the most commonly used models makes the following
assumptions:
- Rate of Mutation = .002. This assumes that any given marker has a .002 chance of mutating with each generation. In other words, we could expect any given marker to mutate, on average, once in 500 generations. The rate of .002 is considered conservative and is the average of a number of studies. Because it is a low rate, it results in a longer TMRCA estimate than a higher mutation rate would.
- Number of mutations: This model counts any change in a marker as a single mutation. Each marker is scored as either a match or a non-match. If a marker does not match, it is assumed to differ by a single mutation, even if it differs by more than one repeat. This method of counting mutations may result in underestimating the TMRCA.
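Both assumptions are simple enough to try out. The sketch below is illustrative only: the function names are made up, and the second participant is hypothetical (a copy of participant 3947 with one marker changed), not a real project member.

```python
MUTATION_RATE = 0.002  # chance that one marker mutates in one generation


def generations_per_mutation(rate: float = MUTATION_RATE) -> float:
    """Average number of generations between mutations at a single marker."""
    return 1 / rate


def prob_exact_copy(n_markers: int, rate: float = MUTATION_RATE) -> float:
    """Chance a son matches his father on all n markers (markers independent)."""
    return (1 - rate) ** n_markers


def score_match(a: dict[str, int], b: dict[str, int]) -> tuple[int, int]:
    """Score shared markers as match or non-match; return (matches, mismatches)."""
    shared = a.keys() & b.keys()
    mismatches = sum(1 for marker in shared if a[marker] != b[marker])
    return len(shared) - mismatches, mismatches


participant_3947 = {"393": 13, "390": 26, "19*": 14, "391": 11,
                    "385a": 12, "385b": 14, "426": 12, "388": 12,
                    "439": 11, "389i": 13, "392": 13, "389ii": 29}
# Hypothetical second participant: differs by two repeats at DYS# 439.
hypothetical = {**participant_3947, "439": 13}

print(round(generations_per_mutation()))            # 500
print(round(prob_exact_copy(25), 3))                # 0.951
print(score_match(participant_3947, hypothetical))  # (11, 1)
```

Note that the two-repeat difference at DYS# 439 is scored as one mismatch, giving an 11-1 result. That is the simplification described in the second assumption.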
Based on the above assumptions we
derive the cumulative probability table below. This table simply lists
the number of generations corresponding to the 50%, 90% and 95% probability
levels for various numbers of matches.
| Match | | 50% | 90% | 95% | 95% Confidence Interval |
|---|---|---|---|---|---|
| 12-0 | Match exactly at all 12 markers | 14 | 48 | 62 | 1-77 |
| 11-1 | 11 exact matches, 1 mismatch | 37 | 85 | 103 | 5-121 |
| 10-2 | 10 exact matches, 2 mismatches | 61 | 122 | 144 | 14-165 |
| 25-0 | Match exactly at all 25 markers | 7 | 23 | 30 | 0-37 |
| 24-1 | 24 exact matches, 1 mismatch | 17 | 40 | 48 | 2-57 |
| 23-2 | 23 exact matches, 2 mismatches | 28 | 56 | 66 | 6-75 |
The TMRCA for 12 markers assumes that there are ONLY 12 markers available for testing. If there are only 12 markers and you match 12 for 12, there is a 50% probability that you share a common ancestor within 14 generations.
The TMRCA for 25 markers assumes that there are ONLY 25 markers available for testing. If there are only 25 markers and you match 25 for 25, there is a 50% probability that you share a common ancestor within 7 generations.
This table tells us that if we match
on 24 of 25 markers there is a 50% probability that the most recent
common ancestor is 17 generations or less, a 90% probability that TMRCA
is 40 generations or less, and a 95% probability that TMRCA is 48
generations or less. The 95% Confidence Interval is the upper and lower range of values that encompass 95%
of the probability for the TMRCA. If we match on 24 of 25
markers, 95% of the possible TMRCA values fall between 2 and 57 generations.
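For readers who want to see where numbers like these can come from, here is a rough sketch. It is one reading of the simple model, not necessarily the exact calculation behind the table. It assumes that a marker is still unchanged on both lines of descent after t generations with probability e^(-2 x rate x t), that markers behave independently, that any mismatch counts once, and that every TMRCA is considered equally likely before testing. The function names are made up for this example.

```python
from math import comb, exp, log


def tmrca_cdf(generations: float, markers: int, mismatches: int,
              rate: float = 0.002) -> float:
    """Probability that the TMRCA is `generations` or less (assumed model)."""
    x = exp(-2 * rate * generations)  # chance one marker still matches
    # Under the assumptions above the cumulative probability reduces to a
    # binomial tail: 1 - P(at least markers - mismatches of them match).
    tail = sum(comb(markers, j) * x**j * (1 - x) ** (markers - j)
               for j in range(markers - mismatches, markers + 1))
    return 1 - tail


def generations_for_exact_match(probability: float, markers: int,
                                rate: float = 0.002) -> float:
    """Generations at which an exact match reaches the given probability."""
    return -log(1 - probability) / (2 * rate * markers)


for markers in (12, 25):
    levels = [round(generations_for_exact_match(p, markers))
              for p in (0.50, 0.90, 0.95)]
    print(markers, levels)

print(round(tmrca_cdf(17, 25, 1), 2))  # close to 0.50 for a 24-1 match
```

Rounded to whole generations, this gives 14, 48 and 62 for a 12-marker exact match and 7, 23 and 30 for a 25-marker exact match, in line with the table. Because the number of markers sits in the denominator, testing more markers shortens the estimate for an exact match.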
As you can see from the above table,
testing more markers reduces the estimated number of generations to the TMRCA. The chart below shows how increasing the
number of markers tested decreases the number of generations to TMRCA when
all markers match.

Putting It All Together
DNA testing can be a valuable tool in
genealogical research when it is combined with conventional research.
Test results can be used to confirm a suspected connection between two
families or disprove a connection. Although it is impossible to pinpoint
a common ancestor from the test results alone, with a proper paper trail
you may be able to do so. My own experience with DNA testing demonstrates
this. I have been working with another individual to trace his ancestry.
He had traced his line back to his great-great-grandfather, born in Vermont in
1823. My line goes back to Scotland in 1700, through Vermont. I had always
thought our lines were connected, but there were holes that could not be
filled and other possible lines to consider. DNA test results showed an
exact 25-marker match, leaving virtually no doubt we shared a common
ancestor. But the results alone could not tell us who this ancestor was.
It was the other information, collected by conventional genealogical
research, that allowed us to determine who our common ancestor had to
be.
Definitions
Allele:
One of the variant forms of a gene at a particular locus, or location,
on a chromosome. Different alleles produce variation in inherited
characteristics. For STR markers, each allele is the number of repeats
of the short base sequence.
Base Pair: Two bases that form
a "rung of the DNA ladder." A DNA nucleotide is made of a
molecule of sugar, a molecule of phosphoric acid, and a molecule called
a base. The bases are the "letters" that spell out the genetic
code. In DNA, the code letters are A, T, G, and C, which stand for the
chemicals adenine, thymine, guanine, and cytosine, respectively. In base
pairing, adenine always pairs with thymine, and guanine always pairs
with cytosine.
Chromosome: One of the
threadlike "packages" of genes and other DNA in the nucleus of
a cell.
DNA: The
chemical inside the nucleus of a cell that carries the genetic
instructions for making living organisms.
DYS#: D=DNA, Y=Y chromosome,
S=a unique DNA segment. A label for genetic markers on the Y chromosome.
Each marker is designated by a number, according to international
conventions. At present, virtually all the DYS designations are given to
STR markers (a class often used in genetic genealogy).
Gene: The functional and
physical unit of heredity passed from parent to offspring. Genes are
pieces of DNA, and most genes contain the information for making a
specific protein.
Genome:
All the DNA contained in an
organism or a cell, which includes both the chromosomes within the
nucleus and the DNA in mitochondria.
Locus: A point in the
genome, identified by a marker, which can be mapped by some means. It
does not necessarily correspond to a gene. A single gene may have
several loci within it (each defined by different markers) and these
markers may be separated in genetic or physical mapping experiments. In
such cases, it is useful to define these different loci, but normally
the gene name should be used to designate the gene itself, as this
usually will convey the most information.
Marker:
Also known as a genetic marker, a
segment of DNA with an identifiable physical location on a chromosome
whose inheritance can be followed. A marker can be a gene, or it can be
some section of DNA with no known function. Because DNA segments that
lie near each other on a chromosome tend to be inherited together,
markers are often used as indirect ways of tracking the inheritance
pattern of genes that have not yet been identified, but whose
approximate locations are known.
Microsatellite:
Repetitive stretches of short
sequences of DNA used as genetic markers to track inheritance in
families.
Mutation: A
permanent structural alteration in DNA.
Short Tandem Repeats (STR):
A genetic marker consisting of multiple copies of an identical DNA
sequence arranged in direct succession in a particular region of a
chromosome. Occasionally, one will mutate by the gain or loss of one
repeat. (Also known as a microsatellite.)
Links
International
Society of Genetic Genealogy (ISOGG) - The first society
founded to promote the use of DNA testing in genealogy! With links
to a wealth of genetic genealogy tools and information.
Contexo.Info
A website about the foundations of molecular genetics and biology. An
excellent site for those who are looking for more details on DNA.
Time
to Most Recent Common Ancestry Calculator by Bruce Walsh. The
goal is to use genetic markers (here on the Y chromosome) to estimate
the TMRCA, the Time to the Most Recent Common Ancestor (MRCA), which is
how many generations the two Y chromosomes are from a common ancestor.
This site explains the various models used to determine TMRCA.
The
National Human Genome Research Institute - The National Human
Genome Research Institute (NHGRI) created the Talking Glossary of
Genetic Terms to help people without scientific backgrounds understand
the terms and concepts used in genetic research.
Human
Genome Project Information - The
Human Genome Project (HGP) is an international effort to discover all
the approximately 30,000 to 35,000 human genes (the human genome), make
them accessible for further biological study, and determine the complete
sequence of the 3 billion DNA subunits (bases).
Primer
on Molecular Genetics - This primer
was prepared by Denise Casey, Human Genome Management Information
System, Oak Ridge National Laboratory, for the 1991-92 DOE Human Genome
Program Report.
Primer
on Molecular Genetics (pdf format) - This is an Adobe PDF version of
the primer above.
Why
Y? The Y Chromosome in the Study of Human Evolution, Migration and
Prehistory - Neil Bradman and Mark Thomas of The Centre for
Genetic Anthropology at University College London reveal the power of
modern genetic analysis for exploring the role of fathers in human
history.
Genetics
& Genealogy: Y Chromosome DNA and the Y Line -
by Thomas H. Roderick, PhD, Center for Human Genetics. A discussion of
the Y-Chromosome and its role in DNA as a tool for genealogists.
Short
Tandem Repeat DNA Internet DataBase - While the use of STRs
for genetic mapping and identity testing has become widespread among DNA
typing laboratories, there is no single place where information may be
found regarding STR systems. This web site is an attempt to bring
together the abundant literature on the subject in a cohesive fashion to
make future work in this field easier. Facts and sequence information on
each STR system, population data, commonly used multiplex STR systems,
PCR primers and conditions, and a review of various technologies for
analysis of STR alleles have been included in this database.
GENEALOGY-DNA-L
- This mailing list is for anyone with DNA (i.e., anyone!) who would
like to discuss methods and share results of DNA testing as applied to
genealogical research.
Genetic Genealogy and Telephone Tag
- A simplified explanation of how Y-DNA mutates.
© November 1, 2002,
Blairgenealogy.com
All Rights Reserved
This page may not be copied, reproduced, or displayed without
written permission of the Blair
DNA Project Coordinator.
Links to this page are authorized and welcomed