/usr/share/doc/phylip/html/doc/dnapars.html is in phylip-doc 1:3.696+dfsg-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>main</TITLE>
<META NAME="description" CONTENT="dnapars">
<META NAME="keywords" CONTENT="dnapars">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<P>
<DIV ALIGN=RIGHT>
version 3.696
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>Dnapars -- DNA Parsimony Program</H1>
</DIV>
<P>
© Copyright 1986-2014 by Joseph Felsenstein. All rights reserved.
License terms <a href="main.html#copyright">here</a>.
<P>
This program carries out unrooted parsimony (analogous to Wagner
trees) (Eck and Dayhoff, 1966; Kluge and Farris, 1969) on DNA
sequences. The method of Fitch (1971) is used to count the number of
changes of base needed on a given tree.
The assumptions of this method are analogous to those of MIX:
<OL>
<LI>Each site evolves independently.
<LI>Different lineages evolve independently.
<LI>The probability of a base substitution at a given site is
small over the lengths of time involved in
a branch of the phylogeny.
<LI>The expected amounts of change in different branches of the phylogeny
do not vary by so much that two changes in a high-rate branch
are more probable than one change in a low-rate branch.
<LI>The expected amounts of change do not vary enough among sites that two
changes in one site are more probable than one change in another.
</OL>
<P>
That these are the assumptions of parsimony methods has been documented
in a series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b). For
an opposing view arguing that the parsimony methods make no substantive
assumptions such as these, see the papers by Farris (1983) and Sober (1983a,
1983b, 1988), but also read the exchange between Felsenstein and Sober (1986).
<P>
Change from an occupied site to a deletion is counted as one
change. Reversion from a deletion to an occupied site is allowed and is also
counted as one change. Note that this in effect assumes that a deletion
N bases long is N separate events.
<P>
Dnapars can handle both bifurcating and multifurcating trees. In doing its
search for most parsimonious trees, it adds species not only by creating new
forks in the middle of existing branches, but it also tries putting them at
the end of new branches which are added to existing forks. Thus it searches
among both bifurcating and multifurcating trees. If a branch in a tree
does not have any characters which might change in that branch in the most
parsimonious tree, it does not save that tree. Thus in any tree that
results, a branch exists only if some character has a most parsimonious
reconstruction that would involve change in that branch.
<P>
It also saves a number of trees tied for best (you can alter the number
it saves using the V option in the menu). When rearranging trees, it
tries rearrangements of all of the saved trees. This makes the algorithm
slower than earlier versions of Dnapars.
<P>
The input data is standard. The first line of the input file contains the
number of species and the number of sites.
<P>
Next come the species data. Each
sequence starts on a new line, has a ten-character species name
that must be blank-filled to be of that length, followed immediately
by the species data in the one-letter code. The sequences must either
be in the "interleaved" or "sequential" formats
described in the Molecular Sequence Programs document. The I option
selects between them. The sequences can have internal
blanks in the sequence but there must be no extra blanks at the end of the
terminated line. Note that a blank is not a valid symbol for a deletion.
<P>
The options are selected using an interactive menu. The menu looks like this:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
DNA parsimony algorithm, version 3.69
Setting for this run:
U Search for best tree? Yes
S Search option? More thorough search
V Number of trees to save? 10000
J Randomize input order of sequences? No. Use input order
O Outgroup root? No, use as outgroup species 1
T Use Threshold parsimony? No, use ordinary parsimony
N Use Transversion parsimony? No, count all steps
W Sites weighted? No
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, ANSI, none)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Print out steps in each site No
5 Print sequences at all nodes of tree No
6 Write out trees onto tree file? Yes
Y to accept these or type the letter for one to change
</PRE>
</TD></TR></TABLE>
<P>
The user either types "Y" (followed, of course, by a carriage-return)
if the settings shown are to be accepted, or the letter or digit corresponding
to an option that is to be changed.
<P>
The S (search) option controls how, and how much, rearrangement is done on the
tied trees that are saved by the program. If the "More thorough search"
option (the default) is chosen, the program will save multiple tied trees,
without collapsing internal branches that have no evidence of change on them.
It will subsequently rearrange on all parts of each of those trees.
If the "Less thorough search" option is chosen, before saving,
the program will collapse
all branches that have no evidence that there is any change on that branch.
This leads to less attempted rearrangement. If the "Rearrange on
one best tree" option is chosen, only the first of the tied trees is used for
rearrangement. This is faster but less thorough. If your trees are likely
to have large multifurcations, do not use the default "More thorough search"
option as it could result in too large a number of trees being saved.
<P>
The N option allows you to choose transversion parsimony, which counts only
transversions (changes between one of the purines A or G and one of the
pyrimidines C or T). This setting is turned off by default.
<P>
The Weights (W) option
takes the weights from a file whose default name is "weights". The weights
follow the format described in the main documentation file, with integer
weights from 0 to 35 allowed by using the characters 0, 1, 2, ..., 9 and
A, B, ... Z.
<P>
The User tree (option U) is read from a file whose default name is <TT>intree</TT>.
The trees can be multifurcating. They must be preceded in the file by a
line giving the number of trees in the file.
<P>
The options J, O, T, M, and 0 are the usual ones. They are described in the
main documentation file of this package. Option I is the same as in
other molecular sequence programs and is described in the documentation file
for the sequence programs.
<P>
The M (multiple data sets option) will ask you whether you want to
use multiple sets of weights (from the weights file) or multiple data sets.
The ability to use a single data set with multiple weights means that
much less disk space will be used for this input data. The bootstrapping
and jackknifing tool Seqboot has the ability to create a weights file with
multiple weights.
<P>
The O (outgroup) option will have no effect if the U (user-defined tree)
option is in effect.
The T (threshold) option allows a continuum of methods
between parsimony and compatibility. Thresholds less than or equal to 1.0 do
not have any meaning and should
not be used: they will result in a tree dependent only on the input
order of species and not at all on the data!
<P>
Output is standard: if option 1 is toggled on, the data is printed out,
with the convention that "." means "the same as in the first species".
Then comes a list of equally parsimonious trees.
Each tree has branch lengths. These are computed using an algorithm
published by Hochbaum and Pathria (1997) which I first heard of from
Wayne Maddison who invented it independently of them. This algorithm
averages the number of reconstructed changes of state over all sites
over all possible most parsimonious placements of the changes of state
among branches. Note that it does not correct in any way for multiple
changes that overlay each other.
<P>
If option 2 is
toggled on a table of the
number of changes of state required in each character is also
printed. If option 5 is toggled
on, a table is printed
out after each tree, showing for each branch whether there are known to be
changes in the branch, and what the states are inferred to have been at the
top end of the branch. This is a reconstruction of the ancestral sequences
in the tree. If you choose option 5, a menu item "." appears which gives you
the opportunity to turn off dot-differencing so that complete ancestral
sequences are shown. If the inferred state is a "?" or one of the IUB
ambiguity symbols, there will be multiple
equally-parsimonious assignments of states; the user must work these out for
themselves by hand. A "?" in the reconstructed states means that in
addition to one or more bases, a deletion may or may not be present. If
option 6 is left in its default state the trees
found will be written to a tree file, so that they are available to be used
in other programs.
<P>
If the U (User Tree) option is used and more than one tree is supplied,
and the program is not told to assume autocorrelation between the
rates at different sites, the
program also performs a statistical test of each of these trees against the
one with highest likelihood. If there are two user trees, this
is a version of the test proposed by
Alan Templeton (1983) and evaluated in a test case by me (1985a). It is
closely parallel to a test using log likelihood differences
due to Kishino and Hasegawa (1989)
It uses the mean and variance of the differences in the number
of steps between trees, taken across sites. If the two
trees' means are more than 1.96 standard deviations different,
then the trees are
declared significantly different.
<P>
If there are more than two trees, the test done is an extension of
the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out
that a correction for the number of trees was necessary, and they
introduced a resampling method to make this correction. In the version
used here the variances and covariances of the sums of steps across
sites are computed for all pairs of trees. To test whether the
difference between each tree and the best one is larger than could have
been expected if they all had the same expected number of steps,
numbers of steps for all trees are sampled with these covariances and equal
means (Shimodaira and Hasegawa's "least favorable hypothesis"),
and a P value is computed from the fraction of times the difference between
the tree's value and the lowest number of steps exceeds that actually
observed. Note that this sampling needs random numbers, and so the
program will prompt the user for a random number seed if one has not
already been supplied. With the two-tree KHT test no random numbers
are used.
<P>
In either the KHT or the SH test the program
prints out a table of the number of steps for each tree, the differences of
each from the lowest one, the variance of that quantity as determined by
the differences of the numbers of steps at individual sites,
and a conclusion as to
whether that tree is or is not significantly worse than the best one.
<P>
Option 6 in the menu controls whether the tree estimated by the program
is written onto a tree file. The default name of this output tree file
is "outtree". If the U option is in effect, all the user-defined
trees are written to the output tree file. If the program finds multiple
trees tied for best, all of these are written out onto the output tree
file. Each is followed by a numerical weight in square brackets (such as
[0.25000]). This is needed when we use the trees to make a consensus
tree of the results of bootstrapping or jackknifing, to avoid overrepresenting
replicates that find many tied trees.
<P>
The program is a straightforward relative of MIX
and runs reasonably quickly, especially with many sites and few species.
<P>
<HR>
<P>
<H3>TEST DATA SET</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
5 13
Alpha AACGUGGCCAAAU
Beta AAGGUCGCCAAAC
Gamma CAUUUCGUCACAA
Delta GGUAUUUCGGCCU
Epsilon GGGAUCUCGGCCC
</PRE>
</TD></TR></TABLE>
<P>
<HR>
<P>
<H3>CONTENTS OF OUTPUT FILE (if all numerical options are on)</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
DNA parsimony algorithm, version 3.69
5 species, 13 sites
Name Sequences
---- ---------
Alpha AACGUGGCCA AAU
Beta ..G..C.... ..C
Gamma C.UU.C.U.. C.A
Delta GGUA.UU.GG CC.
Epsilon GGGA.CU.GG CCC
One most parsimonious tree found:
+-----Epsilon
+----------------------------3
+------------2 +-------Delta
| |
| +----------------Gamma
|
1----Beta
|
+---------Alpha
requires a total of 19.000
between and length
------- --- ------
1 2 0.217949
2 3 0.487179
3 Epsilon 0.096154
3 Delta 0.134615
2 Gamma 0.275641
1 Beta 0.076923
1 Alpha 0.173077
steps in each site:
0 1 2 3 4 5 6 7 8 9
*-----------------------------------------
0| 2 1 3 2 0 2 1 1 1
10| 1 1 1 3
From To Any Steps? State at upper node
( . means same as in the node below it on tree)
1 AABGTCGCCA AAY
1 2 yes V.KD...... C..
2 3 yes GG.A..T.GG .C.
3 Epsilon maybe ..G....... ..C
3 Delta yes ..T..T.... ..T
2 Gamma yes C.TT...T.. ..A
1 Beta maybe ..G....... ..C
1 Alpha yes ..C..G.... ..T
</PRE>
<P>
</TD></TR></TABLE>
</BODY>
</HTML>
|