/usr/share/doc/phylip/html/doc/dollop.html is in phylip-doc 1:3.696+dfsg-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>dollop</TITLE>
<META NAME="description" CONTENT="dollop">
<META NAME="keywords" CONTENT="dollop">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<DIV ALIGN=RIGHT>
version 3.696
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>Dollop -- Dollo and Polymorphism Parsimony Program</H1>
</DIV>
<P>
© Copyright 1986-2014 by Joseph Felsenstein. All rights reserved.
License terms <a href="main.html#copyright">here</a>.
<P>
This program carries out the Dollo and polymorphism parsimony methods. The
Dollo parsimony method was
first suggested in print by Le Quesne (1974) and was
first well-specified by Farris (1977). The method is named after Louis
Dollo since he was one of the first to assert that in evolution it is
harder to gain a complex feature than to lose it. The algorithm
explains the presence of the state 1 by allowing up to one forward
change 0-->1 and as many reversions 1-->0 as are necessary to explain
the pattern of states seen. The program attempts to minimize the number
of 1-->0 reversions necessary.
<P>
The assumptions of this method are in effect:
<OL>
<LI>We know which state is the ancestral one (state 0).
<LI>The characters are evolving independently.
<LI>Different lineages evolve independently.
<LI>The probability of a forward change (0-->1) is small over the
evolutionary times involved.
<LI>The probability of a reversion (1-->0) is also small, but
still far larger than the probability of a forward change, so
that many reversions are easier to envisage than even one
extra forward change.
<LI>Retention of polymorphism for both states (0 and 1) is highly
improbable.
<LI>The lengths of the segments of the true tree are not so
unequal that two changes in a long segment are as probable as
one in a short segment.
</OL>
<P>
One problem can arise when using additive binary recoding to
represent a multistate character as a series of two-state characters. Unlike
the Camin-Sokal, Wagner, and Polymorphism methods, the Dollo
method can reconstruct ancestral states which do not exist. An example
is given in my 1979 paper. It will be necessary to check the output to
make sure that this has not occurred.
<P>
The polymorphism parsimony method was first used by me, and the results
published (without a clear
specification of the method) by Inger (1967). The method was
independently published by Farris (1978a) and by me (1979). The method
assumes that we can explain the pattern of states by no more than one
origination (0-->1) of state 1, followed by retention of polymorphism
along as many segments of the tree as are necessary, followed by loss of
state 0 or of state 1 where necessary. The program tries to minimize
the total number of polymorphic characters, where each polymorphism is
counted once for each segment of the tree in which it is retained.
<P>
The assumptions of the polymorphism parsimony method are in effect:
<OL>
<LI>The ancestral state (state 0) is known in each character.
<LI>The characters are evolving independently of each other.
<LI>Different lineages are evolving independently.
<LI>Forward change (0-->1) is highly improbable over the length of
time involved in the evolution of the group.
<LI>Retention of polymorphism is also improbable, but far more
probable that forward change, so that we can more easily
envisage much polymorhism than even one additional forward
change.
<LI>Once state 1 is reached, reoccurrence of state 0 is very
improbable, much less probable than multiple retentions of
polymorphism.
<LI>The lengths of segments in the true tree are not so unequal
that we can more easily envisage retention events occurring in
both of two long segments than one retention in a short
segment.
</OL>
<P>
That these are the assumptions of parsimony methods has been documented
in a series of papers of mine: (1973a, 1978b, 1979, 1981b,
1983b, 1988b). For an opposing view arguing that the parsimony methods
make no substantive
assumptions such as these, see the papers by Farris (1983) and Sober (1983a,
1983b), but also read the exchange between Felsenstein and Sober (1986).
<P>
The input format is the standard one, with "?", "P", "B" states
allowed. The options are selected using a menu:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
Dollo and polymorphism parsimony algorithm, version 3.69
Settings for this run:
U Search for best tree? Yes
P Parsimony method? Dollo
J Randomize input order of species? No. Use input order
T Use Threshold parsimony? No, use ordinary parsimony
A Use ancestral states in input file? No
W Sites weighted? No
M Analyze multiple data sets? No
0 Terminal type (IBM PC, ANSI, none)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Print out steps in each character No
5 Print states at all nodes of tree No
6 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
</PRE>
</TD></TR></TABLE>
<P>
The options U, J, T, A, and M are the usual User Tree, Jumble,
Ancestral States, and Multiple Data Sets options, described either
in the main documentation file or in the Discrete Characters Programs
documentation file. The A (Ancestral States)
option allows implementation of the unordered Dollo parsimony and unordered
polymorphism parsimony methods which I have
described elsewhere (1984b). When the A option is used the ancestor is
not to be counted as one of the species. The O (outgroup) option is not
available since the tree produced is already rooted. Since the Dollo and
polymorphism methods produce a rooted
tree, the user-defined trees required by the U option have two-way forks
at each level.
<P>
The P (Parsimony Method) option is the one that toggles between polymorphism
parsimony and Dollo parsimony. The program defaults to Dollo parsimony.
<P>
The T (Threshold) option has already been described in
the Discrete Characters programs documentation file. Setting T at or below
1.0 but above 0 causes the criterion to become compatibility rather than
polymorphism parsimony, although there is no advantage to using this
program instead of MIX to do a compatibility method. Setting the
threshold value higher brings about an intermediate between
the Dollo or polymorphism parsimony methods and the compatibility method,
so that there is some rationale for doing that. Since the Dollo and
polymorphism methods produces a rooted
tree, the user-defined trees required by the U option have two-way forks
at each level.
<P>
Using a threshold value of 1.0 or lower, but above 0, one can
obtain a rooted (or, if the A option is used with ancestral states of
"?", unrooted) compatibility criterion, but there is no particular
advantage to using this program for that instead of MIX. Higher
threshold values are of course meaningful and provide
intermediates between Dollo and compatibility methods.
<P>
The X
(Mixed parsimony methods) option is not available in this program. The
Factors option is also not available in this program, as it would have no
effect on the result even if that information were provided in the input file.
<P>
Output is standard: a list of equally parsimonious trees, and, if the
user selects menu option 4, a table
of the numbers of reversions or retentions of polymorphism necessary
in each character. If any of the
ancestral states has been specified to be unknown, a table of
reconstructed ancestral states is also provided. When reconstructing
the placement of forward changes and reversions under the Dollo method,
keep in mind that each
polymorphic state in the input data will require one "last minute"
reversion. This is included in the tabulated counts. Thus if we have
both states 0 and 1 at a tip of the tree the program will assume that
the lineage had state 1 up to the last minute, and then state 0 arose in
that population by reversion, without loss of state 1.
<P>
If the user selects menu option 5, a table is printed out after each
tree, showing for each branch whether
there are known to be changes in the branch, and what the states are inferred
to have been at the top end of the branch. If the inferred state is a "?"
there may be multiple equally-parsimonious assignments of states; the user
must work these out for themselves by hand.
<P>
If the A option is used, then the program will
infer, for any character whose ancestral state is unknown ("?") whether the
ancestral state 0 or 1 will give the best tree. If these are
tied, then it may not be possible for the program to infer the
state in the internal nodes, and these will all be printed as ".". If this
has happened and you want to know more about the states at the internal
nodes, you will find it helpful to use Dolmove to display the tree and examine
its interior states, as the algorithm in Dolmove shows all that can be known
in this case about the interior states, including where there is and is not
ambiguity. The algorithm in Dollop gives up more easily on displaying these
states.
<P>
If the U (User Tree) option is used and more than one tree is supplied, the
program also performs a statistical test of each of these trees against the
best tree. This test is a version of the test proposed by
Alan Templeton (1983), evaluated in a test case by me (1985a). It is
closely parallel to a test using log likelihood differences
invented by Kishino and Hasegawa (1989), and uses the mean and variance of
step differences between trees, taken across characters. If the mean
is more than 1.96 standard deviations different then the trees are declared
significantly different. The program
prints out a table of the steps for each tree, the differences of
each from the highest one, the variance of that quantity as determined by
the step differences at individual characters, and a conclusion as to
whether that tree is or is not significantly worse than the best one. It
is important to understand that the test assumes that all the binary
characters are evolving independently, which is unlikely to be true for
many suites of morphological characters.
<P>
If there are more than two trees, the test done is an extension of
the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out
that a correction for the number of trees was necessary, and they
introduced a resampling method to make this correction. In the version
used here the variances and covariances of the sums of steps across
characters are computed for all pairs of trees. To test whether the
difference between each tree and the best one is larger than could have
been expected if they all had the same expected number of steps,
numbers of steps for all trees are sampled with these covariances and equal
means (Shimodaira and Hasegawa's "least favorable hypothesis"),
and a P value is computed from the fraction of times the difference between
the tree's value and the lowest number of steps exceeds that actually
observed. Note that this sampling needs random numbers, and so the
program will prompt the user for a random number seed if one has not
already been supplied. With the two-tree KHT test no random numbers
are used.
<P>
In either the KHT or the SH test the program
prints out a table of the number of steps for each tree, the differences of
each from the lowest one, the variance of that quantity as determined by
the differences of the numbers of steps at individual characters,
and a conclusion as to
whether that tree is or is not significantly worse than the best one.
<P>
If option 6 is left in its default state the trees
found will be written to a tree file, so that they are available to be used
in other programs. If the program finds multiple
trees tied for best, all of these are written out onto the output tree
file. Each is followed by a numerical weight in square brackets (such as
[0.25000]). This is needed when we use the trees to make a consensus
tree of the results of bootstrapping or jackknifing, to avoid overrepresenting
replicates that find many tied trees.
<P>
The algorithm is a fairly simple adaptation of the one used in
the program Sokal, which was formerly in this package and has been
superseded by Mix. It requires two passes through each tree to count the
numbers of reversions.
<P>
<HR>
<P>
<H3>TEST DATA SET</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
5 6
Alpha 110110
Beta 110000
Gamma 100110
Delta 001001
Epsilon 001110
</PRE>
</TD></TR></TABLE>
<P>
<HR>
<P>
<H3>TEST SET OUTPUT (with all numerical options on)</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
Dollo and polymorphism parsimony algorithm, version 3.69
Dollo parsimony method
5 species, 6 characters
Name Characters
---- ----------
Alpha 11011 0
Beta 11000 0
Gamma 10011 0
Delta 00100 1
Epsilon 00111 0
One most parsimonious tree found:
+-----------Delta
--3
! +--------Epsilon
+--4
! +-----Gamma
+--2
! +--Beta
+--1
+--Alpha
requires a total of 3.000
reversions in each character:
0 1 2 3 4 5 6 7 8 9
*-----------------------------------------
0! 0 0 1 1 1 0
From To Any Steps? State at upper node
( . means same as in the node below it on tree)
root 3 yes ..1.. .
3 Delta yes ..... 1
3 4 yes ...11 .
4 Epsilon no ..... .
4 2 yes 1.0.. .
2 Gamma no ..... .
2 1 yes .1... .
1 Beta yes ...00 .
1 Alpha no ..... .
</PRE>
</TD></TR></TABLE>
</BODY>
</HTML>
|