This file is indexed.

/usr/share/doc/phylip/html/doc/kitsch.html is in phylip-doc 1:3.696+dfsg-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>kitsch</TITLE>
<META NAME="description" CONTENT="kitsch">
<META NAME="keywords" CONTENT="kitsch">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</HEAD>
<BODY BGCOLOR="#ccffff">
<DIV ALIGN=RIGHT>
version 3.696
</DIV>
<P>
<DIV ALIGN=CENTER>
<H1>Kitsch -- Fitch-Margoliash and Least Squares Methods<BR>
with Evolutionary Clock</H1>
</DIV>
<P>
&#169; Copyright 1986-2014 by Joseph Felsenstein. All rights reserved.
License terms <a href="main.html#copyright">here</a>.
<P>
This program carries out the Fitch-Margoliash and Least Squares methods,
plus a variety of others of the same family, with the assumption that all
tip species are contemporaneous, and that there is an evolutionary clock
(in effect, a molecular clock).  This means that branches of the tree cannot
be of arbitrary length, but are constrained so that the total
length from the root of
the tree to any species is the same.  The quantity minimized is the same
weighted sum of squares described in the Distance Matrix Methods documentation
file.
<P>
The options are set using the menu:
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>

Fitch-Margoliash method with contemporary tips, version 3.696

Settings for this run:
  D      Method (F-M, Minimum Evolution)?  Fitch-Margoliash
  U                 Search for best tree?  Yes
  P                                Power?  2.00000
  -      Negative branch lengths allowed?  No
  L         Lower-triangular data matrix?  No
  R         Upper-triangular data matrix?  No
  S                        Subreplicates?  No
  J     Randomize input order of species?  No. Use input order
  M           Analyze multiple data sets?  No
  0   Terminal type (IBM PC, ANSI, none)?  ANSI
  1    Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3                        Print out tree  Yes
  4       Write out trees onto tree file?  Yes

  Y to accept these or type the letter for one to change
</PRE>
</TD></TR></TABLE>
<P>
Most of the options are described in the Distance Matrix Programs documentation
file.
<P>
The D (methods) option allows choice between the Fitch-Margoliash
criterion and the Minimum Evolution method (Kidd and Sgaramella-Zonta, 1971;
Rzhetsky and Nei, 1993).  Minimum Evolution (not to be confused with
parsimony) uses the Fitch-Margoliash criterion to fit branch lengths to each
topology, but then chooses topologies based on their total branch length
(rather than the goodness of fit sum of squares).  There is no
constraint on negative branch lengths in the Minimum Evolution method;
it sometimes gives rather strange results, as it can like solutions
that have large negative branch lengths, as these reduce the total
sum of branch lengths!
<P>
Note that the User Trees (used by option U) must be
rooted trees (with a bifurcation at their base).  If you take a user
tree from Fitch and try to evaluate it in Kitsch, it must first be
rooted.  This can be done using Retree.  Of the options
available in Fitch, the O option is
not available, as Kitsch estimates a rooted tree which cannot be
rerooted, and the G option is not 
available, as global rearrangement is the default condition anyway.  It
is also not possible to specify that specific branch lengths of a user tree
be retained when it is read into Kitsch, unless all of them are present.  In
that case the tree should be properly clocklike.  Readers who wonder why
we have not provided the feature of holding some of the user tree branch
lengths constant while iterating others are invited to tell us how they
would do it.  As you consider particular possible patterns of branch
lengths you will find that the matter is not at all simple.
<P>
If you use a User Tree (option U) with branch lengths with Kitsch, and the
tree is not clocklike, when two branch lengths give conflicting positions
for a node, Kitsch will use the first of them and ignore the other.  Thus
the user tree:
<P>
<PRE>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;((A:0.1,B:0.2):0.4,(C:0.06,D:0.01):43);
</PRE>
<P>
is nonclocklike, so it will be treated as if it were actually the tree:
<P>
<PRE>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;((A:0.1,B:0.1):0.4,(C:0.06,D:0.06):44);
</PRE>
<P>
The input is exactly the same as described in the Distance Matrix Methods
documentation file.  The output is a rooted tree, together with the sum of
squares, the number of tree topologies searched, and, if the power P is at
its default value of 2.0, the Average Percent Standard Deviation is also
supplied.  The lengths of the branches of the tree are given in a table,
that also shows for each branch the time at the upper end of the
branch.  "Time" here really means cumulative branch length from the root, going
upwards (on the printed diagram, rightwards).  For each branch, the
"time" given is for the node at the right (upper) end of the branch.   It
is important to realize that the branch lengths are not exactly proportional to
the lengths drawn on the printed tree diagram!  In particular, short
branches are exaggerated in the length on that diagram so that they are
more visible.
<P>
The method may be considered as providing an estimate of the
phylogeny.  Alternatively, it can be considered as a phenetic clustering of
the tip species.  This method minimizes an objective function, the sum of
squares,
not only setting the levels of the clusters so as to do so, but rearranging
the hierarchy of clusters to try to find alternative clusterings that
give a lower overall sum of squares.  When the power option P is set to a
value of <EM>P = 0.0</EM>, so that we are minimizing a simple sum of squares
of the differences between the observed distance matrix and the expected one,
the method is very close in spirit to Unweighted Pair Group Arithmetic Average
Clustering (UPGMA), also called Average-Linkage Clustering.  If the topology of
the tree is fixed and there turn out to be no branches of negative length, its
result should be the same as UPGMA in that case.  But since it tries
alternative topologies and (unless
the N option is set) it combines nodes that otherwise could result in a reversal
of levels, it is possible for it to give a different, and better, result than
simple sequential clustering.  Of course UPGMA itself is available as an
option in program Neighbor.
<P>
The U (User Tree) option requires a bifurcating tree, unlike Fitch, which
requires an unrooted tree with a trifurcation at its base.  Thus the tree
shown below would be written:
<P>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;((D,E),(C,(A,B)));
<P>
If a tree with a trifurcation at the base is by mistake fed into the U option
of Kitsch then some of its species (the entire rightmost furc, in fact) will be
ignored and too small a tree read in.  This should result in an error message
and the program should stop.  It is important to understand the
difference between the User Tree formats for Kitsch and Fitch.  You may want
to use Retree to convert a user tree that is suitable for Fitch into one
suitable for Kitsch or vice versa.
<P>
An important use of this method will be to do a formal statistical test of
the evolutionary clock hypothesis.  This can be done by comparing the sums
of squares achieved by Fitch and by Kitsch, BUT SOME CAVEATS ARE
NECESSARY.  First, the assumption is that the observed distances are truly
independent, that no original data item contributes to more than one of them
(not counting the two reciprocal distances from i to j and from j to i).  THIS
WILL NOT HOLD IF THE DISTANCES ARE OBTAINED FROM GENE FREQUENCIES, FROM
MORPHOLOGICAL CHARACTERS, OR FROM MOLECULAR SEQUENCES.  It may be invalid even
for immunological distances and levels of DNA hybridization, provided that the
use of common standard for all members of a row or column allows an error in
the measurement of the standard to affect all these distances
simultaneously.  It will also be invalid if the numbers have been collected in
experimental groups, each measured by taking differences from a common standard
which itself is measured with error.  Only if the numbers in different cells
are measured from independent standards can we depend on the statistical
model.  The details of the test and the assumptions are discussed in my review
paper on distance methods (Felsenstein, 1984a).  For further and sometimes
irrelevant controversy on these matters see the papers by Farris (1981,
1985, 1986) and myself (Felsenstein, 1986, 1988b).
<P>
A second caveat is that the distances must be expected to rise linearly with
time, not according to any other curve.  Thus it may be necessary to transform
the distances to achieve an expected linearity.  If the distances have an upper
limit beyond which they could not go, this is a signal that linearity may
not hold.  It is also VERY important to choose the power <EM>P</EM> at a value
that results in the standard deviation of the variation of the observed from the
expected distances being the <EM>P/2</EM>-th power of the expected distance.
<P>
To carry out the test, fit the same data with both Fitch and Kitsch,
and record the two sums of squares.  If the topology has turned out the
same, we have <EM>N = n(n-1)/2</EM> distances which have been fit with
<EM>2n-3</EM>
parameters in Fitch, and with <EM>n-1</EM> parameters in Kitsch.  Then the
difference between <EM>S(K)</EM> and <EM>S(F)</EM> has <EM>d<SUB>1</SUB> = n-2</EM>
degrees of freedom.  It is
statistically independent of the value of <EM>S(F)</EM>, which has
<EM>d<SUB>2</SUB> = N-(2n-3)</EM>
degrees of freedom.  The ratio of mean squares
<P>
<PRE>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [S(K)-S(F)]/d<SUB>1</SUB>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;----------------
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     S(F)/d<SUB>2</SUB>
</PRE>
<P>
should, under the
evolutionary clock, have an F distribution with <EM>n-2</EM> and
<EM>N-(2n-3)</EM> degrees of
freedom respectively.  The test desired is that the F ratio is in the upper
tail (say the upper 5%) of its distribution.  If the S (subreplication) 
option is in
effect, the above degrees of freedom must be modified by noting that
N is not <EM>n(n-1)/2</EM> but is the sum of the numbers of replicates of all
cells in the distance matrix read in, which may be either square or
triangular.  A further explanation of the
statistical test of the clock is given in a paper of mine (Felsenstein, 1986).
<P>
The program uses a similar tree construction method to the other programs
in the package and, like them, is not guaranteed to give the best-fitting
tree.  The assignment of the branch lengths for a given topology is a
least squares fit, subject to the constraints against negative branch lengths,
and should not be able to be improved upon.  Kitsch runs more quickly than
Fitch.
<P>
The constant
available for modification at the beginning of the program is 
"epsilon", which defines a small quantity needed in
some of the calculations.  There is no feature saving multiple trees
tied for best,
because exact ties are not expected, except in cases where it should be
obvious from the tree printed out what is the nature of the tie (as when an
interior branch is of length zero).
<P>
<HR>
<P>
<H3>TEST DATA SET</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>
    7
Bovine      0.0000  1.6866  1.7198  1.6606  1.5243  1.6043  1.5905
Mouse       1.6866  0.0000  1.5232  1.4841  1.4465  1.4389  1.4629
Gibbon      1.7198  1.5232  0.0000  0.7115  0.5958  0.6179  0.5583
Orang       1.6606  1.4841  0.7115  0.0000  0.4631  0.5061  0.4710
Gorilla     1.5243  1.4465  0.5958  0.4631  0.0000  0.3484  0.3083
Chimp       1.6043  1.4389  0.6179  0.5061  0.3484  0.0000  0.2692
Human       1.5905  1.4629  0.5583  0.4710  0.3083  0.2692  0.0000
</PRE>
</TD></TR></TABLE>
<P>
<HR>
<P>
<H3>TEST SET OUTPUT FILE (with all numerical options on)</H3>
<P>
<TABLE><TR><TD BGCOLOR=white>
<PRE>

   7 Populations

Fitch-Margoliash method with contemporary tips, version 3.69

                  __ __             2
                  \  \   (Obs - Exp)
Sum of squares =  /_ /_  ------------
                                2
                   i  j      Obs

negative branch lengths not allowed


Name                       Distances
----                       ---------

Bovine        0.00000   1.68660   1.71980   1.66060   1.52430   1.60430
              1.59050
Mouse         1.68660   0.00000   1.52320   1.48410   1.44650   1.43890
              1.46290
Gibbon        1.71980   1.52320   0.00000   0.71150   0.59580   0.61790
              0.55830
Orang         1.66060   1.48410   0.71150   0.00000   0.46310   0.50610
              0.47100
Gorilla       1.52430   1.44650   0.59580   0.46310   0.00000   0.34840
              0.30830
Chimp         1.60430   1.43890   0.61790   0.50610   0.34840   0.00000
              0.26920
Human         1.59050   1.46290   0.55830   0.47100   0.30830   0.26920
              0.00000


                                           +-------Human     
                                         +-6 
                                    +----5 +-------Chimp     
                                    !    ! 
                                +---4    +---------Gorilla   
                                !   ! 
       +------------------------3   +--------------Orang     
       !                        ! 
  +----2                        +------------------Gibbon    
  !    ! 
--1    +-------------------------------------------Mouse     
  ! 
  +------------------------------------------------Bovine    


Sum of squares =      0.107

Average percent standard deviation =   5.16213

From     To            Length          Height
----     --            ------          ------

   6   Human           0.13460         0.81285
   5      6            0.02836         0.67825
   6   Chimp           0.13460         0.81285
   4      5            0.07638         0.64990
   5   Gorilla         0.16296         0.81285
   3      4            0.06639         0.57352
   4   Orang           0.23933         0.81285
   2      3            0.42923         0.50713
   3   Gibbon          0.30572         0.81285
   1      2            0.07790         0.07790
   2   Mouse           0.73495         0.81285
   1   Bovine          0.81285         0.81285

</PRE>
</TD></TR></TABLE>
</BODY>
</HTML>