This file is indexed.

/usr/share/doc/diveintopython-zh/html/performance_tuning/string_manipulation.html is in diveintopython-zh 5.4b-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   
      <title>18.6.&nbsp;优化字符串操作</title>
      <link rel="stylesheet" href="../diveintopython.css" type="text/css">
      <link rev="made" href="mailto:f8dy@diveintopython.org">
      <meta name="generator" content="DocBook XSL Stylesheets V1.52.2">
      <meta name="keywords" content="Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free">
      <meta name="description" content="Python from novice to pro">
      <link rel="home" href="../toc/index.html" title="Dive Into Python">
      <link rel="up" href="index.html" title="第&nbsp;18&nbsp;章&nbsp;性能优化">
      <link rel="previous" href="list_operations.html" title="18.5.&nbsp;优化列表操作">
      <link rel="next" href="summary.html" title="18.7.&nbsp;小结">
   </head>
   <body>
      <table id="Header" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td id="breadcrumb" colspan="5" align="left" valign="top">导航:<a href="../index.html">起始页</a>&nbsp;&gt;&nbsp;<a href="../toc/index.html">Dive Into Python</a>&nbsp;&gt;&nbsp;<a href="index.html">性能优化</a>&nbsp;&gt;&nbsp;<span class="thispage">优化字符串操作</span></td>
            <td id="navigation" align="right" valign="top">&nbsp;&nbsp;&nbsp;<a href="list_operations.html" title="上一页: “优化列表操作”">&lt;&lt;</a>&nbsp;&nbsp;&nbsp;<a href="summary.html" title="下一页: “小结”">&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3" id="logocontainer">
               <h1 id="logo"><a href="../index.html" accesskey="1">深入 Python :Dive Into Python 中文版</a></h1>
               <p id="tagline">Python 从新手到专家 [Dip_5.4b_CPyUG_Release]</p>
            </td>
            <td colspan="3" align="right">
               <form id="search" method="GET" action="http://www.google.com/custom">
                  <p><label for="q" accesskey="4">Find:&nbsp;</label><input type="text" id="q" name="q" size="20" maxlength="255" value=""> <input type="submit" value="搜索"><input type="hidden" name="domains" value="woodpecker.org.cn"><input type="hidden" name="sitesearch" value="www.woodpecker.org.cn/diveintopython"></p>
               </form>
            </td>
         </tr>
      </table>
      <!--#include virtual="/inc/ads" -->
      <div class="section" lang="zh_cn">
         <div class="titlepage">
            <div>
               <div>
                  <h2 class="title"><a name="soundex.stage4"></a>18.6.&nbsp;优化字符串操作
                  </h2>
               </div>
            </div>
            <div></div>
         </div>
         <div class="abstract">
            <p>Soundex 算法的最后一步是对短结果补零和截短长结果。最佳的做法是什么?</p>
         </div>
         <p>这是目前在 <tt class="filename">soundex/stage2/soundex2c.py</tt> 中的做法:
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = re.sub(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>, digits2)
    <span class='pykeyword'>while</span> len(digits3) &lt; 4:
        digits3 += <span class='pystring'>"0"</span>
    <span class='pykeyword'>return</span> digits3[:4]
</pre></div>
         <p>这里是 <tt class="filename">soundex2c.py</tt> 的表现:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage2&gt;</tt><span class="userinput">python soundex2c.py</span>
<span class="computeroutput">Woo             W000 12.6070768771
Pilgrim         P426 14.4033353401
Flingjingwaller F452 19.7774882003</span>
</pre></div>
         <p>思考的第一件事是以循环取代正则表达式。这里的代码来自 <tt class="filename">soundex/stage4/soundex4a.py</tt></p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = <span class='pystring'>''</span>
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits2:
        <span class='pykeyword'>if</span> d != <span class='pystring'>'9'</span>:
            digits3 += d
</pre></div>
         <p><tt class="filename">soundex4a.py</tt> 快了吗?是的:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4a.py</span>
<span class="computeroutput">Woo             W000 6.62865531792
Pilgrim         P426 9.02247576158
Flingjingwaller F452 13.6328416042</span>
</pre></div>
         <p>但是,等一下。一个从字符串去除字符的循环?我们可以用一个简单的字符串方法做到。这便是  <tt class="filename">soundex/stage4/soundex4b.py</tt></p>
         <div class="informalexample"><pre class="programlisting">
    digits3 = digits2.replace(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>)
</pre></div>
         <p><tt class="filename">soundex4b.py</tt> 快了吗?这是个有趣的问题,它取决输入值:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4b.py</span>
<span class="computeroutput">Woo             W000 6.75477414029
Pilgrim         P426 7.56652144337
Flingjingwaller F452 10.8727729362</span>
</pre></div>
         <p><tt class="filename">soundex4b.py</tt> 中的字符串方法对于大多数名字比循环快,但是对于短小的情况 (很短的名字) 却比 <tt class="filename">soundex4a.py</tt> 略微慢些。性能优化并不总是一致的,对于一个情况快些,却可能对另外一些情况慢些。就此而言,大多数情况将会从改变中获益,所以就改吧,但是别忘了原则。
         </p>
         <p>最后仍很重要的是,让我们检测算法的最后两步:以零补齐短结果和截短超过四字符的长结果。你在  <tt class="filename">soundex4b.py</tt> 中看到的代码就是做这个工作的,但是太没效率了。看一下 <tt class="filename">soundex/stage4/soundex4c.py</tt> 找出原因:
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits3 += <span class='pystring'>'000'</span>
    <span class='pykeyword'>return</span> digits3[:4]
</pre></div>
         <p>我们为什么需要一个 <tt class="literal">while</tt> 循环来补齐结果?我们早就知道我们需要把结果截成四字符,并且我们知道我们已经有了至少一个字符 (直接从 <tt class="varname">source</tt> 中拿过来的起始字符)。这意味着我们可以仅仅在输出的结尾添加三个零,然后截断它。不要害怕重新理解问题,从不太一样的角度看问题可以获得简单的解决方案。
         </p>
         <p>我们丢弃 <tt class="literal">while</tt> 循环后从 <tt class="filename">soundex4c.py</tt> 中获得怎样的速度?太明显了:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4c.py</span>
<span class="computeroutput">Woo             W000 4.89129791636
Pilgrim         P426 7.30642134685
Flingjingwaller F452 10.689832367</span>
</pre></div>
         <p>最后,还有一件事可以令这三行运行得更快:你可以把它们合并为一行。看一眼 <tt class="filename">soundex/stage4/soundex4d.py</tt></p>
         <div class="informalexample"><pre class="programlisting">
    <span class='pykeyword'>return</span> (digits2.replace(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>) + <span class='pystring'>'000'</span>)[:4]
</pre></div>
         <p><tt class="filename">soundex4d.py</tt> 中把所有代码放在一行可以比 <tt class="filename">soundex4c.py</tt> 稍微快那么一点:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage4&gt;</tt><span class="userinput">python soundex4d.py</span>
<span class="computeroutput">Woo             W000 4.93624105857
Pilgrim         P426 7.19747593619
Flingjingwaller F452 10.5490700634</span>
</pre></div>
         <p>它非常难懂,而且优化也不明显。这值得吗?我希望你有很好的见解。性能并不是一切。你在优化方面的努力应该与程序的可读性和可维护性相平衡。</p>
      </div>
      <table class="Footer" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td width="35%" align="left"><br><a class="NavigationArrow" href="list_operations.html">&lt;&lt;&nbsp;优化列表操作</a></td>
            <td width="30%" align="center"><br>&nbsp;<span class="divider">|</span>&nbsp;<a href="index.html#soundex.divein" title="18.1.&nbsp;概览">1</a> <span class="divider">|</span> <a href="timeit.html" title="18.2.&nbsp;使用 timeit 模块">2</a> <span class="divider">|</span> <a href="regular_expressions.html" title="18.3.&nbsp;优化正则表达式">3</a> <span class="divider">|</span> <a href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">4</a> <span class="divider">|</span> <a href="list_operations.html" title="18.5.&nbsp;优化列表操作">5</a> <span class="divider">|</span> <span class="thispage">6</span> <span class="divider">|</span> <a href="summary.html" title="18.7.&nbsp;小结">7</a>&nbsp;<span class="divider">|</span>&nbsp;
            </td>
            <td width="35%" align="right"><br><a class="NavigationArrow" href="summary.html">小结&nbsp;&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3"><br></td>
         </tr>
      </table>
      <div class="Footer">
         <p class="copyright">Copyright © 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython.org">Mark Pilgrim</a></p>
         <p class="copyright">Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007 <a href="mailto:python-cn@googlegroups.com">CPyUG (邮件列表)</a></p>
      </div>
   </body>
</html>