-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path11-profiling-detailed.html
163 lines (154 loc) · 13.7 KB
/
11-profiling-detailed.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Profiling</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Profiling</h1></a>
<h2 class="subtitle">Detailed runtime measurements</h2>
<p>The tools from the previous section can help us decide the question whether we need to optimize in the first place (is the total run time fast enough?) and it can guide when we want to replace code by a better-performing alternative (such as a more specialized numpy function instead of a general built-in function). They do not tell us <em>what</em> to optimize, though.</p>
<p>As a first example, let’s us have a look at the classical <a href="https://en.wikipedia.org/wiki/Fibonacci_number" title="Fibonacci number (wikipedia)">Fibonaci sequence</a> , where each number is the sum of the two preceding numbers. This can be directly written down in a recursive function (note that for simplicity we leave away all error checking, e.g. for negative numbers):</p>
<pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> fibonacci(n):
<span class="kw">if</span> n < <span class="dv">2</span>:
<span class="kw">return</span> n <span class="co"># fibonacci(0) == 0, fibonacci(1) == 1</span>
<span class="kw">else</span>:
<span class="kw">return</span> fibonacci(n - <span class="dv">2</span>) + fibonacci(n - <span class="dv">1</span>)</code></pre>
<p>This seems to work fine, but the runtime increases with <code>n</code> in a dramatic fashion:</p>
<pre class="sourceCode python"><code class="sourceCode python">%time factorial(<span class="dv">10</span>)</code></pre>
<pre><code>CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 52.7 µs
89</code></pre>
<pre class="sourceCode python"><code class="sourceCode python">%time factorial(<span class="dv">20</span>)</code></pre>
<pre><code>CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.63 ms
10946</code></pre>
<pre class="sourceCode python"><code class="sourceCode python">%time factorial(<span class="dv">30</span>)</code></pre>
<pre><code>CPU times: user 696 ms, sys: 0 ns, total: 696 ms
Wall time: 695 ms
1346269</code></pre>
<pre class="sourceCode python"><code class="sourceCode python">%time factorial(<span class="dv">35</span>)</code></pre>
<pre><code>CPU times: user 7.49 s, sys: 24 ms, total: 7.52 s
Wall time: 7.52 s
14930352</code></pre>
<p>To get an idea what is going on, we can use <code>%prun</code>, which runs a command with Python’s built-in profiler:</p>
<pre class="sourceCode python"><code class="sourceCode python">% prun factorial(<span class="dv">35</span>)</code></pre>
<pre class="output"><code>29860706 function calls (4 primitive calls) in 10.621 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
29860703/1 10.621 0.000 10.621 10.621 <ipython-input-6-b471b8bf6ddb>:1(factorial)
1 0.000 0.000 10.621 10.621 {built-in method builtins.exec}
1 0.000 0.000 10.621 10.621 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}</code></pre>
<p>The output gives us three pieces of information for every function called during the execution of the command: the number of times the function was called (<code>ncalls</code>), the total time spend in that function itself (<code>tottime</code>) and the time spend in that function, including all time spend in functions called by that function (<code>cumtime</code>). It is not surprising that all of the time is spend in the <code>factorial</code> function (after all, that’s the only function we have) but the function got called 29860703 times! There is no way we are going to get a decent performance from this function without changing the approach fundamentally.</p>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="a-better-fibonacci-sequence"><span class="glyphicon glyphicon-pencil"></span>A better Fibonacci sequence</h2>
</div>
<div class="panel-body">
<p>Do you know of a better way to write the Fibonacci function? Can you imagine specific ways of using that function where you would prefer yet another approach?</p>
</div>
</section>
<p>Optimizing a calculation by using a fundamentally different approach is called “algorithmic optimization” and it is the potentially most powerful way to increase the performance of a program. Whenever the runtime of a program appears to be slow, the first check should be whether there is a function that takes a lot of time and is called more often than expected (e.g. we calculate a measure on 1000 values and expect a function to be called about a 1000 times as well but it is called 1000*1000 times instead).</p>
<p>The output of a profiler run can easily become overwhelming in bigger projects. It can therefore be useful to use a graphical tool that represents this information in a more accessible and interactive way. One such tool is <em><a href="https://jiffyclub.github.io/snakeviz/">snakeviz</a></em> that can also be used directly from ipython or a jupyter notebook.</p>
<p>Let’s have a another look at the <code>equalize</code> function we used earlier:</p>
<pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> equalize(image, n_bins=<span class="dv">256</span>):
bins = numpy.linspace(<span class="dv">0</span>, <span class="dv">1</span>, n_bins, endpoint=<span class="ot">True</span>)
hist, bins = numpy.histogram(image.flatten(), bins=bins, density=<span class="ot">True</span>)
cdf = hist.cumsum() / n_bins
<span class="co"># Invert the CDF by using numpy's interp function</span>
equalized = numpy.interp(image.flatten(), bins[:-<span class="dv">1</span>], cdf)
<span class="co"># All this was performed on flattened versions of the image, reshape the</span>
<span class="co"># equalized image back to the original shape</span>
<span class="kw">return</span> equalized.reshape(image.shape)</code></pre>
<p>We’ll again load our example image:</p>
<pre class="sourceCode python"><code class="sourceCode python">image = plt.imread(<span class="st">'Unequalized_Hawkes_Bay_NZ.png'</span>)[:, :, <span class="dv">0</span>]</code></pre>
<p>With <em>snakeviz</em> installed, we can load the <code>snakeviz</code> magic:</p>
<pre class="sourceCode python"><code class="sourceCode python">%load_ext snakeviz</code></pre>
<p>We can then use <code>%snakeviz</code> in the same way we previously used <code>%prun</code> and it will show us a graphical representation of where the time was spent:</p>
<pre class="sourceCode python"><code class="sourceCode python">%snakeviz equalize(image)</code></pre>
<div class="figure">
<img src="img/snakeviz_screenshot.png" alt="Part of the snakeviz output: A sunburst plot showing the runtime spend in various parts of the function call. The center corresponds to the function we are profiling, each layer shows the relative amount of time spend in functions called by functions in the previous layer." />
<p class="caption">Part of the snakeviz output: A “sunburst” plot showing the runtime spend in various parts of the function call. The center corresponds to the function we are profiling, each layer shows the relative amount of time spend in functions called by functions in the previous layer.</p>
</div>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="snakeviz-without-ipythonjupyter-notbook"><span class="glyphicon glyphicon-pushpin"></span>snakeviz without ipython/jupyter notbook</h2>
</div>
<div class="panel-body">
<p>We can save the profiling information of a Python script to disk:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> -m cProfile -o filename myscript.py</code></pre>
<p>We can then launch snakeviz on the stored information:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">snakeviz</span> filename</code></pre>
</div>
</aside>
<p>This representation is useful for a global overview of the runtime of the various functions in a project. It might uncover functions that do not seem to be worth optimizing because of their short runtime but are in fact called from various places so that their total runtime is significant.</p>
<p><code>%prun</code> and <code>%snakeviz</code> give information about which functions take up the most time, but they are less useful to see optimization potential in single functions. Another type of profiling is line-based profiling which measures the runtime of every line in one or more functions. This functionality is provided by the <em><a href="https://github.com/rkern/line_profiler">line_profiler</a></em> package which also provides a magic extension for ipython and the jupyter notebook:</p>
<pre class="sourceCode python"><code class="sourceCode python">%load_ext line_profiler</code></pre>
<p>The magic command is <code>%lprun</code> and can be used in a similar way to <code>%prun</code> and <code>%snakeviz</code>, but in contrast to those other profiling functions it needs the specification of the functions of interest in addition:</p>
<pre class="sourceCode python"><code class="sourceCode python">%lprun -f equalize equalize(image)</code></pre>
<pre class="output"><code>Timer unit: 1e-06 s
Total time: 0.013282 s
File: <ipython-input-96-437a650e3f63>
Function: equalize at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def equalize(image, n_bins=256):
2 1 79 79.0 0.6 bins = numpy.linspace(0, 1, n_bins, endpoint=True)
3 1 8160 8160.0 61.4 hist, bins = numpy.histogram(image.flatten(), bins=bins, density=True)
4 1 18 18.0 0.1 cdf = hist.cumsum() / n_bins
5 # Invert the CDF by using numpy's interp function
6 1 5006 5006.0 37.7 equalized = numpy.interp(image.flatten(), bins[:-1], cdf)
7
8 # All this was performed on flattened versions of the image, reshape the
9 # equalized image back to the original shape
10 1 19 19.0 0.1 return equalized.reshape(image.shape)</code></pre>
<p>Note that you can specify several <code>-f function_name</code> arguments to get the information for more than one function with a single <code>%lprun</code> call. For each line, the output specifies how often the line was executed (“Hits”), how long the execution of that line took in total (“Time”) and for each time it was executed (“Per Hit”), and finally how much of the total time of the function was spent in the respective line.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="line-profiling-without-ipythonjupyter-notbook"><span class="glyphicon glyphicon-pushpin"></span>line profiling without ipython/jupyter notbook</h2>
</div>
<div class="panel-body">
<p>To profile a function with the line profiler, you’ll have to annotate it with the <code>@profile</code> decorator. You then use <code>kernprof</code> instead of <code>python</code> to run the script (using <code>python</code> will raise an error because it does not know about <code>@profile</code>!):</p>
<pre class="sourceCode bash"><code class="sourceCode bash"><span class="kw">kernprof.py</span> -l myscript.py</code></pre>
<p>The profile will be saved in a file <code>myscript.py.lprof</code> and can then be printed with:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> -m line_profiler myscript.py.lprof</code></pre>
</div>
</aside>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/paris-swc/python-testing-debugging-profiling">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
<script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
</body>
</html>