I l@ve RuBoard |
17.1 IntroductionCredit: Tim Peters, PythonLabs Algorithm research is what drew me to Python—and I fell in love. It wasn't love at first sight, but it was an attraction that grew into infatuation, which grew steadily into love. And that love shows no signs of fading. Why? I've worked in fields pushing the state of the art, and, in a paradoxical nutshell, Python code is easy to throw away! When you're trying to solve a problem that may not have been solved before, you may have some intuitions about how to proceed, but you rarely know in advance exactly what needs to be done. The only way to proceed is to try things, many things, everything you can think of, just to see what happens. Python eases this by minimizing the time and pain from conception to code: if your colleagues are using, for example, C or Java, it's not unusual for you to try and discard six different approaches in Python while they're still getting the bugs out of their first attempt. In addition, you will have naturally grown classes and modules that capture key parts of the problem domain, simply because you find the need to keep reinventing them when starting over from scratch. A true C++ expert can give you a good run, but C++ is so complex that true experts are very rare. Moderate skill with Python is much easier to obtain, yet much more productive for research and prototyping than merely moderate C++ skill. So if you're in the research business—and every programmer who doesn't know everything occasionally is—you've got a nearly perfect language in Python. How then do you develop the intuitions that can generate a myriad of plausible approaches to try? Experience is the final answer, as we all get better at what we do often, but studying the myriad approaches other people have tried develops a firm base from which to explore. Toward that end, here are the most inspiring algorithm books I've read—they'll teach you possibilities you may never have discovered on your own:
After consorting with the algorithm gods, a nasty practical problem arises back on Earth. When you have two approaches available, how do you measure which is faster? It turns out this is hard to do in a platform-independent way (even in Python) when one approach isn't obviously much faster than the other. One of the nastiest problems is that the resolution of timing facilities varies widely across platforms, and even the meaning of time varies. Your two primary choices for time measurement in Python are time.time and time.clock. time.time shouldn't be used for algorithm timing on Windows, because the timer updates only 18.2 times per second. Therefore, timing differences up to about 0.055 seconds are lost to quantization error (over a span of time briefer than that, time.time may return exactly the same number at each end). On the other hand, time.time typically has the best resolution on Unix-like systems. However, time.time measures wall-clock time. So, for example, it includes time consumed by the operating system when a burst of network activity demands attention. For this reason (among others), it's important to close all nonessential programs when running delicate timing tests and, if you can, shut down your network daemons. time.clock is a much better choice on Windows and often on Unix-like systems. The Windows time.clock uses the Win32 QueryPerformanceCounter facility, and the timer updates more than a million times per second. This virtually eliminates quantization error but also measures wall-clock time, so it is still important to close other programs while timing. time.clock has good and bad aspects on most Unix-like systems. The good side is that it generally measures user time, an account of how much time the CPU spent in the process that calls time.clock, excluding time consumed by other processes. The bad side is that this timer typically updates no more than 100 times per second, so a quantization error can still give misleading results. The best approach to this is to do many repetitions of the basic thing you're timing, so that the time delta you compute is large compared to the timer's updating frequency. You can then divide the time delta by the number of repetitions to get the average time. Overall, there's no compelling best answer here! One useful approach is to start your timing code with a block such as: if 1: from time import clock as now else: from time import time as now Then use now in your timing code and run your timing tests twice, switching the underlying timing function between runs by changing 1 to 0 (or vice versa). Another pitfall is that a Python-level function call is expensive. Suppose you want to time how long it takes to add 1 to 2 in Python. Here's a poor approach that illustrates several pitfalls: def add(i, j): i + j def timer(n): start = now( ) for i in range(n): add(1, 2) finish = now( ) # Return average elapsed time per call return (finish - start) / n Mostly, this program measures the time to call add, which should be obvious. What's less obvious is that it's also timing how long it takes to build a list of n integers, including the time Python takes to allocate memory for each of n integer objects, fiddle with each integer object's reference count, and free the memory again for each. All of this is more expensive than what add's body does. In other words, the thing you're trying to time is lost in the timing approach's overhead. It helps to build the list of timing loop indexes outside the range of the bracketing now calls, which you'll often see done. It helps even more to build the list in a different way, reusing the same object n times. This helps because the reference-count manipulations hit the same piece of memory each time instead of leaping all over memory because the i index variable is bound and unbound as the for loop proceeds: def add(i, j, indices): for k in indices: i + j def timer(n): indices = [None] * n # may be more convenient as a module global start = now( ) add(1, 2, indices) finish = now( ) return (finish - start) / n Putting i+j on the same line as the for clause is another subtle trick. Because they're on the same line, we avoid measuring time consumed by the Python SET_LINENO opcode that the Python compiler would generate (if run without the -O switch) if the two pieces of code were on different lines. There's one more twist I recommend here. No matter how quiet you try to make your machine, modern operating systems and modern CPUs are so complex that it's almost impossible to get the same result from one run to the next. If you find that hard to believe, it's especially valuable to run the timer body inside another loop to accumulate the results from several runs of add: def timer(n_per_call, n_calls): indices = [None] * n_per_call results = [] for i in range(n_calls): start = now( ) add(1, 2, indices) finish = now( ) results.append((finish - start) / n_per_call) results.sort( ) return results print "microseconds per add:" for t in timer(100000, 10): print "%.3f" % (t * 1e6), print Here's output from a typical run on an 866-MHz Windows 98SE box using time.clock: microseconds per add: 0.520 0.549 0.932 0.987 1.037 1.073 1.126 1.133 1.138 1.313 Note that the range between the fastest and slowest computed average times spans a factor of 2.5! If I had run the test only once, I might have gotten any of these values and put too much faith in them. If you try this, your results should be less frightening. Getting repeatable timings is more difficult under Windows 98SE than under any other operating system I've tried, so the wild results above should be viewed as an extreme. More likely (if you're not running Windows 98), you'll see a bimodal distribution with most values clustered around the fast end and a few at the slow end. The slowest result is often computed on the first try, because your machine's caches take extra time to adjust to the new task. As befits a chapter on algorithms, the recipes here have nothing in common. Rather, it's a grab-bag of sundry interesting techniques, ranging from two-dimensional geometry to parsing date strings. Let your natural interests guide you. I have a special fondness for Recipe 17.16: it's a near-trivial wrapper around the standard bisect.insort function. Why is that so cool? On three occasions I've recommended using the same trick to coworkers in need of a priority queue. Each time, when I explained that bisect maintains the queue as a sorted list, they were worried that this would be too inefficient to bear. The attraction of getting a priority queue with no work on their part overcame their reluctance, though, and, when I asked a month later, they were still using it—performance was not a real problem. So if the previous discussion of timing difficulties discouraged you, here's cause for optimism: as noted innumerable times by innumerable authors, the speed of most of your code doesn't matter at all. Find the 10% that consumes most of the time before worrying about any of it. |
I l@ve RuBoard |