Measuring and estimating execution time behaviour
We write programs to solve computational problems. In this section we are interested not so much in what a program calculates, but in something we can observe from its execution: how much time it takes.
Experiments
We start by designing experiments that expose the execution time behaviour of programs. We borrow ideas from the book Introduction to Programming in Python.
Notice that the experiments we describe are actually implemented as programs that generate data for us to analyse and draw conclusions:
We start with a program \(P\) that solves a problem,
we then write a program that uses \(P\) to solve many instances of the problem so that we can collect data about the execution time,
and we analyse the data to draw conclusions.
We hope that this approach contributes to developing your programming skills and that you get to know fragments of Python that you might not encounter otherwise.
What is it that we want to observe in order to collect data?
We are interested in time. We want quantitative measurements of the time it takes for a program to execute: we want to measure the execution time (running time) of a program. In Python there is support for doing so in the standard library module time. We are interested in functions that measure the time our program is running. We believe the closest we can get to this is the function process_time(), which, according to the documentation, returns a float with the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep. It is process-wide by definition. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid.
We can also use process_time_ns(), which instead returns an integer number of nanoseconds. In both cases we need to establish a starting point for the measurement and then take the difference at the point where the measurement should end.
Here is a first example with two ways of calculating the sum of all squares up to a bound:
In the first version we use i * i to calculate \(i^2\):
value1 = 0
for i in range(bound):
value1 += i * i
In the second version we use i ** 2 instead:
value2 = 0
for i in range(bound):
value2 += i ** 2
In both versions we have added code to also calculate the running time:
import time
bound = 10000000
start1 = time.process_time_ns()
value1 = 0
for i in range(bound):
value1 += i * i
running_time1 = time.process_time_ns() - start1
start2 = time.process_time_ns()
value2 = 0
for i in range(bound):
value2 += i ** 2
running_time2 = time.process_time_ns() - start2
print(value1, running_time1)
print(value2, running_time2)
333333283333335000000 1118811000
333333283333335000000 2693223000
When you test this program you will observe two things:
Of course value1 is equal to value2 for a given bound, but the running times are not the same: the fragment that uses ** 2 takes longer.
Running the program several times for a given bound yields different values of the running time. This is because the process being measured is involved in more than just our code cell: it is running this notebook, which involves autosaving and a lot of other work.
If we increase the bound then the execution time increases.
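To see the variability for yourself, here is a minimal sketch (reusing the i * i loop from above) that measures exactly the same computation several times:

import time

bound = 1000000
for run in range(5):
    start = time.process_time_ns()
    value = 0
    for i in range(bound):
        value += i * i
    # The computation is identical in every run, yet the measured times differ.
    print(run, time.process_time_ns() - start)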
We now program a function that allows us to compare the two execution times for different bounds. Observe that we use process_time() instead of process_time_ns() because we will print the quotient.
def running_time_experiment_sum_squares(initial_bound, step, times):
bound = initial_bound
for t in range(times):
start1 = time.process_time()
value1 = 0
for i in range(bound):
value1 += i * i
running_time1 = time.process_time() - start1
start2 = time.process_time()
value2 = 0
for i in range(bound):
value2 += i ** 2
running_time2 = time.process_time() - start2
print(bound,
running_time1,
running_time2,
running_time2 / running_time1)
bound *= step
# Start with a bound of 1000, double the bound each time and do it 15 times.
running_time_experiment_sum_squares(1000, 2, 15)
1000 7.599999999996498e-05 0.00023299999999970566 3.0657894736817504
2000 0.00015300000000006975 0.00047099999999922204 3.0784313725425316
4000 0.0003150000000005093 0.0009579999999997924 3.0412698412642647
8000 0.0006129999999995306 0.0019260000000000943 3.1419249592195255
16000 0.0012410000000002697 0.0038469999999994897 3.0999194198216387
32000 0.002500000000000391 0.007781999999999734 3.1127999999994067
64000 0.004903999999999797 0.015570000000000306 3.1749592169659357
128000 0.009997999999999507 0.031024999999999636 3.1031206241249416
256000 0.019999000000000322 0.061949999999999505 3.0976548827440626
512000 0.03957700000000042 0.1240319999999997 3.1339414306288598
1024000 0.08024500000000057 0.2468130000000004 3.075743036949326
2048000 0.1587670000000001 0.4919779999999996 3.098742181939567
4096000 0.31739300000000004 0.9833259999999999 3.098133859284861
8192000 0.6316429999999995 1.980581 3.13560191437252
16384000 1.2714400000000001 3.952271999999999 3.108500597747435
On my system the results show that using ** 2 makes the program about 3 times slower than using *. This relation does not seem to change as the bound grows.
The total execution time of each fragment grows when the bound grows, but it grows at the same rate for both fragments.
Is it always like this? Well, no! Let's try using + and a for loop instead of multiplication in yet another fragment in our experiment:
def running_time_experiment_sum_squares_v2(initial_bound, step, times):
bound = initial_bound
for t in range(times):
start1 = time.process_time()
value1 = 0
for i in range(bound):
value1 += i * i
running_time1 = time.process_time() - start1
start2 = time.process_time()
value2 = 0
for i in range(bound):
value2 += i ** 2
running_time2 = time.process_time() - start2
start3 = time.process_time()
value3 = 0
for i in range(bound):
sq = 0
for j in range(i):
sq += i
value3 += sq
running_time3 = time.process_time() - start3
print(bound,
running_time1,
running_time2,
running_time3,
running_time2 / running_time1,
running_time3 / running_time1)
bound *= step
# Start with a bound of 1000, double the bound each time and do it 5 times.
running_time_experiment_sum_squares_v2(1000, 2, 5)
1000 7.899999999949614e-05 0.00023200000000045407 0.022548999999999708 2.9367088607839715 285.43037974865223
2000 0.00015500000000123748 0.00046999999999997044 0.0941409999999987 3.0322580644917294 607.3612903177233
4000 0.0003170000000007889 0.000968000000000302 0.38250599999999935 3.0536277602457194 1206.6435331200234
8000 0.0006160000000008381 0.0019290000000005136 1.5548649999999995 3.13149350649008 2524.1314935030587
16000 0.0012259999999990612 0.003955000000001263 6.239272 3.2259380097914288 5089.128874392151
I hope you can observe that the third approach is not only much slower: as the bound grows, its execution time also grows much faster! On my system, for a bound of 1000 it is about 300 times slower and for a bound of 16000 about 5000 times slower!
This should make you curious about exploring execution time as a function of some characteristic of the input that measures the problem size.
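One concrete way to do this is a doubling experiment: double the bound and inspect the quotient of consecutive running times. Here is a minimal sketch for the third fragment (the name doubling_ratios is our own):

import time

def doubling_ratios(initial_bound, times):
    bound = initial_bound
    previous = None
    for t in range(times):
        start = time.process_time()
        value = 0
        for i in range(bound):
            sq = 0
            for j in range(i):
                sq += i
            value += sq
        running_time = time.process_time() - start
        if previous is not None:
            # When the bound is doubled, a quotient near 2 indicates linear
            # growth and a quotient near 4 indicates quadratic growth.
            print(bound, running_time, running_time / previous)
        previous = running_time
        bound *= 2

doubling_ratios(1000, 5)

For this fragment the quotients should approach 4: doubling the bound quadruples the number of additions, the signature of quadratic growth.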
Understanding the math for algorithm analysis
The book Algorithms Illuminated explains asymptotic notation, the mathematical formalism that is used for analysing the execution time of algorithms.
We now take a look at these concepts and illustrate them with programs that allow you to experiment with each of the concepts. Our hope is that this section will help you understand the math and also get some practice in using yet more fragments of Python while developing your programming skills even further.
We start by exploring how some functions behave when their input becomes larger and larger.
Constant functions
These are functions that have the same value for all inputs. Here are two examples: \(f(x) = 3\) and \(g(x) = -34\).
Here is a little tool to plot functions in Python. We need the module matplotlib.
import matplotlib.pyplot as plt
# The x values:
x = range(100)
# The y values for the constant functions 3 and -34
fx = [3 for i in x]
gx = [-34 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for fx and gx:
ax.plot(x, fx)
ax.plot(x,gx)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x), g(x)',
title='Constant functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

Linear functions
We will only consider linear functions that grow as the argument grows: positive slope.
The slope is the coefficient of \(x\). The other number is the point where the line crosses the \(y\)-axis (when \(x\) is \(0\)).
# The x values:
x = range(100)
# The y values for the linear functions 4x + 10 and 2x + 100
fx = [4*i+10 for i in x]
gx = [2*i+100 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for fx and gx:
ax.plot(x, fx)
ax.plot(x, gx)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x), g(x)',
title='Linear functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

Quadratic functions
Make sure you notice the values that the functions take on the \(y\)-axis.
# The x values:
x = range(100)
# The y values for the quadratic functions 3x^2 + 4x + 10000 and 5x^2 + 4x + 10
fx = [3*i*i + 4*i + 10000 for i in x]
gx = [5*i*i + 4*i + 10 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for fx and gx:
ax.plot(x, fx)
ax.plot(x, gx)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x), g(x)',
title='Quadratic functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

Logarithmic functions
We have illustrated functions that do not grow (constant functions), functions that grow with a constant slope (linear functions) and functions that grow with increasing slope (quadratic functions).
Are there functions that grow with a decreasing slope? Well yes: logarithmic functions:
We now need the math module to be able to use the logarithm functions.
import math
# The x values:
x = range(1,100)
# The y values for the logarithmic functions ln(x) and log_2(x) + 5
fx = [math.log(i) for i in x]
gx = [math.log2(i) + 5 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for fx and gx:
ax.plot(x, fx)
ax.plot(x, gx)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x), g(x)',
title='Logarithmic functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

It is interesting to see what happens if we put all of them in one plot!
# The x values:
x = range(1,100)
# The y values
f1 = [3 for i in x]
f2 = [-34 for i in x]
f3 = [4*i+10 for i in x]
f4 = [2*i+100 for i in x]
f5 = [3*i*i + 4*i + 10000 for i in x]
f6 = [5*i*i+ 4*i + 10 for i in x]
f7 = [math.log(i) for i in x]
f8 = [math.log2(i) + 5 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for f1 ... f8:
ax.plot(x, f1)
ax.plot(x, f2)
ax.plot(x, f3)
ax.plot(x, f4)
ax.plot(x, f5)
ax.plot(x, f6)
ax.plot(x, f7)
ax.plot(x, f8)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x)',
title='functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

Can you explain what you see?
We leave the quadratic functions out (they are too dominant).
# The x values:
x = range(1,100)
# The y values
f1 = [3 for i in x]
f2 = [-34 for i in x]
f3 = [4*i+10 for i in x]
f4 = [2*i+100 for i in x]
#f5 = [3*i*i + 4*i + 10000 for i in x]
#f6 = [5*i*i+ 4*i + 10 for i in x]
f7 = [math.log(i) for i in x]
f8 = [math.log2(i) + 5 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for all functions except f5 and f6
ax.plot(x, f1)
ax.plot(x, f2)
ax.plot(x, f3)
ax.plot(x, f4)
#ax.plot(x, f5)
#ax.plot(x, f6)
ax.plot(x, f7)
ax.plot(x, f8)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x)',
title='functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

Now the linear functions dominate! So let’s leave them out too:
# The x values:
x = range(1,100)
# The y values
f1 = [3 for i in x]
f2 = [-34 for i in x]
#f3 = [4*i+10 for i in x]
#f4 = [2*i+100 for i in x]
#f5 = [3*i*i + 4*i + 10000 for i in x]
#f6 = [5*i*i+ 4*i + 10 for i in x]
f7 = [math.log(i) for i in x]
f8 = [math.log2(i) + 5 for i in x]
# Prepare the tool:
_,ax = plt.subplots()
# Plot the points for f1, f2, f7 and f8:
ax.plot(x, f1)
ax.plot(x, f2)
#ax.plot(x, f3)
#ax.plot(x, f4)
#ax.plot(x, f5)
#ax.plot(x, f6)
ax.plot(x, f7)
ax.plot(x, f8)
# Decorate the graph:
ax.set(xlabel='x', ylabel='f(x)',
title='functions')
# Add a grid:
ax.grid()
# Display:
plt.show()

For large values of the argument, logarithms grow so slowly that they look almost constant!
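A quick numeric check of how slowly logarithms grow:

import math

# Multiplying the argument by 1000 adds only about 10 to the base-2 logarithm,
# since log2(1000) is roughly 9.97.
for n in [10**3, 10**6, 10**9, 10**12]:
    print(n, math.log2(n))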
Ordering functions according to how fast they grow
Constant
Logarithmic
Linear
Linearithmic (can you guess what this is?)
Quadratic
Cubic
Exponential (can you guess what this is?)
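To check your guesses, here is a sketch that plots one representative of each growth rate up to the quadratic ones, including a linearithmic function \(x \log_2 x\); cubic and exponential functions are left out because they would dwarf everything else in the plot.

import math
import matplotlib.pyplot as plt

# One representative per growth rate (cubic and exponential left out):
x = range(1, 100)
constant = [5 for i in x]
logarithmic = [math.log2(i) for i in x]
linear = [i for i in x]
linearithmic = [i * math.log2(i) for i in x]
quadratic = [i * i for i in x]
_, ax = plt.subplots()
ax.plot(x, constant, label='constant')
ax.plot(x, logarithmic, label='logarithmic')
ax.plot(x, linear, label='linear')
ax.plot(x, linearithmic, label='linearithmic')
ax.plot(x, quadratic, label='quadratic')
ax.set(xlabel='x', ylabel='f(x)', title='Growth rates')
ax.legend()
ax.grid()
plt.show()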
Experiments with asymptotic running time
In the book you will find out how these functions are used to estimate how the execution time grows as a function of some size of the input.
Here are some of the examples from the book as programs. In the book you are asked to guess the asymptotic running time of the code fragments. Here we do experiments to see what we can say from the data we observe, and relate this to the discussions in the book.
In all cases you are asked to guess whether the asymptotic behaviour is constant, logarithmic, linear or quadratic.
Searching one array
The algorithm is presented in the book. We show how to organise an experiment and you will be asked to do the other cases on your own.
The first program is as follows:
def search1(a, t):
n = len(a)
for i in range(n):
if a[i] == t: return True
return False
search1([1,2,3,4,5,6], 3)
True
And here is how we can set up an experiment.
First we need to identify what characteristic of the input can influence the running time: what is the size of the input?
Given that the number of iterations is at most the length of the input array, we settle for this as the problem size. The maximum number of iterations is reached whenever the sought element does not occur in the array.
We then measure execution time for arrays of increasing sizes that do not contain the sought element. We can do this by generating ranges of non-negative numbers and searching for -1.
import time
def running_time_data(start_size, step, times):
n = start_size
results = ([0] * times, [0] * times)
for i in range(times):
a = list(range(n))
start = time.process_time_ns()
search1(a,-1)
(results[0][i],results[1][i]) = (n, time.process_time_ns() - start)
n *= step
return results
# You could use this data in a table or, as below, in a plot
running_time_data(1000,2,15)
([1000,
2000,
4000,
8000,
16000,
32000,
64000,
128000,
256000,
512000,
1024000,
2048000,
4096000,
8192000,
16384000],
[158000,
308000,
621000,
1251000,
2472000,
4986000,
10338000,
21364000,
41514000,
82111000,
132291000,
107150000,
207231000,
434085000,
860222000])
import matplotlib.pyplot as plt
(size,t) = running_time_data(1000,2,15)
# Prepare the tool:
_,ax = plt.subplots()
# Plot the running time against the array size:
ax.plot(size, t)
# Decorate the graph:
ax.set(xlabel='size', ylabel='time',
title='Running time as function of array length')
# Add a grid:
ax.grid()
# Display:
plt.show()

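Plots can be deceiving, so it helps to also look at the numbers. If the running time grows linearly with the size, the quotient time/size should be roughly constant. Here is a minimal sketch using the data from running_time_data above:

(size, t) = running_time_data(1000, 2, 15)
for i in range(len(size)):
    # For linear growth this quotient stabilises around a constant.
    print(size[i], t[i], t[i] / size[i])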
Quiz
Adapted to Python from Quiz 2.1 in Algorithms Illuminated, part 1, page 39. The solution to the quiz is discussed in section 2.1.4 of the book. The explanation is not based on an experiment but on counting operations and using asymptotic analysis. Does the experiment confirm the answer?
What is the asymptotic running time of search1?
\(O(1)\)
\(O(\log n)\)
\(O(n)\)
\(O(n^2)\)
Searching two arrays
In the cell below you have the program for searching.
You should try an experiment as in the previous case; we sketch one after the code below. We recommend using a smaller number of doublings of the array size.
def search2(a,b,t):
n = len(a)
for i in range(n):
if a[i] == t: return True
for i in range(n):
if b[i] == t: return True
return False
search2([1,2,3,4,5,6], [-1,-2,-3,-4,-5,-6], 0)
False
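Here is the sketch of the experiment for search2 promised above. It follows the same pattern as running_time_data for search1; the name running_time_data2 is our own, and we use fewer doublings because each measurement now scans two arrays:

import time

def running_time_data2(start_size, step, times):
    n = start_size
    results = ([0] * times, [0] * times)
    for i in range(times):
        # Two arrays of non-negative numbers; -1 occurs in neither,
        # so search2 performs the maximum number of iterations.
        a = list(range(n))
        b = list(range(n, 2 * n))
        start = time.process_time_ns()
        search2(a, b, -1)
        (results[0][i], results[1][i]) = (n, time.process_time_ns() - start)
        n *= step
    return results

running_time_data2(1000, 2, 10)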
Quiz
Adapted to Python from Quiz 2.2 in Algorithms Illuminated, part 1, page 40. The solution to the quiz is discussed in section 2.1.4 of the book. The explanation is not based on an experiment but on counting operations and using asymptotic analysis. Does the experiment confirm the answer?
What is the asymptotic running time of search2?
\(O(1)\)
\(O(\log n)\)
\(O(n)\)
\(O(n^2)\)
Checking for a common element
def common(a,b):
n = len(a)
for i in range(n):
for j in range(n):
if a[i] == b[j]: return True
return False
common([1,2,3,4,5,6], [-1,-2,-3,-4,-5,-6])
False
Quiz
Adapted to Python from Quiz 2.3 in Algorithms Illuminated, part 1, page 41. The solution to the quiz is discussed in section 2.1.4 of the book. The explanation is not based on an experiment but on counting operations and using asymptotic analysis. Does the experiment confirm the answer?
What is the asymptotic running time of common?
\(O(1)\)
\(O(\log n)\)
\(O(n)\)
\(O(n^2)\)
Checking for duplicates
def duplicates(a):
n = len(a)
for i in range(n):
for j in range(i + 1, n):
if a[i] == a[j]: return True
return False
duplicates([1,2,3,4,3,2,1])
True
Quiz
Adapted to Python from Quiz 2.4 in Algorithms Illuminated, part 1, page 42. The solution to the quiz is discussed in section 2.1.4 of the book. The explanation is not based on an experiment but on counting operations and using asymptotic analysis. Does the experiment confirm the answer?
What is the asymptotic running time of duplicates?
\(O(1)\)
\(O(\log n)\)
\(O(n)\)
\(O(n^2)\)
Sorting
We leave it as an exercise for you to generate running time data and plot the timing results for the sorting algorithms we programmed in the section Order and sorting: selection sort, insertion sort and bubble sort.
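To get you started, here is a sketch of a generic timing harness. It assumes a sorting function that takes a list as its only argument; the name selection_sort in the usage example is an assumption, so adapt it to your own code from that section.

import time
import random

def sorting_running_time_data(sort, start_size, step, times):
    n = start_size
    results = ([0] * times, [0] * times)
    for i in range(times):
        # A shuffled list of n numbers as input to the sorting function.
        a = list(range(n))
        random.shuffle(a)
        start = time.process_time_ns()
        sort(a)
        (results[0][i], results[1][i]) = (n, time.process_time_ns() - start)
        n *= step
    return results

# For example, assuming selection_sort is defined in your notebook:
# sorting_running_time_data(selection_sort, 1000, 2, 8)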