Just How Fast Is Your Algorithm?

When designing and using algorithms, a common question is which algorithm is more efficient at accomplishing a certain task.

There are multiple strategies for determining efficiency, and in this essay we will look at a couple of approaches.

Empirical measurements

Perhaps the simplest way of determining the efficiency of an algorithm is by measuring how much time an implementation takes to run for a given input size, and then comparing it to other algorithms.

Most programming languages have some sort of timing function that you can use to measure how long it takes to run a function. Python notebooks have the built-in timeit magic command, which you can use to time the execution of a code cell.

For example, we can time the execution of this implementation of insertion sort on a list to_sort of 500 random elements, assumed to be defined in a previous cell (e.g. built with Python's random module as to_sort = [random.random() for _ in range(500)]).

%%timeit
def insertion_sort(array):
  # Swap the element at i leftwards past any larger element
  # in the already-processed prefix array[:i].
  for i in range(len(array)):
    for j in range(i):
      if array[j] > array[i]:
        array[j], array[i] = array[i], array[j]
  return array

insertion_sort(to_sort)


# Outputs: 100 loops, best of 3: 7.57 ms per loop.

We see that the best average time per loop is 7.57 ms over 100 loops. Averaging over many runs is useful as it smooths out fluctuations caused by whatever else the machine happens to be doing.

While this is dead simple to implement, empirical measurement has several drawbacks.

Firstly, it can be difficult to determine how fast an algorithm is when the dataset becomes very large; too large to run on a laptop. This is problematic if you're trying to figure out whether your algorithm is fast enough to be run on all your data on a supercomputer.

Just because your algorithm runs quickly on small datasets doesn't guarantee that it will scale well to large ones, as the sketch below illustrates.
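To probe scaling empirically, one can time the same implementation at several input sizes and watch how the running time grows. Here is a rough sketch using Python's timeit module (the doubling input sizes are an arbitrary choice of mine):

import random
import timeit

def insertion_sort(array):
  for i in range(len(array)):
    for j in range(i):
      if array[j] > array[i]:
        array[j], array[i] = array[i], array[j]
  return array

# Time the sort at doubling input sizes; for a quadratic algorithm,
# doubling n should roughly quadruple the running time.
for n in [500, 1000, 2000, 4000]:
  data = [random.random() for _ in range(n)]
  seconds = timeit.timeit(lambda: insertion_sort(data[:]), number=5) / 5
  print(f"n={n}: {seconds * 1000:.2f} ms")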

Secondly, it's difficult to compare run-times of different algorithms measured by different people, as the results depend on machine-specific hardware, programming languages, and even compiler versions for the same programming language.

Since we would really like a robust, universal way of determining the run-time of an algorithm, and to be able to tell whether an algorithm is fast enough before actually running it, empirical measurement is seldom used in practice.

Instead, we use analytical reasoning, where we look at what an algorithm does and how it scales to large datasets by counting the number of operations performed, rather than directly measuring the time it takes to run.

This solves the reproducibility problem of empirical measurements, and it also provides a way of determining whether an algorithm will be too expensive to run, even on a supercomputer, before running it.

Analytical reasoning

Another method of assessing the efficiency of an algorithm is by analysing it analytically, but that comes at the cost of being more difficult than just measuring the time it takes to run some code.

The most common method of analytically determining the efficiency of an algorithm is calculating its asymptotic complexity with big-O notation.

Big-O

Big-O is concerned with the asymptotic time complexity of an algorithm, or in other words an upper bound on how many steps it will take relative to how much data you're processing. Formally, big-O is defined as:

$$f(n)=O\big(g(n)\big) \iff \exists k \in \mathbb{R},\ k>0 \ \land\ \exists n_0 \in \mathbb{R},\ n_0 \geq 0 \quad \text{s.t.}$$

$$\forall n \geq n_0: \quad f(n) \leq k \cdot g(n)$$

What the above equation says is that there exist two constants $k$ and $n_0$ on the real number line such that $f(n)$ is at most as large as $k \cdot g(n)$ for all $n$ that's at least as big as $n_0$.

Or in simpler terms: if, after a certain point, the number of steps it takes the algorithm to run is at most proportional to another function $g$, then big-O of that algorithm can be described as $g$.

As an example, let’s suppose you have an algorithm that runs in $2n$ time. Then big-O of that algorithm is $n$, because $3n$ is greater than or equal to $2n$ for all $n$ that’s at least zero (take $k=3$ and $n_0=0$).

Mathematically, this can be written as:

$$2n=O(n)$$

$$\because \forall n \in \mathbb{R},\ n \geq 0: \quad 2n \leq 3n$$

Or more generally as:

$$\forall k \in \mathbb{R},\ k>0: \quad kn=O(n)$$

$$\because \forall n \geq 0: \quad kn \leq (k+1) \cdot n$$
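We can sanity-check such claims numerically. Below is a small helper of my own devising (the function name and the sampled range of $n$ are arbitrary choices) that tests the inequality from the definition over a finite range; a finite check can of course only support the bound, not prove it:

def satisfies_big_o(f, g, k, n0, n_max=10_000):
  # Check f(n) <= k * g(n) for every integer n0 <= n <= n_max.
  return all(f(n) <= k * g(n) for n in range(n0, n_max + 1))

# 2n = O(n), witnessed by k = 3 and n0 = 0:
print(satisfies_big_o(lambda n: 2 * n, lambda n: n, k=3, n0=0))  # True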

An efficient algorithm is one that has a low running time for large $n$.

Because big-O is unconcerned with small values of $n$, it’s ideal for analytically determining the efficiency of algorithms on large inputs. With that said, there are other methods of analysing efficiency.

In practice, you generally want to avoid algorithms that run in $O(n^2)$ time or slower, as these algorithms are not practical for large datasets.

Furthermore, searching problems tend to be $O(\log(n))$ (think binary search), while sorting problems tend to be $O(n \cdot \log(n))$ (think quicksort), where $\log$ is $\log_2$, as is generally assumed in computer science.
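To illustrate the $\log(n)$ pattern, here is a standard binary search sketch; each iteration halves the remaining interval, so searching a sorted list of $n$ elements takes at most roughly $\log_2(n)$ iterations:

def binary_search(sorted_array, target):
  # Each iteration halves the interval [lo, hi], so the loop
  # runs O(log n) times.
  lo, hi = 0, len(sorted_array) - 1
  while lo <= hi:
    mid = (lo + hi) // 2
    if sorted_array[mid] == target:
      return mid
    elif sorted_array[mid] < target:
      lo = mid + 1
    else:
      hi = mid - 1
  return -1  # target not present

print(binary_search(list(range(1000)), 742))  # 742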

Big-O of matrix multiplication

To see Big-O in action, let's try to compute the Big-O asymptotic runtime of a naive matrix multiplication algorithm.

We assume the hardware on which we do the matrix multiplication has a constant-time multiplication instruction in its ALU, which lets us fold the time it takes to perform a single scalar multiplication into our fundamental constant.

Assume two matrices $A\in \mathbb{R}^{a \times b}$ and $B\in \mathbb{R}^{c \times d}$ where $b=c$, which is necessary for $AB$ to be defined. We know that:

$$AB \in \mathbb{R}^{a \times d}$$

Furthermore, due to the way matrix multiplication is defined, every element $AB_{(r,c)}$, where $r$ denotes the row index and $c$ the column index, can be found by:

$$AB_{(r,c)} = \sum_{i=1}^{b} A_{(r,i)} \cdot B_{(i,c)}$$

Since $b=c$, we can write $z=b=c$ for the shared inner dimension, as follows:

$$AB_{(r,c)} = \sum_{i=1}^{z} A_{(r,i)} \cdot B_{(i,c)}$$

All the elements in the matrix $AB$ can now be found by:

$$AB= \begin{bmatrix} AB_{(1,1)} & \cdots & AB_{(1,d)} \\ \vdots & \ddots & \vdots \\ AB_{(a,1)} & \cdots & AB_{(a,d)} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{z} A_{(1,i)} \cdot B_{(i,1)} & \cdots & \sum_{i=1}^{z} A_{(1,i)} \cdot B_{(i,d)} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{z} A_{(a,i)} \cdot B_{(i,1)} & \cdots & \sum_{i=1}^{z} A_{(a,i)} \cdot B_{(i,d)} \end{bmatrix}$$

Here, we see that we have to compute $\sum_{i=1}^{z} A_{(r,i)} \cdot B_{(i,c)}$ for each of the $a \cdot d$ elements of the matrix, so the total number of multiplications is:

$$n_{mul} = z \cdot a \cdot d$$

When analysing matrices, it's common practice to assume there are the same number of rows and columns. Using this assumption here, i.e. $a = z = d = n$, we get $n_{mul} = n^3$, so the running time for $AB$ can be expressed as:

$$O(n^3)$$
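We can sanity-check the count $n_{mul} = z \cdot a \cdot d$ with a naive triple-loop implementation that counts every scalar multiplication it performs (a minimal sketch; matrices are represented as plain lists of lists):

def matmul_count(A, B):
  a, z, d = len(A), len(B), len(B[0])
  AB = [[0.0] * d for _ in range(a)]
  n_mul = 0
  for r in range(a):      # a rows
    for c in range(d):    # d columns
      for i in range(z):  # z multiplications per entry
        AB[r][c] += A[r][i] * B[i][c]
        n_mul += 1
  return AB, n_mul

A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]  # 3 x 2
AB, n_mul = matmul_count(A, B)
print(n_mul)  # 12, matching z * a * d = 3 * 2 * 2

For square $n \times n$ matrices all three loops run $n$ times, which is exactly where the $O(n^3)$ comes from.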

Alternatives to big-O

Big-O is not the only way of analytically determining the complexity of an algorithm.

Another method is to use tilde ($\sim$) notation.[1] Tilde notation, much like big-O notation, is used to come up with a simple and succinct expression for the run-time of an algorithm.

Using tilde notation, $\sim f(n)$ represents any quantity that approaches 1 when divided by $f(n)$ as $n$ goes towards infinity, i.e. as $n \rightarrow \infty$. Mathematically, we can write:

$$g(n) \sim f(n) \iff \lim_{n\rightarrow \infty} \frac{g(n)}{f(n)} = 1$$

For example, suppose we have a function that takes $f(n) = 5n^4 + 3n^2 + 6$ steps to complete. Then we can write:

$$5n^4 \sim f(n)$$

$$\because \lim_{n\rightarrow \infty} \frac{5n^4}{5n^4 + 3n^2 + 6} = 1$$
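We can watch this limit numerically; evaluating the ratio $\frac{5n^4}{f(n)}$ for growing $n$ shows it approaching 1 (a quick sketch):

def f(n):
  return 5 * n**4 + 3 * n**2 + 6

for n in [10, 100, 1000]:
  print(n, 5 * n**4 / f(n))
# The ratio approaches 1: roughly 0.9939 at n=10
# and 0.99994 at n=100.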

The main difference between tilde notation and big-O notation is that tilde notation preserves the coefficient of the leading term: for the $f(n)$ above, tilde notation gives $\sim 5n^4$, while big-O only gives $O(n^4)$.

These are just two asymptotic notation schemes; there exist many more. An overview can be found on OEIS's wiki.

Conclusion

We have discussed different methods of determining the efficiency of an algorithm, and have seen why analytical approaches, such as the popular big-O notation, are preferable to empirical measurements.

Furthermore, we have looked at alternatives to big-O, specifically tilde notation, which preserves the coefficient of the leading term.
