Comparing Algorithms With Python and Big-O Notation

An algorithm is a step-by-step method of solving a problem. This post will help you think about algorithm design. Big-O changed the way I thought about programming. In this post we'll analyze two algorithms that do the same thing, figure out which one is better using Big-O notation, and then test our theory.

Understanding Big-O Notation

“Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.” - Wikipedia

In my own words, it's simply a way of thinking about how the work your algorithm or function does grows with the size of its input.

A few examples:

  1. O(1) describes an algorithm that always executes in the same time (or space) regardless of the size of the input data set. Imagine a function that returns a specific index of an array or list.
  2. O(log n) describes an algorithm where each step cuts the remaining work down by a constant factor, so doubling the input adds only one more step. Binary search is the classic example.
  3. O(n) An O(n) operation's complexity scales linearly with the number of inputs. If you iterate through all the elements in an array, it is O(n), because the operation depends on the number of items in your data. Going through 100 elements in an array requires 100 operations, and adding more elements makes the work grow linearly.
  4. O(n²) An O(n²) operation's complexity scales quadratically with the number of inputs. A simple example of O(n²) is a loop within a loop, such as nested for loops. The work grows not with the input size, but with the input size squared.
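To make these classes concrete, here is a minimal sketch with one illustrative function per class (the names and inputs are my own, not from any particular library):

```python
def constant_lookup(arr):
    # O(1): a single index access, regardless of len(arr)
    return arr[0]

def binary_search(arr, target):
    # O(log n): each comparison halves the remaining search space
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def linear_sum(arr):
    # O(n): touches every element exactly once
    total = 0
    for n in arr:
        total += n
    return total

def all_pairs(arr):
    # O(n²): nested loops touch every pair of elements
    return [(a, b) for a in arr for b in arr]
```

Note that `binary_search` assumes a sorted input; that precondition is what buys the logarithmic behavior.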

Let's Code

For our example we have two algorithms that, given an integer array, output all the unique pairs that sum to a specific value k. This is a common programming interview question.

Algorithm 1:

def pair_sum_1(arr, k):
    pairs = set()
    for idx, n in enumerate(arr):
        for j in arr[idx + 1:]:
            if n + j == k:
                a, b = sorted([n, j])
                pairs.add((a, b))
    return pairs

Algorithm 2:

def pair_sum_2(arr, k):
    # sets for tracking
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        if target in seen:
            output.add((min(num, target), max(num, target)))
        seen.add(num)
    return output

They both do the same thing, but which one is better?

The Big-O of algorithm 1 is O(n²): for each element, the inner loop scans the rest of the array, so the number of operations grows with the square of the input size.

The Big-O of algorithm 2 is O(n), or linear: a single pass over the input, with each set lookup taking constant time on average.
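The reason the single pass stays linear is that checking membership in a Python set is a hash lookup, which is O(1) on average, whereas the same check on a list scans elements one by one, O(n). A quick illustrative comparison (exact timings will vary by machine):

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Membership test on a list scans elements one by one: O(n)
list_time = timeit.timeit(lambda: 99_999 in data_list, number=1_000)
# Membership test on a set hashes the value: O(1) on average
set_time = timeit.timeit(lambda: 99_999 in data_set, number=1_000)

print(f"list lookup: {list_time:.4f}s  set lookup: {set_time:.4f}s")
```

The set lookup should come out dramatically faster, which is what lets algorithm 2 trade a little memory for a lot of time.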

Let's run a test just to be sure!

Testing our Theory

To visualize this, let's set up a test that times each function on random integers.

import timeit
import random
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Generate a random list of 100 unique ints from 1-100
rand_list = random.sample(range(1, 101), 100)
# Generate a random int from 1-100
rand_int = random.randint(1, 100)
results = []
# Time each function and append both times to the results list
for i in range(5000):
    start_pair_sum_1 = timeit.default_timer()
    pair_sum_1(rand_list, rand_int)
    end_pair_sum_1 = timeit.default_timer() - start_pair_sum_1

    start_pair_sum_2 = timeit.default_timer()
    pair_sum_2(rand_list, rand_int)
    end_pair_sum_2 = timeit.default_timer() - start_pair_sum_2

    results.append((end_pair_sum_1, end_pair_sum_2))

# Analyze with pandas
df = pd.DataFrame(data=results, columns=['pair_sum_1', 'pair_sum_2'])

plot = df.plot()

As you can clearly see, pair_sum_2 performs much better over 5,000 iterations of random data. Simply removing one loop from the algorithm massively increases efficiency.
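To see the quadratic-versus-linear scaling directly, we can also time both approaches at increasing input sizes. This sketch re-implements the two pair-sum strategies inline (under illustrative names) so it runs standalone:

```python
import random
import timeit

def quadratic_pairs(arr, k):
    # Nested loops over all pairs: O(n²)
    pairs = set()
    for i, n in enumerate(arr):
        for m in arr[i + 1:]:
            if n + m == k:
                pairs.add(tuple(sorted((n, m))))
    return pairs

def linear_pairs(arr, k):
    # Single pass with a lookup set: O(n)
    seen, pairs = set(), set()
    for n in arr:
        target = k - n
        if target in seen:
            pairs.add((min(n, target), max(n, target)))
        seen.add(n)
    return pairs

for size in (100, 400, 1600):
    arr = random.sample(range(size * 10), size)
    k = random.randint(1, size * 10)
    t1 = timeit.timeit(lambda: quadratic_pairs(arr, k), number=20)
    t2 = timeit.timeit(lambda: linear_pairs(arr, k), number=20)
    print(f"n={size:5d}  quadratic: {t1:.4f}s  linear: {t2:.4f}s")
```

Each time n quadruples, the quadratic version's time should grow roughly sixteenfold while the linear version's grows roughly fourfold, so the gap widens quickly.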
