An algorithm is a step-by-step method for solving a problem. This post will help you think about algorithm design. Big-O changed the way I thought about programming. In this post we’ll analyze two algorithms that do the same thing, figure out which one is better using Big-O notation, and then test our theory.
Understanding Big-O Notation
“Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.” - Wikipedia. In my own words, it’s simply a way of describing how the running time (or space) of your algorithm or function grows as its input grows.
A few examples:
- O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set. Imagine a function that returns a specific index of an array or list.
- O(log n) An O(log n) operation cuts the remaining work down by a constant factor on each cycle, so doubling the input adds only one more step. Binary search is the classic example: every comparison halves the portion of the array left to check.
- O(n) An O(n) operation’s complexity scales linearly with the number of inputs. If you iterate through all the elements in an array, it is O(n), because the operation is dependent on the number of items in your data. Going through 100 elements in an array requires 100 operations. If you were to add more elements to the list, the complexity would scale linearly.
- O(n²) An O(n²) operation’s complexity scales quadratically with the number of inputs. A simple example of an O(n²) process is a loop within a loop; nested for loops are the prime example. Your complexity scales not with the input directly, but with the input squared: 100 elements means roughly 10,000 operations.
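To make these classes concrete, here is a small sketch with one illustrative function per complexity class (the function names are mine, chosen for the example):

```python
def constant(arr):
    # O(1): a single index lookup, independent of len(arr)
    return arr[0]

def logarithmic(arr, target):
    # O(log n): binary search halves the remaining range each cycle
    # (assumes arr is sorted)
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def linear(arr):
    # O(n): one operation per element
    return sum(x * 2 for x in arr)

def quadratic(arr):
    # O(n²): a loop within a loop, roughly n * n pairings
    return [(a, b) for a in arr for b in arr]
```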
For our example we have two algorithms that, given an integer array, output all the unique pairs that sum to a specific value k. This is a common programming interview question.
```python
def pair_sum_1(arr, k):
    pairs = set()
    for idx, n in enumerate(arr):
        for j in arr[idx + 1:]:
            if n + j == k:
                a, b = sorted([n, j])
                pairs.add((a, b))
    return pairs
```
```python
def pair_sum_2(arr, k):
    # sets for tracking
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        if target not in seen:
            seen.add(num)
        else:
            output.add((min(num, target), max(num, target)))
    return output
```
They both do the same thing, but which one is better?
The Big O of algorithm 1 is O(n²): the nested loops mean the number of operations grows with the square of the input size. The operation’s complexity scales quadratically with the number of inputs.
The Big O of algorithm 2 is O(n), or linear: the operation’s complexity scales linearly with the number of inputs.
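A quick back-of-the-envelope count makes the gap concrete. This sketch (helper names are mine) counts the pair comparisons the nested-loop version makes versus the set lookups the single-pass version makes:

```python
def count_ops_nested(n):
    # pair_sum_1 compares element idx against every later element,
    # so it performs (n - 1) + (n - 2) + ... + 1 = n * (n - 1) / 2 comparisons
    return n * (n - 1) // 2

def count_ops_single(n):
    # pair_sum_2 does one set lookup per element: n operations
    return n

print(count_ops_nested(100))  # 4950
print(count_ops_single(100))  # 100
```

For a 100-element array that is 4,950 comparisons versus 100 lookups, and the gap widens quadratically as the input grows.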
Let’s run a test just to be sure!
Testing our Theory
To visualize this, let’s set up a test. The test will time each function with random integers.
```python
import timeit
import random
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Generate a random list with ints from 1-100
rand_list = random.sample(range(1, 101), 100)

# Generate a random int from 1-100
rand_int = random.randint(1, 101)

results = []

# Time each function and append the times to a list
for i in range(5000):
    start_pair_sum_1 = timeit.default_timer()
    pair_sum_1(rand_list, rand_int)
    end_pair_sum_1 = timeit.default_timer() - start_pair_sum_1

    start_pair_sum_2 = timeit.default_timer()
    pair_sum_2(rand_list, rand_int)
    end_pair_sum_2 = timeit.default_timer() - start_pair_sum_2

    results.append([end_pair_sum_1, end_pair_sum_2])

# Analyze with pandas
df = pd.DataFrame(data=results, columns=['pair_sum_1', 'pair_sum_2'])
plot = df.plot()
plot.set_xlabel("Iterations")
plot.set_ylabel("Time")
```
As you can clearly see, pair_sum_2 performs much better over 5,000 iterations of random data. Simply removing one loop, and with it dropping the complexity from O(n²) to O(n), massively increases efficiency.
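The test above times both functions at a fixed input size of 100 elements. To see the growth rates themselves, a sketch like the following (input sizes are my own choice, and both functions are repeated here so the snippet runs standalone) times each algorithm as n grows:

```python
import random
import timeit

def pair_sum_1(arr, k):
    # O(n²) nested-loop version, as defined above
    pairs = set()
    for idx, n in enumerate(arr):
        for j in arr[idx + 1:]:
            if n + j == k:
                a, b = sorted([n, j])
                pairs.add((a, b))
    return pairs

def pair_sum_2(arr, k):
    # O(n) single-pass version, as defined above
    seen = set()
    output = set()
    for num in arr:
        target = k - num
        if target not in seen:
            seen.add(num)
        else:
            output.add((min(num, target), max(num, target)))
    return output

# Grow the input and watch how each algorithm's runtime scales:
# pair_sum_1 should slow down roughly 100x for each 10x increase in n,
# while pair_sum_2 should slow down roughly 10x.
for n in (100, 1000, 2000):
    arr = random.sample(range(n * 10), n)
    t1 = timeit.timeit(lambda: pair_sum_1(arr, n), number=5)
    t2 = timeit.timeit(lambda: pair_sum_2(arr, n), number=5)
    print(f"n={n:>5}: pair_sum_1={t1:.4f}s  pair_sum_2={t2:.4f}s")
```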