Sorting Algorithms

You do not need to memorize all of the following, but you should be familiar with them and spend some time writing up your own implementation of them.

Terms

Stability

Sorting algorithms are considered to be stable if two elements with equal keys remain in the same relative order as they appear in the input and the output.

Adaptive

An adaptive algorithm takes advantage of any existing values that are presorted in the input set and tend to sort faster. The amount of disorder in the input set has a negative effect on the ultimate time taken to complete the sorting of said input set.

In-Place

A sorting algorithm that occupies the same storage space as the input set. Typically, it requires one or two additional temp variables but does not require large chunks of additional memory to store subsets of the input set while sorting.

Online

An online sorting algorithm does not require all of the elements to be present at the beginning of the sorting operation. They can be provided as a stream, or one-by-one and the algorithm must keep additional items sorted as they are provided to the input set.

Insertion sort

More efficient for a relatively smaller number of elements. Similar to selection sort. If provided with a nearly sorted input, it is very efficient. Relatively speaking, each element is examined and checked to see if the element before it is greater in value that it is. If so, it is swapped and thus continues down the array until the correct position is found. It can be more memory r/w intensive than selection sort, so in cases where r/w are slow selection sort may be favored. Advantageous over selection sort and other methods as it is online

  • Worst, Avg: O(n ^ 2)
  • Best case, O(n)
  • Online, adaptive, stable, in-place

Example: 13, 24, 9, 8, 6

We typically start at the beginning of the array and work towards the end of the array, each time, we sort down (towards the 0 index) the value until it is in it’s correct place, such that it is less than the element to it’s right.

Assume a for i = 0, i < lengh; i++ semantic.

  1. i = 13, there is nothing to the left, so i++
  2. i = 24, i > i-1, so nothing to do, i++
  3. i = 9, !(i > i-1), so we swap 9 and 24.
  4. Then we check to see whether the i-1 element from 9 is less than 9
  5. It is not so we swap those two
  6. The first three elements in the array are now, 9, 13, 24, the sorted sub-set.
  7. i = 8 . . . we continue checking the ith element to i-1 until we have sorted the entire array.

Quicksort

A divide and conquer algo. Divide the array into two sub arrays via a partioning algorithm (pick the middle element/value or a random value). Then sort all elements less than the pivot to the left of the pivot, all greater than or equal to the right. Then recursively run the same function on the values to the left of the pivot. Can be problematic in that it can result in resource starvation and stack-overflow problems as the result of it’s recursive nature.

Mergesort

A divide and conquer algo. Recursively split the list until you have lists that are 1 element and merge the result.

  • Worst case: O (n log n)
  • Avg case: O (n log n)
  • Best case: O (n log n)
  • Stable (most implementations)
  • http://en.wikipedia.org/wiki/Merge_sort
  • http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Merge_sort.html
  • http://www.ee.ryerson.ca/~courses/coe428/sorting/mergesort.html

Heapsort

Is an in-place algorithm with a worst, best, and average complexity of O(n log n).

Can be though of as an improved selection sort. It is an improvement because of the use of the semi-sorted heap data structure and that there is no longer a linear approach required to compare all items to each other (not O(n^2)).

It is essentially a two part algorithm. We will assume in this example that this is a max heap and not a min heap

  1. Build the heap
  2. Sort the heap

We capitalize on the fundamental nature of the heap that the largest item is always at the root. And, when we remove the largest item and then reform the remaining heap the next largest item bubbles to the top. We can utilize an array (arr) to store our heap. We divide the array into an unsorted and sorted segments. Initially, the entire array is unsorted.

We begin by creating a pointer (sortedIdx) that will keep track of the index in the array that divides the sorted and unsorted elements. Initially, it points to the last element in the array (array.length – 1). We also create a temporary variable (temp) that we will use to store values that we are in the process of swapping in our array.

In each sort operation we

  1. Copy the value in arr[sortedIdx] to temp, which is one of the smallest elements in the heap
  2. Copy the value at arr[0] (removing the largest element in the array) to arr[sortedIdx], this begins creating the sorted segment of the array
  3. Decrement the sortedIdx value moving the pointer towards the beginning of the array to indicate that all elements after it are already sorted.
  4. Set the value of arr[0], the root of the heap, to the value of temp
  5. Sort the new value downwards until the heap ordering property is satisfied.
  6. Repeat until sortedIdx == 0

After each sort cycle the next largest element is added to the sortedIdx position in the array and we slowly grow the sorted section of the array.

Once we have removed all of the elements from the heap and added them back to the array we are left with a completely sorted array.

BubbleSort

Selection Sort

Similar to insertion sort, however, you must iterate through the entire array each time, to make sure that you have found the lowest next element, whereas in insertion sort, once you have met the criteria of a > b you can stop iterating for that element and move on to the next. An advantage is that it is simple to implement.

Worst, Avg, Best is O(n^2)

The basic algorithm:

  • Start with two sets, a sorted set on the left and an unsorted set on the right
  • Initially, the sorted set is 0 size, and the unsorted set is the rest of the array
  • For each element in the array:
  • Look for the smallest element in the unsorted set and and add it to the end of the sorted set
  • Continue until unsorted set is 0 size (reach the end of the array)
public class SelectionSort {

  public static void selectSort(int[] arr) {
    int temp = 0;
    int currentMin = Integer.MAX_VALUE;
    int currentMinPosition = 0;
  
    // Start at the beginning of the array and look for the smallest value
    // from our current position to the end.
    for (int i = 0; i < arr.length-1; i++) {
      // Reset the minimum
      currentMin = Integer.MAX_VALUE;

      // Look for the smallest value from i to the end
      for (int j = i; j < arr.length; j++) {
        if (arr[j] < currentMin) {
          currentMin         = arr[j];
          currentMinPosition = j;
        }
      }
    
      // Once we have found the current minimum position in the unsorted
      // segment of the array, swap it with the current ith position
      temp   = arr[i];
      arr[i] = arr[currentMinPosition];
      arr[currentMinPosition] = temp;
    }
  }

  public static void printArr(int[] arr) {
    for (int i : arr) {
      System.out.printf("%d ", i);
    }
    System.out.println();
  }
  
  public static void main(String[] args) {
    int[] arr1 = new int[]{6, 6, 2, 9, 7, 15, 1, 31};
    int[][] testData = new int[1][];
    testData[0] = arr1;

    for (int[] t : testData) {
      selectSort(t);
      printArr(t);
    }
  }
}

Topological Sorting

Nodes in a acyclic graph are placed in an order consistent with directed the edges of the graph. Such that for every directed edge a -> b, a appears before b in the result. The graph must be a DAG to be sorted in this fashion.

Nodes are visited (added to the result) only after all of their connected nodes, those upstream from them in the DAG, have been visited. Thus, a node visited later, must come later when topologically sorted.

The algorithm:

  1. Start with a node that does not have any incoming edges, or an indegree of zero.
  2. Add it to the result and remove it from the graph
  3. Then search for another node with a zero indegree
  4. Add it to the result and remove it from the graph
  5. Continue looping until the graph has zero elements remaining OR we cannot find a node with an indegree of zero

There are one of two possible outcomes:

  1. If the graph has zero elements we completed our ordering and it was indeed a DAG
  2. If we cannot find an additional node with an indegree of zero we have a a cyclic graph and cannot generate a topological sorting of it.

The implementation of the algorithm does not require destroying the original graph. We can either utilize a hashmap to keep track of the indegee value of each node, or a node structure that includes that property along with it.

Radix Sort

Counting Sort