Search Algorithms

Glossary

  • Exponential
  • Polynomial
  • Quadratic

An excellent video explaining Dynamic Programming and how to think through search algorithms.

TODO: Add the following:

  • Linear Search
  • Binary Search
  • Jump Search
  • Interpolation Search
  • Exponential Search
  • Sublist Search (Search a linked list in another list)
  • Fibonacci Search

https://en.wikipedia.org/wiki/Search_algorithm

Trees and Tree Traversal

There are many different tree implementations. The following will use binary search for purposes of illustrating the various methods of traversing trees.

A binary search tree is a data structure that abides by the following rules:

  1. The left sub-tree of each node contains ONLY nodes with keys of a lesser value than the node itself
  2. The right sub-tree of each node contains ONLY nodes with keys of a greater value than the node itself
  3. Each node has at most two children or edges
  4. The left and right sub-trees of each node must also follow rule 1, 2, and 3
  5. There are no duplicate nodes

Search is O(log n) on average, but O(n) in the worst case scenario of an unbalanced tree when it then behaves similarly to a linked list.

Searching requires visiting each node a single time and can be done in different directions with different algorithms. The order determines when the value is extracted from each node. The search uses corecursion which starts at a base case and then builds a data set. Whereas recursion breaks down a data set to its base.

Two different ways to traverse the tree are:

  • Depth First Search: DFS
  • Breadth First Search: BFS

DFS searches typically utilize an additional stack to store data during the tree traversal. BFS uses a queue. The way that I remember it is that a BFS starts with the letter B and so does the word barbeque. The one with the “barbeque” uses a queue. The other (DFS) uses a stack. The stack can be the call stack when using a recursive algorithm. In some cases you MUST use recursion especially when your algorithm requires the addition or aggregation of data returned from each level in the tree.

The order in which we traverse is Preorder, Inorder, and Postorder.

The key to the order is the order in which the root node, in relation to it’s children, left and right, are visited. NOT the root of the whole tree, but each individual root branch for each pair of potential children.

  • Preorder: The root node is visited first before any of its children; (root, left, right)
    • Typically used to copy trees
  • Inorder: The root node is visited after its left tree is traversed; (left, root, right)
    • Usually used in a binary search
    • If it is a balanced tree you will get the median value of the set in the middle of the traversal
    • Will return a sorted set of values
  • Postorder: The root node is visited last; (left, right, root)
    • Typically used when deleting trees

Given the following tree,

      5
     / \
    3   6
   / \   \
  2   4   7

and assuming that we will add our results to an array as we go.

Each node has the following structure (in Java).

public class Node {
    Node left;
    Node right;
    int value;
    boolean visited;
}

Preorder Traversal

The following is a non-recursive algorithm for traversing the tree.

  1. Start at the root, 5
  2. Mark it as visited, add it to the stack and the result
    1. Result: 5
    2. Stack: 5
  3. Look at the child nodes of 5 that have not been visited and select the least: 3
  4. Mark it as visited, add it to the stack and the result
    1. Result: 5, 3
    2. Stack: 5, 3
  5. Look at the child nodes of 3, that have not been visited and select the least: 2
  6. Mark it as visited, add it to the stack and the result
    1. Result: 5, 3, 2
    2. Stack: 5, 3, 2
  7. Look at the child nodes of 2 that have not been visited and select the least: There are none so:
  8. Pop 2 off the stack
    1. Stack: 5, 3
  9. 2 has already been visited and does not have any children
  10. Look at the next item on the stack: 3
  11. Look at the child nodes to 3 that have not been visited and select the least: 4
  12. Mark it as visited, add it to the stack and the result.
    1. Result: 5, 3, 2, 4
    2. Stack: 5, 3, 4
  13. Look at the child nodes of 4 that have not been visited and select the least: There are none so:
  14. Pop 4 off the stack
    1. Stack: 5, 3
  15. Look at the next item on the stack: 3
  16. Look at the child nodes to 3 that have not been visited and select the least: There are none so:
  17. Pop 3 off the stack
    1. Stack: 5
  18. Look at the next item on the stack: 5
  19. Look at the child nodes to 5 that have not been visited and select the least: 6
  20. Mark it as visited, add it to the stack and the result.
    1. Result: 5, 3, 2, 4, 6
    2. Stack: 5, 6
  21. Look at the child nodes of 6 that have not been visited and select the least: 7
  22. Mark it as visited, add it to the stack and the result.
    1. Result: 5, 3, 2, 4, 6, 7
    2. Stack: 5, 6, 7
  23. Look at the child nodes of 7 that have not been visited and select the least: There are none so:
  24. Pop 7 off the stack
    1. Stack: 5, 6
  25. Look at the next item on the stack: 6
  26. Look at the child nodes of 6 that have not been visited and select the least: There are none so:
  27. Pop 6 off the stack
    1. Stack: 5
  28. Look at the next item on the stack: 5
  29. Look at the child nodes of 5 that have not been visited and select the least: There are none so:
  30. Pop 5 off the stack
    1. Stack: <empty>
  31. Traversal is complete

Preorder result: 5, 3, 2, 4, 6, 7

Inorder Traversal

  1. Start with the root node 5 and add it to the stack
    1. Result:
    2. Stack: 5
  2. Look for a left node of 5 check that it exists and that it is not visited.
    1. Find 3, add it to the stack, and continue the loop
    2. Result:
    3. Stack: 5, 3
  3. Look at the next item on the stack: 3
  4. Look for a left node of 3, check that it exists and that it is not visited.
    1. Find 2, add it to the stack, and continue the loop
      1. Result:
      2. Stack: 5, 3, 2
  5. Look at the next item on the stack: 2
  6. Look for a left node of 2, check that it exists and that it is not visited.
    1. There will not be a left node
  7. Since we are at the bottom of the left side of this part of the tree, mark 2 as visited and add it to the result and pop it off the stack.
    1. Result: 2
    2. Stack: 5, 3
  8. Look for a right node of 2, check that it exists and that it is not visited.
    1. There will not be a right node
  9. Look at the next item on the stack: 3
  10. Look for a left node of 3, check that it exists and that it is not visited.
    1. There will be a left node, but it will already be visited
  11. Since we have already visited the left tree of 3 mark it as visited add it to the result and pop it off the stack
    1. Result: 2, 3
    2. Stack 5
  12. Look for a right node of 3, check that it exists and that it is not visited.
    1. Find 4, add it to the stack and continue the loop
      1. Result: 2, 3
      2. Stack: 5, 4
  13. Look at the next item on the stack: 4
  14. Look for a left node of 4, check that it exists and that it is not visited.
    1. There will not be a left node
  15. Since we are at the bottom of the left side of this part of the tree, mark 4 as visited and add it to the result and pop it off the stack.
    1. Result: 2, 3, 4
    2. Stack: 5
  16. Look at the next item on the stack: 5
  17. Continue by looking at it’s left children. There will be none that are visited. So it is added to the result, popped off the stack and the process continues with the right sub-tree.

Result: 2, 3, 4, 5, 6, 7

Postorder Traversal

In postorder traversal the algorithm requires that we visit the left and right sub trees first, then the root node.

Result: 2, 4, 3, 7, 6, 5

There is only one order, or kind, of breadth first search which is a level traversal.

BFS typically utilizes a queue during traversal of the tree. Given the following tree:

        A
      / | \
     B  C  D
   / |      \
  E  F        G
  1. Start with the root, A and add it to the queue
  2. Look at the next item in the queue, A, and mark it as visited and add it to the result
    1. Result: A
    2. Queue: A
  3. Iterate over each of it’s child nodes that have not been visited from least value to greatest value. For each
    1. Mark as visited, add to the result, and add to the queue
    2. Result: A, B, C, D
    3. Queue: D, C, B, A
  4. When there are no longer any children for A remove it from the queu
    1. Result: A, B, C, D
    2. Queue: D, C, B
  5. Return to step 2, this time, the next item in the queue will be B. Continue following the steps until the queue no longer has any elements in it.

Result: A, B, C, D, E, F, G

Graphs

In general, a graph consists of nodes which contain data that are connected by edges which can contain a weight or a value.

Shortest Path Searches

Typically, a breadth-first search is the algorithm used to find the shortest path. Of course, you could get lucky with a depth-first search, but you could also go a very long way down the wrong path before bottoming out. Thinking about a more real-world use-case such as the shortest path on a map between two cities you certainly would use a DFS because otherwise, when searching from New York to San Francisco you could end up at the tip of South America before figuring out that you have bottomed out.

Dijkstra’s Shortest Path

TODO: Clean up this explanation

DFS (stacks)
Radix sort
Dijkstra’s Algorithm

public class Node {
private boolean complete; // Have we finished computing the shortest path?
private Node via; // The node from which is the current shortest path
private int distance; // The shortest distance from we we can reach this node
}

Set done;
Set fringe;

Heart of the algo is the current distance from the source to each of the destination nodes

. Start by keeping a set of each of the vertexes:
. Mark the current distance to infinity (INT_MAX) or whatever largest value as possible
. Then
. From the starting point (current node):
. Examine each connected vertex and mark the current lowest value of the distance from the current starting point to each vertex
. Store, both the lowest value and from which vertex we got that lowest value:

. Keep track of the set of vertexes that are on the fringe, a priority heap would be wonderful data structure in which to store this.
. Then remove the next closests vertex and make that our current ‘starting’ point
. When you do this, you have found the shortest path to this vertex and you can mark it as complete

. Then continue with the same algorithm, looking at each connected vertex and calculating the lowest value to get there, and from which vertex

. Once we have computed the shortest distance to all of the nodes (including the destination), we create a stack and then just follow the nodes backwards, visiting the via Node for each and push it’s id onto the stack. In the end we have a path to the destination that is the shortest.

Spanning Trees

A subgraph of a undirected graph that is comprised of all of the vertices with the least number of edges. It MUST

  • Include all of the vertexes of the graph or it is not a spanning tree
  • Not include any cycles or loops
  • Not be disconnected
  • Contain n-1 edges where n is the number of vertices