Data Structures

Algorithms and data structures tend to go together.

Links

The basic ontology of data structures with just a very few examples of each type. A more complete list can be found here.

Linear Data Structures

Arrays

Array: One of the most basic of data structures that exploits the addressing logic of linear, one dimensional, memory systems. A structure whereby the elements stored it in are adjacent to each other in memory and allow for O(1) random access.

Circular buffer

Matrix

Lists

Linked lists: Each node in the list linked to the next in the list. O(1) insertion, but traversal is O(n). Singly linked lists only links in one direction from head to tail. Doubly linked has larger resource overhead for storage of forward and backward pointers but makes removal and insertion O(1). Benefits over array are better insertion and removal but slower random access. Also no need to reallocate the entire linked list when new items are added because each data does not need to be adjacent to the rest in memory.

Skip lists

Trees

  • M-way Trees: M, being the degree of the tree. A binary search tree has a degree of 2. M-way trees can various degrees or number subtrees
    • Binary trees: elements have at most 2 children
      • Binary search tree
      • Red-black tree
    • B-trees: elements can have more than 2 children
      • B-tree
      • B+ tree: Similar to B-tree but all leaf nodes are linked together in a doubly-linked list.
  • Heaps
    • Heap
    • Binary heap
    • Weak heap
  • Trees
    • Tree
    • Radix tree
    • Suffix tree
  • Multi-way trees
    • Ternary tree
    • K-ary tree
  • Space-partitioning trees
    • Segment tree
    • Range tree
  • Hash Based
    • Hash table: Keys associated with values. The keys are hashed to provide for efficient lookup of the value to which the key is associated. Typically implemented with an array and a hash function (along with a collision algorithm to sort out instances where duplicate keys map to the same array position).
    • Bloom filter
  • Graphs: Collections of vertexes and edges.
    • Graph
    • Adjacency matrix
    • DAG (Directed Acyclic Graph)
    • Directed ternary: each child has at most three children
  • Abstract data types:
    • Set: A (typically) unordered set of elements that only retains one of each value.
    • Stack: Collection of items where only the top most (most recently added item) is available. Typically last in, first out, depending on the insertion algorithm.
    • Queue: Collection of items where only the oldest item is available, new items added to the back, oldest from the front (FIFO or LILO).
    • String: In many high-level programming languages a String is a data structure that abstracts the creation, access, and mutation of an array of characters.

Common Data Structures

B-Tree

Usually a shallow but very wide structure. Similar to binary search tree except that a node can have more than two children. Better suited for r/w large blocks of data, think a filesystem or database. The leaves of the tree are where the data is stored. Intermediary nodes only store keys that are used as separators to partition the data. In a 2-3 b-tree, the parent node stores two values which separate the three child leaves. Each leaf stores multiple values which speeds up access because of the locality of the data on disk or in RAM.

          |   key=10   |    key=20    |
         /             |               \
        /              |                \
val=1, val=2     val=12, val=14     val=22, val=30

B-trees are of some “order N” meaning that nodes contain upto N children.

  • Searching begins at the root node and follows the pointers from node to node until the key is found, or the search fails by ending at a leaf node without the ability to traverse any further.
  • Root node initially begins with 1 key. It is a special exception and the only node allowed to have less than n keys in an order n B-tree.
  • To insert the key, we first search for it. If we find it, our work is done. If we don’t find it, we end up at the place where the key should be stored.
  • If keys are full, then the node is split and the middle value is promoted up the tree. If the promotiion goes all the way to the root and causes a split, the middle value is promoted up to a new root node.
  • To delete we must move the value to a leaf node first

Trie

Keys are typically Strings. All descendants of a node have a common prefix. Think an autocomplete for a form field.
Root is associated with an empty string. Values tend to be associated with leaves in the tree

Heaps

A tree based structure that satisfies the heap property. If B is a child of parent A, then A’s key must be >= B’s key. The element of the greatest value is always the root node. This is for a max-heap. The converse is a min-heap. This is the primary underlying data structure for a priority queue.

In a binary heap

  • There are at most two children per parent node.
  • There is at most 1 node with just 1 child
  • That child is the LEFT child of its parent and
  • It is the right-most leaf at depth d.

Arrays can be used as the underlying data structure with a fairly simple algorithm

Given the following min, binary heap:

          2
        /   \
       /     \
      5       6
     / \     /
   10  11   9

We can store each of the nodes in the following array

Array Index0123456
Valueempty25610119

The root value is always inserted in a[1] such that the algorithm to find the children and parents works properly.

For any child in index k the following is true:

  • Left child = a[2*k]
  • Right child = a[2*k+1]
  • Parent = a[k/2]

For 5 in a[2], it’s left child = a[2*k] or 4, right child = a[2*2+1], or 5.

For 11 in a[5] it’s parent = a[5/2] or 2, which is 5

To insert values into the heap, always insert at the end of the array. Then check to see if the order property of the heap is still maintained. Check to see if the inserted item’s parent is of lesser value, if not swap the two values and continue checking up to the root of the tree.

After removing the root (in this case the min value), replace the value in the root with the value at the end of the array, removing that leaf from the tree. Then work the value down the tree by checking for the smaller of the two child values and swap the position with that value. This will ensure that the SMALLEST value becomes the root. Continue until neither of the children of the value being sifted down is smaller than it.

The reverse is true for max heaps.

  • O (log n) for inserts and removals
  • O (n) to build initially

XOR Linked List

An XOR Linked List is a more memory efficient doubly-linked list which uses an XOR operation on data to derive the pointer location of the next and prev nodes.

  • http://www.geeksforgeeks.org/xor-linked-list-a-memory-efficient-doubly-linked-list-set-1/
  • https://en.wikipedia.org/wiki/XOR_linked_list
  • https://www.techiedelight.com/xor-linked-list-overview-implementation-c-cpp/

Skip List

Involves a list built in layers allowing you to “skip” to the node that is closer to the element for which we are searching. The lowest layer is the complete linked list. The top most is the layer in which each of the elements is a “key” and where each subsequent key is greater than or less than the previous. In this way, we can iterate through the sparsest set of links until we find the element with the starting value that we are looking for and then search downward in the layered lists for the specific element. Removes the performance O(n) bottleneck of having to iterate over the entire list. Insertion and deletion operations remain relatively efficient when compared to the standard Linked List sibling.

Average search is O(log n), worst is O(n) which is similar to linked list, but this outcome is extremely unlikely.

https://en.wikipedia.org/wiki/Skip_list