Algorithms

At the highest level, algorithms are either deterministic or probabilistic.

An example of a deterministic algorithm is comparing files.

  1. Check the size.
  2. If that is the same, then check them each byte-by-byte until you find a different byte or reach the end of each file.
  3. If nothing is different they are the same.

Examples of probabilistic algorithms are bloom filters or a hashing algorithms.

BigO Notation Overview

In general, algorithms usually fall into the following performance classes: constant-time, logarithmic, linear, polynomial, exponential, and factorial.

  • Constant-time O(1): Random access in an array, insert in a linked list, lookup at a memory location. Simple comparisons are constant O(1).
  • Logarithmic O(log n): binary tree search. The log is typically 2. A binary search of sorted numerical data or a balanced tree, O(log n). The amount of time to execute the algorithm is the log of the input size.
  • Linear O(n): Finding an item in a linked list, searching an unordered list, basic operations on n members O(n) where you have to operate on every element in an input set.
  • Polynomial O(n^k): Where k is a constant and n changes. This is a feasible computation. Convolutions, Discrete Fourier transform, many types of sorting algorithms are O(n^2).
  • Exponential O(k^n): Where k is a constant and n increases, like binary numbers 2^2, 2^3, 2^4, 2^5, etc.. This quickly becomes unsolvable.
  • Factoral O(n!): Traveling salesman, dictionary attacking, password cracking. For example, you have 5 items and want to compare all of the permutations: !5 = 5 * 4 * 3 * 2 * 1 = 120. Brute force permutations are factorial, O(n!).
  • Binomial Coefficient O(“n choose k”): These sorts of time complexities arise out of problems where you are asked to choose the distinct combinations of items (k) from n number of input elements. It is also known as the combinatorial number. This is one of those algorithms that you just have to know. At least for me, it was not one that was obvious when reasoning it out on a whiteboard.

Links and Further Reading

Online Resources for Learning and Practicing Algorithms

Concepts Related to Algorithms and Searches

Tail Recursion

Tail recursion is coding the call to an recursive function so that the return value of the recursive call is immediately returned to the calling code. This enables for optimization during the compilation of the code that can avoid pushing an extra function call on the stack and can help to eliminate stackoverflow problems.

The following is tail recursive as the final call in the function is to return the value of the recursive call:

int f(int x, int y) {
  if (y == 0) {
    return x;
  }

  return f(x*y, y-1);
}

The following is NOT tail recursive as there is an operation performed by the result of the recursive call before returning a value to the calling frame.

int g(int x) {
  if (x == 1) {
    return 1;
  }

  int y = g(x-1);

  return x*y;
}

Tractability

So-called easy, or tractable, problems can be solved by computer algorithms that run in polynomial time; i.e., for a problem of size n, the time or number of steps needed to find the solution is a polynomial function of n. Algorithms for solving hard, or intractable, problems typically require time complexities that are exponential.

TODO: Clean up these tips on how to go about working through a coding problem with an interviewer (might move this to another page too)

Questions to ask (re-word the following)

Consider asking the following questions.

How big is the size of the input?
How big is the range of values?
What kind of values are there? Are there negative numbers? Floating points? Will there be empty inputs?
Are there duplicates within the input?
What are some extreme cases of the input?
How is the input stored? If you are given a dictionary of words, is it a list of strings or a trie?

It is also common that the interviewer asks you extension questions, such as how you would handle the problem if the whole input is too large to fit into memory, or if the input arrives as a stream. This is a common follow-up question at Google, where they care a lot about scale.

The answer is usually a divide-and-conquer approach — perform distributed processing of the data and only read certain chunks of the input from disk into memory, write the output back to disk and combine them later.

Steps:

  1. Understand the question. When you are given the question repeat it back to the interviewer and then ask clarifying questions. Some are specific to the problem that you are asked, others are common:
    1. How large is the input data set?
    2. What is the range of the input values?
    3. What are the types of input values? Are they all the same data types? If they are numbers, can they be negative, integral types, floating point numbers, empty or null values?
    4. Can you expect duplicates in the data sets?
    5. Can you assume valid input data?
    6. What are some edge cases that you should expect?
    7. How is the input data stored, and/or what data structures will be used to convey the input data?
  2. Discuss your high-level approach. I think out loud and draw on the whiteboard at the same time.
    1. Talk about the basic data structures that you will use, and draw them out on your whiteboard.
    2. You can start with a brute-force approach and then refine it as you work through it. If you draw it out, as described below it will be easy to see duplicate sub-problems and enable you to memoize as you go.
    3. Discuss the time and space complexities of the approach
    4. Discuss how the approach may work and what might be the downsides of your first, possible brute-force approach
  3. Design it Completely. Design your algorithm completely on the whiteboard. Write out the data structures you will use, any helper functions and write out the code on the whiteboard before writing a single line of code in a text editor. With some practice, you will find that it is many times faster to write the code on the whiteboard, not necessarily in completely compileable code in the language of your choice but in a short-hand that you can understand and explain, and refactor it than it is to start writing code and then have to refactor it as you walk though it.
    1. Draw out a sample input set.
    2. Write your entry-point function along with it’s arguments.
    3. Walk through, on the whiteboard the processing of a given data set, talking through and drawing the steps for each line of code.
    4. Do this until you are certain that you have designed the entire algorithm and written most of the code/pseudo code and that all you have to do next is write it out in an editor, compile it and run it.
    5. Think about what you will do if you have to stream the input and it is too large to fit in ram, this will almost always be the case.

http://www.programcreek.com/2012/11/top-10-algorithms-for-coding-interview/