Tuesday, February 28, 2012

The Diameter of a Binary Tree

Problem: Given a binary tree, there is a path connecting any two nodes u,v in the tree. Among all the possible paths, the longest path is considered as the diameter of the binary tree. Find out this path.

Solution: We can use recursion to solve this problem. We need to be careful than this longest path can just reside in one of the sub-tree of the root. The code for this algorithm is as follow:

int height(TreeNode * root)
{
   if(!root) return 0;
   
   return 1+max(height(root->left), height(root->right));


int Diameter(TreeNode *root)
{
   if(!root) return 0;

   int h_left = height(root->left);
   int h_right = height(root->right);
   
   return max(1+h_left+h_right, max(Diameter(root->left), Diameter(root->right)));

}

Memorization might be used to avoid repeated calculation.

Monday, February 27, 2012

An O(N^2) Algorithm for Optimal BST

Problem: Given a set of ordered keys and their corresponding probabilities to be visited, construct an optimal BST tree with minimum search cost.

Solution: This is the classical optimal BST problem. The well-known solution has an O(N^3) complexity. However, Knuth found a better way that can achieve O(N^2). Basically, when we try to find the root of an optimal BST, previously we need to iterate from i to j. But there is an observation: root[i, j-1] <= root[i,j] <= root[i+1, j]. Given this inequality, we can reduce the range which we look for the root. This new range (root[ij-1]], root[i+1, j]) is smaller than (i, j). The detail solution and proof can be found here.

Sunday, February 26, 2012

Construct All Possible Binary Trees with N Nodes

Problem: Enumerate all possible binary trees with n nodes. For example, if n=1, there just one tree; if n=2, there are two trees:

         *             or            *
           \                          /
            *                      *

Solution: The key is to understand what can uniquely denote a tree (or serialize a tree). A good way to serialize a tree is to record the pre-order traversal of a binary tree plus the nil virtual leaf information. If we denote an actual node as "1", the nil virtual leaf as "0". The sequence "10100" will stand for the following tree:


                  *          
               /      \                    
             nil       *
                      /    \
                   nil    nil

The way to decide the nil leaf is to check the children of a node. If a node is a leaf in the tree, it will have two nil virtual leaves, since a leaf won't have any children.  Similarly, we can know the sequence "11000" will stand for the following tree:


                  *          
                 /  \                
               *    nil
              /  \
           nil  nil

Therefore,  our target is to enumerate all possible sequences. There are several constraints on the sequences:

  • For a n node tree, the length of the sequence will be 2n+1 with n "1" and n+1 "0".
  • For any position i in the sequence s and != 2n,  the number of "1" should always be no smaller than the number of "0" in the sub-sequence s[0, i].
The related code is as follow:
// rebuild the tree from a sequence such as "11000"
TreeNode * rebuild_tree(int s[], int n)
{

    TreeNode *root = create_node();
    stack<TreeNode *> stk;
    stk.push(root);
    
    for(int i=1; i<s.size(); i++)
    {
       if(s[i])
       {
         
         TreeNode *node = create_node();
         if(s[i-1]) 
         {
           stk.top()->left = node;           
         }
         else
         {
           stk.top()->right = node;
           stk.pop();          
         }
         stk.push(node);
       }
       else 
       {
         if(!s[i-1]) stk.pop();
       }
    
    }
    
    return root;

}

//print all possible trees
void output_all_possible_trees(int *seq, int n, int num1, int num0)
{
     
     if((num1 + num0) == 2*n)
     {
        seq[2*n] = 0;
        TreeNode *root = rebuild_tree(seq, 2*n+1);
        print_tree(root);
        return;
     }
        
    if(num1 >= num0 && num1 < n)
    {
        seq[num1+num0] = 1;
        output_all_possible_trees(seq, n, num1+1, num0); 
    }       
    
    if(num0 < num1 && num1 <=n)
    {
        seq[num1+num0] = 0;
        output_all_possible_trees(seq, n, num1, num0+1);  
    
    }
   
}

Iterative Binary Tree Traversal with Constant Space Overhead

Problem: Given a binary tree, try to traverse the tree in an iterative way. The space overhead is expected to be constant.

Solution: With stack, we can iteratively traverse the tree. Here explain a way to traverse the tree without stack. The main idea is to use threaded tree. The key is to leverage the right child pointer of a node that is null. Then set this null-valued right child pointer to the in-order successor of the node. The detail of the algorithm is as follow:

  • Inspect a node n, if the node doesn't have a left child, visit this node, then proceed to its right child.
  • If the node has left child, there are two situations. The first is that the predecessor of n has not been processed. Then we need to find a right_most node that will be the predecessor of set right_most's right child to be n. If n's left child has a right child, right_most is left child's right-most child; if n's left child doesn't have a right child, right_most is left child itself. Then we proceed to  n's left child.
  • The second situation is that the predecessor of n has been processed. We can tell this by keep following n's left child's right child pointer until a right child pointer points to n or the pointer is null. If the pointer points to n, it means the predecessor of has been processed, then we visit n, proceed to its right child. We can do one extra thing here, before moving forward, we can set the right child of n's predecessor to be null, since previously we modified its value to thread the tree. By doing this, we can keep the tree unmodified after traversal.
  • The algorithm stops when we find the current node is null
The code is below:
void Non_recursive_tree_traversal(TreeNode *root)
{
   if(!root) return;
   
   TreeNode *cur = root;
     
   while(cur)
   {
      // if no left child, visit itself      
      if(!cur->left)
      {
        visit(cur);
        cur = cur->right;        
      }
      // left child is there
      else 
      {
        TreeNode *pp = cur->left;
        
        while(pp->right)
        {
           if(pp->right == cur) break;      
           pp = pp->right;     
        }
        
        // if cur's predecessor has been visited 
        if(pp->right == cur)
        {
           visit(cur);
           // revert the right pointer to null
           pp->right = NULL;
           cur = cur->right;
        }        
        else if(!pp->right)
        {
           // if the right child pointer is null
           // set its value to thread the tree
           pp->right = cur;
           cur = cur->left; 
        }
 
      }     
   }
}

Saturday, February 25, 2012

Get the Max and Min Simultaneously

Problem: Given an array of integers, try to get the max and min in this array with the least number of comparisons.

Solutions:  if we have a current_max and current_min, for each element except the first one, we need to compare it against current_max and current_min. Thus in total, we need 2n-2 comparisons (or 2n-3 if you do it slightly different). However, a better approach exists by inspecting the integers in pair. We first compare the two adjacent integers, and use the bigger one from the two to compare against current_max and use the smaller one from the two to compare against current_min. For example, if A[i] = 4, A[i+1] = 10 and current_max = 11 and current_min = 2,  we just need to compare A[i] against current_min and A[i] against current_max. Therefore, for every two integers, we need 3 comparisons, which leads to a total comparisons around 3n/2.

Something about Heapsort

Heapsort has two pleasant properties

  1. an average and worst-case O(N*LogN) complexity
  2. in place algorithm, no extra space is needed.

Heapsort has two steps:

  1. first make a max heap (or min heap), this operation takes O(N)
  2. pop the root of the heap, swap it with the last element in the heap (the last leaf), do heapify on the elements rangeing from 1 to N-1. Basically, we build a new max (min) heap on the elements rangeing from 1 to N-1
  3. repeat the step 2

Strassen’s Method

Strassen’s method is a good  algorithm used in matrix multiplication. Naive approach usually takes O(N^3) to do matrix multiplication, while Strassen’s method costs O(N^2.8). Basically its recursion formula is like T(N) = 7*T(N/2) + N^2.

Friday, February 24, 2012

Get the Maximum Profit from Stock -- Maximum Subarray Problem

Problem: Given an array price[], price[i] denotes the stock price at day i. Assume the array has a length of N (from day 1 to day N), try to get the maximum profit by selecting optimal timing to bug and sell stocks. For example, if price[] = {5, 2, 4, 3, 1},  the maximum profit per share we can get is $2. What we need to do is buy at day 2 and sell at day 3.

Solution: Naive approach takes O(N^2) by checking any two days. An improved approach costs O(NlogN) by first finding a descending sequence from the beginning. However, optimal approach should only cost O(N). The key is to do some transformation on the original problem.  Let us still reuse the previous example, given price[] = {5, 2, 4, 3, 1}, we can get another array diff[] = {-3, 2, -1, -2} such that diff[i] = price[i] - price[i-1] .  Then other goal is just to find the maximum subarray in diff[].  As known, there is a O(N) solution for the  maximum subarray problem.

More: An alternative that may be conceptually even simpler exist. Just do a linear scan, we keep a current_min, then for every day, we calculate the gain as the difference between the price at that day and current_min,if the gain is more than the maximum gain that we are tracking, update the maximum.
This algorithm also takes O(N).

Friday, February 10, 2012

N Disks and K Pegs -- General Problem of Hanoi Tower.

Problem:  Most people must be aware of the Tower of Hanoi problem that has N disks and 3 pegs. Basic Hanoi problem can be solved either by iterative or recursive approach. However, here we discuss a more general problem. Given N disks and K pegs, also given a start state (e.g., the peg that a particular disc is on) and an end state, try to find the optimal solution (i.e., minimum moves) to move the disks such that we can go from the start state to the end state. Here we still need to follow the similar rule in Hanoi: larger disk should be below smaller disk.

For example, assume we have 4 pegs, and the start state is: pegs[1] = [3,2] (which means disk 3 and disk 2 are on peg 1), pegs[2] = 4, pegs[3] = 1, pegs[4] = []; the end state is pegs[1] = [1], pegs[2] = [3], pegs[3] = [],  pegs[4] = [4,2]. The optimal solution is 2->4, 1-> 4, 1->2, 3->1 ( here "2->4" means move the top of peg 2 to peg 4), totally 4 moves.

Solution: if the N and K are not too large, we can use DFS to solve this problem. Basically, the total space will be K^N, but we are able to prune out some unfeasible paths.

  • Basically, we try all the possible moves in a DFS way. If we reach the end state, we stop and return. Meanwhile, we need to remember how many moves it takes to the end and store it in current_min.
  • If from the start state, we have already made current_min moves, we don't need to go further, since we can't get a result better than  current_min.
  • We need to have a hash table to remember the state we have seen before. If we found we revisited a state, try to compare the moves that we have made now and the previous number of moves to reach that state. If now we take less move to revisit the state, we go forward. Otherwise, we can just return.
The pseudo code is as follow:

// the counter for the number of moves from the 
// start state 
int cnt = 0
// the current minimum moves to the end state
int current_min = INT_MAX
// a hash table to remember the visited states
hash_table ht;

void dfs()
{
  if(cnt >= current_min) return;
  // get the current state of disks   
  string stat = get_state()
 
  // if stat is not in hashtable
  if(!ht.has_key(stat))
  {
    ht[stat] = cnt;  
  }
  else
  {
     // if we take more moves to 
     // visit this state, we just
     // return
     if(ht[stat] <= cnt) return;
     ht[stat] = cnt;
  }   

  // if we reach the end state
  if(stat == end_state)
  {
     if(current_min > cnt)
     current_min = cnt;
     return;
  }
  
  for(int i=0; i<K; i++)
  {
     current_disk = peg[i].top();
     // iterate all the possible moves
     // current_disk can make
     for move in possible_move_set
     {
        make the move;
        cnt++;
        dfs();
        rewind the move;
        cnt--;
     }
  }

}

Wednesday, February 8, 2012

String Permutation without Duplication

Problem: This is a classic problem. Given a string "abc",  you need to output all the permutations. They are "abc", "acb", "bac", "bca", "cab" and "cba".  While for input that contains duplicate characters, you need to output all unique permutations. For example, input is "abb", then output should be "abb", "bab" and "bba".

Solution: For strings without duplicate characters, the solution is not difficult. To avoid duplication, definitely you can use a hashtable to memorize the permutations that you had outputted.  Considering that hashtable may cost a lot of space, some other solutions are desired. However, many solutions on the web actually are incorrect. Here gives a correct solution by using a bit vector (or counting array).

  • The key thing is that if now I am gonna swap a character c to position i, I must guarantee previously there is no c that has been swapped to  position i. This is the important constraint to avoid duplicate.
  • Then we can use an array to track this. If the string only contains characters that range from 'a' to 'z', then we only need an array of the length 26. The code is as follow:

void permutation(string &s, int idx)
{

     if(idx == s.size())
    {
        cout << s << endl;
        return;
    }

    char a[26] = {0};

    for(int i = idx; i < s.size(); i++)
    {
        
        if(a[s[i] - 'a'] == 1) continue;

        a[s[i] - 'a'] = 1;

        swap(s[idx], s[i]);
        permutation(s, idx+1);
        swap(s[idx], s[i]);
   }
}

Monday, February 6, 2012

Efficient Way to Calculate Fibonacci Number

Problem: To give a O(logN) solution to Fibonacci function.

Solution: Solving Fibonacci with recursive function is easy. It is also not difficult to have an iterative solution with O(N) time complexity. Then can we do it even better? A solution based on matrix multiplication can help us achieve that.

  • Basically we have a 2*2 matrix A = {[1,1], [1,0]} ({[first row], [second row]}). we can see that a[0][0] =1 = fibo(2). Then A*A = {[2,1],[1,1]} and a[0][0] = fibo(3) = 2. Basically we will have A^n = {[fibo(n+1), fibo(n)], [fibo(n), fibo(n-1)]}. 
  • Then calculating Fibonacci number basically boils down to calculate the power of A. You may think this still takes O(N). However, we can calculate power efficiently (see this link) such that only O(logN) is needed.
More to mention: Actually there is a formula that generates Fibonacci numbers based on the golden ratio. However it evolves floating operations which may not as efficient as the matrix solution.

Find the Maximums in a Sliding Window

Problem: Given an array of integers and a window that has a size w, slide the window from the left side of the array to the right side, report the maximums in the sliding window during the sliding. For example, a[] = {3,1,5,2,1,4,-3, -2 , 6} and the w = 3. Then at the very beginning, we have [3,1,5] in the sliding window and then the maximum in the window is 5. If we slide the window one step towards the right, we have [1,5,2] in the sliding window and the maximum in the window is still 5. You need to get all these maximums. For this example, the maximum array is {5,5,5,4,4,4,6}.

Solution:  It is easy to know naive approaches will take O(wN), since after each sliding operation, we need O(w) time to find the maximum. To improve, you can use skip list. Basically, you have a sorted linked list to represent the integers in the window. Then you build skip list on top of it. For every slide operation, we need to do an insertion and a deletion. Therefore, the complexity will be O(Nlogw). However, a smarter solution from www.leetcode.com can achieve O(N). The details are as follow:

  • Have a double-ended queue (which you can both manipulate the head and the tail. For initialization, add the indices of the first integers to the queue. While adding indices, we must guarantee the integers to which those indices in the queue point keep a descending order. Therefore, if the index to be added points to an integer that is bigger than some integers to which some indices already in the queue point, we need to remove all "those" indices first.   For the example we showed previously, a[] = {3,1,5,2,1,4,-3, -2 , 6},  after adding the first integer, the state of the queue will be [0], which means a[0] is in the queue; After adding the second integer,   the state of the queue will be [0,1], which means a[0], a[1] are in the queue. Since a[0] > a[1], this guarantee a valid descending order. However, when adding the third integer, we need to pop put both "0" and "1", since a[2] is bigger than either a[0] or a[1]. Then after adding the third integer,   the state of the queue will be [2], which means only a[2] is in the queue.
  • Since it is a sliding window, we also need to retire indices that point to the integers which are out of the window. We can do this by checking the indices in the queue and the current location of the window. If some index is out of bound, we need to remove them.
  • The size of the queue will always not be bigger than w. For every integer in a[], it will be inserted in the queue and deleted from the queue for at most once. Therefore, the complexity is O(N). 
 You can find the code for this algorithm in  www.leetcode.com.

Thursday, February 2, 2012

Find the Unique Element that Appears k Times in an Array

Problem: Let's first come with some special cases of this problem. Then we can generalize. Given an integer array, only one integer appears 2 times, all the other integers appears 3 times in the array. Find out that integer. For example, for a[] = [2,3,2,4,4,2,3,4], the output should be "3".

Solution: Definitely with hash table or sorting, the problem can be easily solved. What if those approaches are not allowed? Fortunately, we can still use bit operation to solve this problem.

  • We need some counters to store the bits that just appear once, the bits that appear twice.
  • If some bit appear three times, we need to reset those counters.
  • Then at the end, the counter that stores the bits that appear twice will be the answer
Sample code is as follow:
def foo(list):
        # three counters which are used to store
        # the bits that appear 1 time, 2 times and 
        # three times, respectively
        ones = 0
        twos = 0
        threes = 0

        for x in list:
                # which bits had appeared for 3 times?
                # and later we need to reset those
                # bits if they also appear in the 
                # counter "ones" and "twos".
                threes = twos & x
                twos ^= (ones & x)
                twos &= ~threes
                ones ^= x
                ones &= (~threes & ~twos)

        # here we just need to return the counter 
        # "twos". If we need the unique element
        # that appears once (assuming the rest
        # all appear 3 times, just return "ones"
        # instead. 
        return twos


A more general problem: What if we change the problem. Now the problem is given an integer array, only one integer appears k times, all the other integers appears m times in the array (k<m). Find out that integer.

Solution: We can use similar strategy (as shown in above code snippet) to solve the problem.

  • Basically, we need m counters (or a counter array). 
  • We have counters[m] = counter[m-1] & x.
  • counter[i] ^=  counter[i-1] & x and  counter[i] &= (~ counters[m] & ~counters[m-1] & ... &  ~counters[i+1] )