Saturday, July 30, 2011

Something about Interrupt, Mutex and Semaphore

The details of interrupt and mutex are not that simple. Be aware of the following bullets:

  1. Disable/Enable interrupt can be used to implement mutex. It is simple but flawed.
  2. For multi-processors, disable interrupt may not be useful, unless you disable interrupt on all processors, which will be prohibitively expensive.
  3. Atomic instruction, such as test-and-set, can be used to implement mutex. test-and-set is an atomic instruction which writes 1 to a memory location and fetch the old value of that location. For example, a spin-lock implementation is given as follow:
  4. while (test_and_set(lock)==1);
  5. Besides,  other atomic instructions such as exchange, compare&swap, load linked and conditional store can also be used.
  6. For the example given in 3), we will busy wait. To minimize the waiting time, we can do the following (add a guard variable):
  7. while (test_and_set(guard)==1);
    
    if(lock_value == 1)
    {
       put the thread in the waiting queue for the lock;
       go to sleep;
       guard = 0;
    }
    else
    { 
       lock_value = 1;
       guard = 0
    }
  8. Spin lock can sometimes delay releasing the lock. For example, thread A get switched out right after grabbing the lock. Thread B kicks in and try to grab the lock, but the lock has already been grabbed. Thread B has to be busy waiting, which will delay the switching back to thread A.
  9. Semaphore represents the number of resources that are still available to simultaneous users (e.g. there are still 4 available slots)..   
  10. Besides, we can also use implement some lock-free data structure in pursuit of better performance. These structures usually leverage some atomic instructions or atomic variables (the read/write to the variable is atomic, e.g. pointer type.). Some data structures are weak enough to be implemented without special atomic primitives, e.g., FIFO queue.

Friday, July 29, 2011

Things to Remember about memcpy()

If you want to implement memcpy(), you need to pay attention to the following things:
  1. if you can use wider data type, use it (e.g., copy a 32 bit instead of 8 bit).
  2. leverage the instruction set provided by the underlying architecture (processor). For example, some architecture has *p++, some only has *(++p).
  3. memory alignment. Sometimes if you want to do 32bit copy, the instruction may require memory address to be aligned. Therefore some extra work needs to be done.
  4. use pointer such as const char * for src.
  5. if the src and dest memory address overlap with each other, some extra work needs to be done.

Some Tricks about Optimizing Branches

You should be cautious when using branches in your code. Sometimes the performance could be hurt.
while (++i > count)
The above code may generate complex machine code. It is much better to let the compiler to generate code that utilizes the processors compare instructions in an efficient way. This means creating loops that terminates when a compare value is zero, which is as follow:
while (count--)
The following code is also not good:
while (count-=2)

Sunday, July 24, 2011

Find the Least Common Ancestor of Two Tree Nodes

Problem: Given two nodes in a tree, find the least common ancestor of them.

Solution: This is not a difficult problem. If there is parent pointer, things will be easier. However, simple solution still exist for tree nodes without parent pointer. See the code below:

node * common_ancestor(node *root, node *n1, node *n2)
{
   if(root==null || n1 == null || n2 == null)
     return null;

    if(n1 == root || n2 == root) return root;
   
   node * left = common_ancestor(root->left, n1, n2);
   node * right = common_ancestor(root->right, n1, n2);
   
   if(left && right) return root;
   
   return left ? left : right; 

}

More: if we need to do many common_ancestor() operations, there is a more efficient off-line algorithm called "Tarjan's off-line least common ancestors algorithm". It can get the least common ancestors for m pairs of nodes in just one tree traversal. The main idea is to leverage disjointed forest  with path compression and union by rank.

Friday, July 22, 2011

General Monty Hall Problem and Information Theory

Problem: There are three doors and only there is one door behind which there is a prize.You first select a door, then the host of the game will open one door behind which there is no prize (he can't open the door you had selected). Then you are given the choice to switch the door or not. This is the typical Monty Hall problem people talk about. The general form is that n doors with only one door behind which there is a prize. The host of the game will open m doors (m<n-1) behind which there are no prize. Then you are given the choice to switch to another door.

Solution: The most most important observation about this problem is that "probability" is actually not about "randomness", but about information. How much information we have will influence the probability".  Now let's first look at the basic form of the Monty Hall problem.

  • assume you chose door A, the probability that you had made the correct choice, P(A),  is 1/3. Assume the host opened the door C (the case for door B is the same), let us denote this action as O. The probability, P(A|O), which means the probability that you had made the correct choice given the host opened the door C, is what we are interested in. Actually P(A|O) = P(A) = 1/3, since unless you move the prize, there is no way you can change the odds of your original choice.
  • Besides, we can systematically calculate P(A|O). According to Bayes's theorem, P(A|O) = P(O|A)*P(A)/P(O). We know P(A) = 1/3. P(O|A) should equal 1/2, since if A is the right answer, B and C are both empty doors. The the probability for the host to open door C is simply 1/2. Then we need to calculate P(O). 
  • P(O) = P(O|A)*P(A) + P(O|B)*P(B) + P(O|C)*P(C). It is easy to know, P(O|C) = 0, also  P(O|A)*P(A) =1/6. We have P(O|B) = 1. The reason is that we had already chosen A, but B is the right answer, so the only choice for the host is C. Therefore, P(O) = 1/6+1*1/3 = 1/2. Then we have P(A|O) = 1/3.
  • Then the probability to win if we switch is P(B|O) = P(O|B)*P(B)/P(O) = 1*1/3 / (1/2) =2/3, so we need to switch!
Now let's try to tackle the general problem.

  • The probability to win if we stick to the original choice (assume it is A again), P(A) = 1/n, also we have P(A|O) = 1/n.
  •  The probability to win if we switch, P(S|O) = P(We switch to the correct door | Our original choice is wrong) = (1/(n-m-1))*(1-1/n) = ((n-1) / (n-m-1))*1/n). It is easy to know P(S|O)  > P(A|O), so we still need to switch.
  • Thinking in the way of information theory, after host reveal some empty doors, we are given more information. These information will change the probability distribution!
See Also: Monty hall and Bayesian probability theory,  The Monty Hall problem -- over easy

The Mystery about sizeof()

sizeof() is to get the size of a class or an class object. It might be simple conceptually, but there are some details you may not know.

  • what does sizeof() return?
class A
{
  static char x;
  int y;
};

int main()
{
  int s1 = sizeof(A);
}
        s1 = 4. Why s1 = 4? The size of static members will not be counted into the size of the class. The reason is that those static members are stored centrally and shared by all the instances of the class.
  • what does sizeof() return?
class A
{
  char x;
  int y;
};

int main()
{
  A a;
  int s1 = sizeof(A);
  int s2 = sizeof(a);
}
        s1 = 8 and s2 = 8. Why s1 = 8? The size it really needs is just 5 bytes. The reason is for alignment so padding is added. Why s2 = 8? sizeof(a) is equivalent to sizeof(A). Moreover, the padding scheme will sometimes make the order of data members matter. For example, if we have "char a; int x; char b;", the size will be 12. However, if we have "char a; char b; int x;", the size will be 8.
  • what does sizeof() return?
class A
{
  char x;
  int y;
  virtual void bar();
};

int main()
{
  int s1 = sizeof(A);
}
        s1 = 12. Why s1 = 12? Since virtual function is defined, 4 extra bytes need to be allocated for the pointer to virtual function table.

  • what does sizeof() return?
class A
{
  char x;
  int y;
  void bar();
};

int main()
{
  int s1 = sizeof(A);
}
        s1 = 8. Why s1 = 8? Since there is no virtual function defined, we don't need the 4 extra bytes for the pointer to virtual function table. For non-virtual function, there is a central place to find those functions, therefore we don't need such pointer.
  • what does sizeof() return?
class A
{
  char x;
  int y;
  void bar();
};

class B{
   int a;
   A aa;
   virtual void somefunction() ;
}

int main()
{
  int s1 = sizeof(B);
}
        s1 = 16. Why s1 = 16? The size of class A is 8. class B has one integer, one class A instance plus the pointer to the virtual function table. So the total size is 8+4+4 = 16 bytes.
  • what does sizeof() return?
int foo(int n)
{
  char b[n+3];
  return sizeof(b);
}

int main()
{
  int s1 = foo(8);
  
}
        s1 = 11. Why s1 = 11? In this case, at compile time, the compiler can't know the size of array b. The size is known at the run time. That is to say, sizeof() can be evaluated at run time for some case. But for the general case, it is evaluated at compile time.
  • what does sizeof() return?
class ABase{ 
        int iMem; 
}; 

class BBase : public virtual ABase { 
        int iMem; 
}; 

class CBase : public virtual ABase { 
        int iMem; 
}; 

class ABCDerived : public BBase, public CBase { 
        int iMem; 
}; 

int main()
{
   int s1 = sizeof(ABase);
   int s1 = sizeof(BBase);
   int s2 = sizeof(CBase);
   int s4 = sizeof(ABCDerived);
}
        s1 = 4, s2 = 12, s3 = 12 and s4 = 24. Why ? In this case, because BBase and CBase are derived from ABase virtually, they will also have an virtual base pointer (different from the pointer to virtual function table). So, 4 bytes will be added to the size of the class (BBase and CBase). That is sizeof ABase + size of int + sizeof Virtual Base pointer.Size of ABCDerived will be 24 (not 28 = sizeof (BBase + CBase + int member)) because it will maintain only one Virtual Base pointer.
  • what does sizeof() return? (edited on Aug 12)
void foo(int a[])
{
  cout << sizeof(a) << endl;
}

int main()
{  
   int a[10];
   foo(a);
   cout << sizeof(a) << endl;
}
    .    The sizeof() in foo() will return 4 while the one in main() will return 40. The reason is that the a in foo() is actually interpreted as a integer pointer.

Thursday, July 21, 2011

The Ugly Thing about char [], char *, const char *, char * const and char const *

These concepts seem simple, but mistake can be made if you haven't thoroughly understood them.

  • is following code correct?
foo()
{
char a[]= "I HATE U!";
a[0] = 'U';
}
        Yes,  array a[] will be allocated on stack and a[0] just modifies the first character.

  • is following code correct?
foo()
{
char *a = "I HATE U!";
a[0] = 'U';
}
        Compiler won't complain, but the program will crash. The reason is "I HATE U!" is a constant that will be allocated in the constant memory region.  If we try to modify its value thru pointer a, you will be hit by seg fault.
        However,  initializing the variable  takes a huge performance and space penalty for the array (using char a[] = "XXXX"). Only use the array method if you intend on changing the string, it takes up space in the stack and adds some serious overhead every time you enter the variable's scope. Use the pointer method otherwise.

  • is following code correct?
foo()
{
cont char *a = "I HATE U!";
a[0] = 'U';
}
        No, compiler will complain.  a is a pointer points to char constant, you can modify what a points to (a's value), but you can't modify the content of the address that a points to (*a's value).

  • is following code correct?
foo()
{
char * const a = "I HATE U!";
a++;
}
        No, compiler will complain.  a is a constant pointer, therefore you can'y modify what a points to (a's value). But you can modify the content of the address that a points to (*a's value) if that address is valid to access. In the above example, the address is not valid to access.

  • is following code correct?
foo()
{
char const *a = "I HATE U!";
a[0] = 'U';
}
        No, compiler will complain. char const *a is the same as const char * a. 

Monday, July 18, 2011

Find the Sub-Matrix with the Largest Sum in a Matrix

Problem: Given a m*n matrix that is made up of integers (positive and negative), find the sub-matrix with the largest sum.

Solution: There is an O(n) algorithm to find the sub-array with the largest sum in a one-dimension array. So here, we will try to reduce our problem to the one-dimension array problem.

  1. Assume the sub-matrix is between row a and b where 0<=a<=b<=m,  then we discard all the rows that are out of the range [a, b]. We then squeeze the sub-matrix into a one-dimension array. How to squeeze? Just use the sum of the elements remaining in each row to replace that row! Therefore we get a one dimensional array with n elements (each element is a sum). 
  2. Then apply the O(n) algorithm we mentioned previously, we will get the sub-array with the largest sum in this one-dimension array, which is actually the sub-matrix between row a and with the largest sum.
  3. Until now, we have examined one pair a and b. If we try all the combination of a and b, we will be able to find the the sub-matrix in the with the largest sum in the m*n matrix. Assume n<m, we need repeat n^2 such operation. Then the overall time complexity is O(n^3). Naive way may take O(n^4). 
  4. In order to make the "squeeze" operation (summing the column elements) with a constant time complexity, we need to do some pre-processing. We just need to calculate the prefix-sum. For example, c[i][j] = a[0][j] + a[1][j] +...+ a[i][j] . It is easy to calculate this for the whole matrix with time complexity O(n^2).

Search for an Element in a Matrix

Problem: m*matrix with its each row and each column sorted in ascending order. Given a target element, find that element in the matrix or return null if not found.

Solution: The following gives an O(m+n) algorithm.

  1. Start from matrix[0][n-1], compare it with the target element. If equal, return; if matrix[0][n-1] is smaller than the target value, go vertically down to next row (examine matrix[1][n-1]); if matrix[0][n-1] is bigger than the target value, go horizontally left to the previous column (examine matrix[0][n-2]).
  2. Compare the current cell in the matrix with the target element. If equal, return; if current cell is smaller than the target value, go vertically down to next row. if current cell is bigger than the target value, go horizontally left to the previous column.
  3. Stop when it reaches the last row or first column or the target element is found. 
  4. We actually can use binary search here.

Sunday, July 17, 2011

Paint a Cube with Three Colors

Problem: Given a cube, each face can be painted with one of the three colors. Calculate the ways the cube can be colored.

Solution: The problem can be solved by enumeration. While a more formal way is to use Burside's Lemma. Generally we need to understand the symmetry of the permutation on cubes. The main ideas of this method are listed as follows:

  1. find the permutation groups of a cube, the easiest one is the identity permutation (just don't change). Then for the six faces, we have 1->1, 2->2 ... and 6->6. So for this permutation, there are 6 circles and the length of each circle is 1. Therefore we have a1^6, where a1 means the sub group that with circle length 1. "6" means there are 6 such circles.
  2. then we can rotate along the axis that penetrates the centers of two opposite faces for 90 degree. If we do this, we have a1^2*a4, since two face unchanged, and the rest 4 form a circle. We have 6 such rotations. Therefore at the end we have 6*a1^2*a4.
  3. then we can rotate along the axis that penetrates the centers of two opposite faces for 180 degree. If we do this, we have a1^2*a2^2.We have 3 such rotations. Therefore at the end we have 3*a1^2*a2^2.
  4. then we can rotate the two opposite vertices (diagonal). We will have a3^2, since three faces that share one vertex are in one circle. We have 8 such rotations. Therefore at the end we have 8*a3^2.
  5. then we can rotate along the axis that penetrates the middle points of two parallel edges that are not in the same face for 180 degree. We will have 6*a2^3 at the end.
  6. so totally we have 1+6+3+8+6 = 24ways of permutations. So the cycle index (the ways to paint the cube) = 1/24*(a1^6+6*a1^2*a4+3*a1^2*a2^2+8*a3^2+6*a2^3). Since a1=a2=...=a6, we have cycle index =  1/24*(a^6+3*a^4+12*a^3+8*a^2). 
  7. when a = 3, we have cycle index = 57.

Find the Kth Positive Integer That Can Be Divided Only by 3, 5 or 7

Problem: find the kth positive integer that only have factors as 3, 5, or 7. For example, the first integer is 3, then 5, 7, 9, 15, 21......

Solution: It is a DP problem where we should leverage on previous status. Then we can avoid redundant computation. The main ideas are stated as follow:

  1.  Have three queues, let us say Q3, Q5 and Q7. Put 3, 5 and 7 in these queues, receptively.
  2. Then select the smallest element among the three queues by just comparing the heads of the queues. If the smallest element s is from Q3, Q3.enque(s*3), Q5.enque(s*5) and Q7.enque(s*7); If the smallest element s is from Q5, Q5.enque(s*5) and Q7.enque(s*7);   If the smallest element s is from Q7, Q7.enque(s*7). Since we strictly follow the ascending order, we guarantee that there is not redundant computation.
  3. Repeat the selection (step 2) for k times, what we get is the kth positive integer that only have factors as 3, 5, or 7.
  4. The time complexity and space complexity are both O(k).

Saturday, July 16, 2011

Calculate the Occurrence of "2" from 1 to N

Problem: Given an integer N, calculate the occurrence of digit "2 " in all the numbers from 1 to N. For example, if N=24, then the integers that have digit "2" are 2, 12, 20, 21, 22, 23 and 24. So totally eight "2"s ( "22" contains two "2"s).

Solution: It is a straightforward solution, Just inspect each digit starting from the tail. For each digit, we need to think of three cases: >2, =2 and <2. The code is listed as follow:

int calculte_2(int input)
{
     int cnt = 0;
     int base = 10;
     int quotient,remainder;

     do
     {
        quotient= input/base;
        remainder = (input%base) / (base/10);
        
        if(remainder>2) cnt += (quotient+1)*base/10;
        else if(remainder<2) cnt +=  quotient*base/10;
        else if(remainder==2) cnt +=  quotient*base/10 + (input%(base/10)) + 1;               
        base*=10;
           
     }while(quotient>0);
     
     return cnt;     

}

Friday, July 15, 2011

Calculate Square Root and Cubic Root

Problem: Calculate the square root of a non-negative real number.

Solution: The basic idea is to start with a seed, then begin trial. If our guess (seed) is smaller, than we increase it. Otherwise, we decrease it. However, to get a good convergence speed, we need guess the square root in a smart way. The following describes Babylonian method, which is a quadratically convergent algorithm.
  1. Start with a seed x, assume S is the real number we want to know its square root, calculate S/x
  2. Then the square root of S must be in between x and S/x. We choose the average of and S/x as the new guess and repeat step 1. If certain accuracy is achieved, we stop.
  3. To select the starting seed, if S has d digits and S>1, choose 2*10^((d-1)/2) if S is odd, choose 6*10^((d-2)/2) if S is even. The case for S<1 is similar.
More: This is actually a special case of Newton's method. Newton's method can be used to find the roots of a real value function. The basic idea is that from a point (x0, y0) on the curve, we draw a tangent, assume the tangent across x axis at x1. Assume the root is xr, f(xr) = 0, we have |x1-xr| < |x0-xr|. If we keep draw tangent at x1, we will be even closer to xr. Basically we have x_k+1 = x_k- f(x_k) / f'(x_k).6

More more: This method can be extended to calculate the cubic root as well. The only thing we need to change is rather than calculating the average of and S/x as the new guess, we calculate the average of x , x and S/x^2 as the new guess. Therefore, x' = (2*x+S/x^2)/3.

Tuesday, July 12, 2011

Maximum Flow and Minimum Cut Problem

Problem: Given a network that contains a single source and a single sink, find the maximum (aggregated) flow from the source to the sink. This is Maximum Flow problem.


Solution: The solution is as follow:
  1. find a flow in which each edge has strictly positive capacity. Then find the bottleneck edge in the flow. Subtract each edge's capacity with the bottleneck edge's capacity. 
  2. try to find another flow in which each edge has strictly positive capacity. Repeat step 1.
  3. if such flow cannot be found, terminate. The maximum flow is the sum of all the capacity of the bottleneck edge of the flow processed. 
Minimum Cut problem is from Maximum Flow problem. Basically, first run maximum flow algorithm to find those bottleneck edges. Then the minimum cut is from those edges.

More: Maximum flow (minimum cut) is frequently used in the problem related to flow network. Some other problem which can be solved by (or converted to) maximum flow problem are maximum matching in bipartie graph, minimum path cover, etc.


Monday, July 11, 2011

Check If a Graph is a Bipartite Graph

Problem: Given a graph, check it is a bipartite graph. A bipartite graph is a graph whose vertices can be divided into two sets. Then for every edge, the vertices connected by the edge are distributed in the two different sets (can't be both in one set).

Solution: If is similar to a coloring problem. We can do a BFS traversal to the graph. We need some extra space to store i) if a vertex is visited ii) the parity of that vertex. Or we can just use two hash tables.

  • start from any vertex in the graph, we mark that vertex as visited, its parity as 1. Then we set the parity of its neighbors as 2. Since all the neighbors are not visited now, we put them in a queue.
  • remove a vertex from the queue, if it is visited, ignore it. Otherwise, mark it as visited, check all its visited neighbors, if the parities are all odd or even, then the graph is not bipartite. Otherwise, set the parity of its non-visited neighbors as i+1 (assume the current vertex's parity is i) and add non-visited vertices to the queue.
  • repeat the above step util the queue is empty. If the graph is not fully connected, need one extra step to check if all the vertices are visited

Sunday, July 10, 2011

Reverse the Words in a String in Place

Problem: Reverse the order of words in a string. For example, given "I like Juventus", "Juventus like I" should be the output.

Solution: First in place reverse the whole string. We will get "suntevuJ ekil I". The we just reverse the characters in each word.

Saturday, July 9, 2011

The Comparison between BST, Hashtable, Array and Linked List

BST:
  • locating/deleting/inserting an element is not slow -- O(logn)
  • maintain the order between elements
  • support range query 
  • if not balanced, the worst case is like a linked list

Hashtable:
  • look up operation is fast: O(1) if the hash function is well chosen.
  • have space overhead
  • don't support range query

Array:
  • allow random access
  • can explore cache locality
  • updating is costly

Linked List:
  • allow easy insertion/deletion
  • do not support random access
  • may not have cache locality

Tuesday, July 5, 2011

Find the Kth Smallest Element of Two Sorted Arrays

Problem: given two sorted arrays, find the Kth smallest element.

Solution: if we have two pointer scanning from the heads of two arrays, we can have an O(K) algorithm. Here we give the main idea of an O(logm + logn) algorithm, where m and n are the length of the two arrays, respectively.
  1. We pick the ith and jth elements from two arrays. We make i+j = K-1. Then if Ai > Bj && Ai < Bj+1, Ai is the Kth smallest element. The other way is the same (Ai < Bj).
  2. If the previous condition doesn't hold, which means Ai < B&& Ai < Bj+1. Then the Kth smallest element cannot be within A0 to Ai and Bj+1 to Bn-1. Then we remove them and only need to look for the K-i-1th smallest element in the sub-arrays left.

Find the Median of Two Sorted Arrays

Problem: given two sorted arrays, find the median.

Solution: Assume the length of the two arrays are m and n. When m+n is even, the median is the average of two numbers. The neat solution has a time complexity of O(log(m+n)). Basically, we need to leverage binary search. The details of this algorithm is very complex due to many corner cases to handle. The following just gives the main idea.

  1. get the median of two arrays Ai and Bj, where i = m/2 and j = n/2. If  Ai <= Bj, the median will be between Ai and Bj.
  2. therefore, we can discard A0 to Ai-1 and Bj+1 to Bn-1. However, we cannot do this in a naive way. To reduce to an equivalent sub-problem, we need to discard the same number of elements from each array. Then we keep comparing the middle elements of the sub-arrays. To discard the same number of elements, we just need to get Min(in-j-1).
Besides, we can also use the techniques in Find the Kth Smallest Element of Two Sorted Arrays.
    More: a more general problem is to find the median for K sorted arrays. There are two ways:
    1. Guess a number, search each array to see how many elements are smaller than this number. Then we can have a total number. If this total is smaller than half, we guess a bigger number; otherwise, we guess a smaller number. Try to repeat binary search until our goal is met.
    2. The second approach is to first find the medians of all the arrays. Then we can know the bounds of these medians (low_m, high_m). For each array, throw the elements that are out of the bounds. Make sure the elements thrown at the two ends of the array should be the same number. Then repeat until our goal is met.

    Estimate 3^100

    Problem: estimate the value of 3^100.

    Solution: A so called "72" rule will be used. The rule states if the percentage of annual GDP increase is a, it takes 72/a years to double the GDP. So 3^100 = 9^50 = 10^50/1.1^50 = 10^50/(1.1*1.1^(7*7)) = 10^50/(1.1*2^7) = 10^50/140 = 1000/140 *10^47 = 7*10^47. Another way is to leverage 3^9 = 20000.

    Longest Common Subsequence

    Problem: Longest common subsequence is the subsequence that appear in multiple sequences (usually two) simultaneously. For example, the longest common sequence of "13486" and "23861' is "386". The diff program is based on this algorithm. Besides, this algorithm is frequently used in bioinformatic research.


    Solution: Still we will use DP, but different from longest Increasing subsequence, the solution we have is O(n*m), where n and m are the length of two sequence. If we have more than two sequence, the time complexity will be O(n1*n2*n3*...). The following gives the algorithm for two sequences:

    1. look at the last elements A[n], B[m] of two sequences, if A[n] == B[m], we know the longest common subsequence (LCS) between  A[n] and B[m],  LCS(A[1:n], B[1:m]) = LCS(A[1:n-1], B[1:m-1]) +1. 
    2. if A[n] <> B[m], then  LCS(A[1:n], B[1:m]) = MAX(LCS(A[1:n-1], B[1:m]), LCS(A[1:n], B[1:m-1])).
    3. then we will have a matrix for LCS, after filling up the matrix, we will get the answer.
    We can further do some optimization.  One is to optimize the space, the LCS matrix. If most diffs center on the middle of sequence, we can trim the common part first.  Also to be fast, we don't need to compare each character, we can hash the string or the line. Then we just compare the hash value sequences.


    More: The solution to LCS problem can be easily applied to Edit Distance problem. Given two string Xi and Yj, and a number of operations such as copy(), delete(), insert(), etc. Find the optimal way to transform Xi to Yj. Each operation has a cost and our goal is to minimize the total cost of the transformation. Here only gives the recursive relations:

    1. Define T(i,j) as the total cost to transform Xi to Yj (one direction). There are several recursive relations.
    2. T(i,j)  = T(i-1j-1) + cost(copy), if X[i] == Y[j]
    3. T(i,j)  = T(i-1, j-1) + cost(replace), if X[i] <> Y[j]
    4. T(i,j)  = T(i-1, j) + cost(delete)
    5. T(i,j)  = T(ij-1) + cost(insert)
    6. find T(i,j)  as the minimum T(i,j) in step 2,3,4,5

    Longest Increasing Subsequence

    Problem: Longest increasing sequence is the subsequence within a sequence that the elements in the subsequence is in an increasing order. We are trying to find the longest subsequence that has this property. For example, one of the longest increasing subsequences in "17385" is "138".

    Solution: This is a classical DP problem. Naive DP solution will give you a O(n^2) algorithm by keep the longest increasing sequence ended at i. A better solution is to use DP alongside with binary search, which will give you a O(nlogn) algorithm. The solution is as follow:

    1. having an array M, M[j] stores the ending position of the increasing sequence with length in the original sequence X. Thus, X[M[j]] should be an increasing sequence as well.
    2. iterate the original sequence X, when visiting X[i], do binary search in X[M[j]] to look for the closest position to X[i]. If X[i] is bigger than all the existing X[M[j]], it means if we append X[i] to the end of the current longest increasing sequence, we get a new longest increasing sequence which is longer by 1. Then we update M[L+1] = i (L is the length of current longest increasing sequence).
    3. when searching in  in X[M[j]] for X[i], we can also encounter the situation that  X[M[j]]<X[i]< X[M[j+1]]. Then we need to update M[j+1] = i. The reason is that after we have scanned the ith element in X, we need to assure for M[j], we store the increasing sequence of length  j that has the smallest ending element. This will maximize our chance to be able to append further elements to existing increasing sequence.

    Sunday, July 3, 2011

    Get the Kth Permutation

    Problem: Given an array of N numbers {1,2...N},  get the Kth permutation. For example, when N = 3, the first permutation is "123", the second is "132", the sixth is "321".

    Solution: With STL, we can use next_permutation(). Just call it K-1 times. However, next_permutation()'s complexity is not constant. A better way is to use recursion:

    String Find(string a, int k)
    {
       if(len(a)==1 or k==0) return a;
    
       int N = len(a);
       
       int i = k/factorial(N-1);
       int j = k%factorial(N-1);
    
       return a[i] + find(a.substring(0,i) + a.substring(i+1), j);
    }
    

    Binary Heap

    Binary Heap is an important data structure which can be used to implement priority queue. Two types of binary heap are commonly seen: min-heap and max-heap. A binary heap has two properties:

    • Shape property: the heap should be a full binary tree (can be denoted as an array)
    • heap property: children should be smaller (bigger) than parent
    When inserting an element into heap, we insert from bottom (leaf). Then we check if the two properties hold, otherwise we swap parent and child. When removing an element from heap (always from root), we use the last element at the last level to replace root, then we adjust heap. So,

    • The time complexity of insertion is O(logN)
    • The time complexity of removal is O(logN)
    • The time complexity of building a heap can be O(N)
    With binary heap, a K-way merge can be done in O(N*logK), naive ways take O(N*K). Besides, we also have min-max heap which can insert remove the min and max element in O(logN). Basically, it is a mixed heap with odd level to be min heap and even level to be max heap.

      Collecting Apples

      Problem: The original problem is from TopCoder. Given a M*N matrix, in each cell, there are a number of apples. When you visit one cell, you can grab the apple away. You make three passes to collect apples. In the first pass, you start from top-left,  go either right or down until reading bottom-right. In the second pass, you start from bottom-right, go either up or left until reading top-left. In the third pass, you go from top-left to bottom-right again. What are the maximum number of apples you can collect?

      Solution: This is a DP problem. But first we need to do some reduction to convert the problem to a easier one. If the problem only contains the first pass, it is an easy DP problem.

      • One key observation is the second pass can be reversed so that the problem can be transferred to that three paths starting from top-left to bottom-right.
      • We create a status matrix MAX[][][], MAX[i][j][k] denote at a certain round, the column coordinates of the three paths (you can get the row coordinates from i,j,k and round number x). 
      • From the previous position to current position, at most three cells should be visited. They are (x-i, i), (x-j, j) and (x-k, k). Then we can get the apples at the three cells (remember don't add the duplicate cell). There are multiple ways (2*2*2) to reach these three cells (from up or from left). They we need to leverage the MAX[][][] from previous round to find a previous state to the three cells which can maximize MAX[i][j][k]. 
      • There are totally M+N-2 rounds and after the final round, the answer will be stored in MAX[N-1][N-1][N-1]
      The key algorithm is as follow:
        public int mostApples(string[] L) {
        
            int X=L[0].Length, Y=L.Length, XY=X+Y;
            int[,,] best = new int[X,X,X]; 
            int[,,] b2 = new int[X,X,X];     
             
            for (int A=0; A<X; A++) 
            for (int B=0; B<X; B++) 
            for (int C=0; C<X; C++)    
                best[A,B,C] = -999999999;
        
            best[0,0,0] = 0;
        
            for (int t=1; t<=X+Y-2; t++)
            {    
                for (int A=0; A<X; A++) 
                for (int B=0; B<X; B++) 
                for (int C=0; C<X; C++)
                {
                   int aa = t-A, bb=t-B, cc=t-C;
                   if (aa < 0 || bb < 0 || cc < 0 || aa >= Y || bb >= Y || cc >= Y) continue;
          
                   int bestHere = 0;
                   int delta = (int)(L[aa][A] - 48);
                   
                   if (B != A) delta += (int)(L[bb][B] - 48);
                   if (C != A && C != B) delta += (int)(L[cc][C] - 48);
         
                   if ( A>0 &&  B>0 &&  C>0) bestHere = Max(bestHere, best[A-1, B-1, C-1] + delta);
                   if ( A>0 &&  B>0 && cc>0) bestHere = Max(bestHere, best[A-1, B-1, C  ] + delta);
                   if ( A>0 && bb>0 &&  C>0) bestHere = Max(bestHere, best[A-1, B,   C-1] + delta);
                   if ( A>0 && bb>0 && cc>0) bestHere = Max(bestHere, best[A-1, B,   C  ] + delta);
                   if (aa>0 &&  B>0 &&  C>0) bestHere = Max(bestHere, best[A,   B-1, C-1] + delta);
                   if (aa>0 &&  B>0 && cc>0) bestHere = Max(bestHere, best[A,   B-1, C  ] + delta);
                   if (aa>0 && bb>0 &&  C>0) bestHere = Max(bestHere, best[A,   B,   C-1] + delta);
                   if (aa>0 && bb>0 && cc>0) bestHere = Max(bestHere, best[A,   B,   C  ] + delta);
                   b2[A,B,C] = bestHere;
                 }
         
                 int[,,] temp = best; 
                 best = b2; 
                 b2 = temp;
          }
         
        return best[X-1,X-1,X-1];
        }