Today, I thought of comparing Java 6 and Java 7 performance. And in the process I picked up an interesting problem for benchmarking.
Introduction
Edit Distance between two Strings is defined as the number of operations it requires to convert one string into another with allowed operations being INSERT, DELETE, SUBSTITUTION.
For example
SWEET
CUTE
How many operations it takes to convert SWEET into CUTE.
This algorithm has very important usage, number one being mis-spellings and form the core algorithm of Lucene APIs. In this post, I am not worried about the run time of algorithm, since I want to have an algorithm which grows exponentially to compare java 6 and java 7.
Algortihm
There are only three possibilities
1. INSERT
2. DELETE
3. SUBSTITUTION
Java Implementation
Here's the java code for recursive program
package com.param.test;
public class Benchmark {
public static void main(String[] args) {
Benchmark b = new Benchmark();
char[]s = {'s','u','n','n','y'};
char[]t = {'s','n','o','w','y'};
int dist = b.editDistance(s, t, s.length-1, t.length-1);
System.out.println("min distance is " + dist);
}
/**
*
* Implementing edit Distance program with normal recursion
* The program increases by factor of 3^n
*
* @param s
* @param t
* @param i
* @param j
* @return
*/
public int editDistance(char[]s, char[]t, int i, int j){
int opt[] = new int[3];
if(i==0)
return j*indel(' ');
if(j==0)
return i*indel(' ');
opt[0] = editDistance(s,t,i-1,j-1) + match(s[i],t[j]);
opt[1] = editDistance(s,t,i,j-1) + indel(s[i]); // Insert
opt[2] = editDistance(s,t,i-1,j) + indel(t[j]); // delete
return min(opt[0],opt[1],opt[2]);
}
public int match(char s1, char t1){
if(s1==t1)
return 0;
return 1;
}
public int indel(char s1){
return 1;
}
public int min(int a, int b, int c){
if(a
return a;
else if(b
return b;
return c;
}
}
The answer of above problem is - "min distance is 3"
Asymptotic Analysis Let us analyze the above code. Ideally there are m*n total characters and there should be only m*n maximum recursive calls. But above program it grows way more than that.
At 3^n recursive calls, since each call to editDistance function spans into 3 separate calls.
The best solution for this problem is to implement using Dynamic Programming. I will upload the solution using DP soon. But for now this program is good for performance comparison of Java 6 and 7, yup will upload my findings soon :) .
In case of any issues or better clever recursive solutions to this problem, please leave comments.
No comments:
Post a Comment