Edit Distance (Levenshtein Distance)

Strings Dynamic Programming

Problem

Given two strings s1 and s2, find the minimum number of edits that can be performed on s1 to convert it to s2. The operations that can be performed on s1 are “remove”, “insert” and “replace”.

Example:

Input:
  s1 = "foo"
  s2 = "flood"

Output:
  2 (insert "l" and "d")

Approach

A brute force approach solution requires to visit s1 and s2 recursively and compute the minimum cost of performing one of the operations on s1 to transform it to s2. Time complexity of this solution is O(3^n).

At a closer look this problem has the 2 main properties of a recursive problem that can be solved with Dynamic Proramming: a) Optimal sub-structure and b) overlapping sub-problems.

Because of this we can use a matrix to store results of each subproblem. The matrix will have m rows and n columns, where m is the length of s1 and n is the length of s2. Time complexity of this algorithm is O(mn) and space complexity is again O(mn) because we need to store the matrix in memory.

This is problem is called Levenshtein Distance.

Solution

int computeEditDistance(String s1, String s2) {
   int m = s1.length();
   int n = s2.length();

   // Initialize a 2-dimentional array to store results of the sub-problems.
   int[][] dp = new int[m+1][n+1];

   for (int i=0; i <= m; i++) {
       for (int j=0; j <= n; j++) {

           // If s1 is empty then the solution is to insert all characters of
           // s2 in s2, so the solution is the len of s2 which is currently j
           if (i==0) {
               dp[i][j] = j;
               continue;
           }

           // If s2 is empty then the solution is to remove all characters of s1,
           // so the solution is the length of of s1 which is currently i
           if (j==0) {
               dp[i][j] = i;
               continue;
           }

           // If the characters are the same then there is no action necessary,
           // so the number of operations required is the same as the previous step
           if (s1.charAt(i-1) == s2.charAt(j-1)) {
               dp[i][j] = dp[i - 1][j - 1];
           } else {
               // If the characters are different then consider all the possible
               // operations and find the minimum.
               dp[i][j] = 1 + min(dp[i][j-1], dp[i-1][j], dp[i-1][j-1]);
           }
       }
   }

   return dp[m][n];
}

static int min(int a, int b, int c) {
    if (a <= b && a <= c) {
        return a;
    }

    if (b <= a && b <= c) {
        return b;
    }

    return c;
}