在自然语言处理任务中,有时候需要计算两个字符串之间的相似度,也可以称作是两者之间的距离,用最小编辑距离表示。
最小编辑距离用{Insertion,Deletion,Substitution}这三种操作把一个字符串转化成另一个字符串所需的操作次数,等同于LeetCode上的第72题,描述如下:
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
本题使用递归算法,设D(i,j)为字符串m的前i个字符组成的字符串和n的前j个字符组成的字符串之间的最小编辑距离,然后逐渐递归得到D(m,n)的值,也即是word1和word2之间的距离。
Initialization:
D(i,0)=i;
D(0,j)=j;
Recurrence Relation:
For each i=1...M
For each j=1...N
D(i-1,j)+1 //删除操作
D(i,j)=min D(i,j-1)+1 //增加操作
D(i-1,j-1)+X //替换操作,替换的代价是X,X可以自己设置
Termination:
D(M,N)就是我们要求的距离
代码如下:
class Solution { public int minDistance(String word1, String word2) { int[][] strLen = new int[word1.length()+1][word2.length()+1]; for (int i=0;i<=word1.length();i++) strLen[i][0] = i; for (int j=0;j<=word2.length();j++) strLen[0][j] = j; for (int i=1;i<=word1.length();i++){ for(int j=1;j<=word2.length();j++){ if(word1.charAt(i-1)==word2.charAt(j-1)) strLen[i][j] = strLen[i-1][j-1]; else{ strLen[i][j]=Math.min(strLen[i-1][j],strLen[i][j-1]); strLen[i][j]=Math.min(strLen[i][j],strLen[i-1][j-1])+1; } } } return strLen[word1.length()][word2.length()]; } }