zoukankan      html  css  js  c++  java
  • Edit Distance

    Introduce

    Through the following, u will know what's edit distance.

    target : you are cute  and  I         love u.

    source:   I   am cute          I don't love u.

    For the two sentences, u can use edit including insert, delete and substitute the source so that it can be same to the target.

    u will subsutitute "you" for "I", which will cost 2

    subsutitue "are" for "am", which will cost 2

    insert "and", which will cost 1

    delete "don't", which will cost 1

    so, totally, u will cost 2 + 2 + 1 + 1 = 6, which is the edit distance.

    How to calculate the minimum edit distance

    Now, we take another example

    target : s t o p

    source: s o t

    u can insert "t", which will cost 1, to get "stot", then u can substitute "p" for "t", which will cost 2, to get "stop".

    Besides, u can substitue "t" for "o" to get "stt", then u can substitute "o" for "t" to get "sto", last, u can insert "p" to get "stop"

    Alright, u can have plenty of method to edit to get the target, so how to get the minimum edit distance. We will introduce the algorithm as following, which is dynamic programming.

    i : the length of the target

    j : the length of the source

    D(0, 0) = 0

    D(i, 0) = insertCost * i, means the length of the target is i and the length of the source is 0, so u should insert i characters

    D(0, j) = deleteCost * j, means the length of the target is 0 and the length of the source is j, so u should delete j characters

    D(i, j) = min {D(i - 1, j) + insertCost(target[i]);

                        D(i - 1, j - 1) + substituteCost(source[j], target[i]);

                        D(i, j - 1) + deleteCost(source[j])} means u should choose the method which will cost least.substituteCost = {0, 2}, if source[j] is equal to target[i], it's 0, or it's 2.

    insertCost = 1

    deleteCost = 1

    Well, maybe u will feel confuse, but no problem, We will use the algorithm to solve the above question.

    firstly, u can gain the belew form.

     

    D(1, 1) = min {D[0, 1] + insert(t[1]) = 2;

                          D[0, 0] + substitute(s[1], t[1]) = 0;

            D[1, 0] + delete(s[1]) = 2}          =    0

    D[1, 2] = min{ D[0, 2] + insert(t[1]) = 3;

            D[0, 1] + substitute(s[2], t[1]) = 3;

            D[1, 1] + delete(s[2] = 1)}            =   1

    ................

    then u will get the last number

    D[4, 3] = min {D[3, 2] + insert(t[4]) = 3;

            D[3, 2] + substitute(s[3], t[4]) = 3;

            D[4, 2] + delete(s[3]) = 3}    =    3

    Actually, u can get the path how to get edit through every step. The below picture show the results.

  • 相关阅读:
    python基础之包、模块、命名空间和作用域
    python基础之函数式编程
    python基础之文件操作
    python基础之psutil模块和发邮件(smtplib和yagmail)
    【面试题21】包含min函数的栈
    【面试题20】顺时针打印矩阵
    【面试题19】二叉树的镜像
    【面试题18】树的子结构
    【面试题17】合并两个排序的链表
    【面试题16】反转链表
  • 原文地址:https://www.cnblogs.com/chuanlong/p/2993774.html
Copyright © 2011-2022 走看看