  • String similarity calculation in C# and SQL: shared code

    http://www.jb51.net/article/55941.htm

    C# implementation:


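    #region Compute string similarity
            // Requires "using System;" (for Math.Max) and an enclosing class.
            /// <summary>
            /// Computes the similarity of two strings from their Levenshtein (edit) distance.
            /// </summary>
            /// <param name="str1">First string</param>
            /// <param name="str2">Second string</param>
            /// <returns>Similarity in the range [0, 1]; 1 means the strings are identical</returns>
            public static float Levenshtein(string str1, string str2)
            {
                // Treat null as the empty string.
                str1 = str1 ?? string.Empty;
                str2 = str2 ?? string.Empty;
                // Lengths of the two strings.
                int len1 = str1.Length;
                int len2 = str2.Length;
                // Two empty strings are identical; returning here also avoids 0/0 (NaN) below.
                if (len1 == 0 && len2 == 0)
                {
                    return 1f;
                }
                // One extra row and column for the empty prefix.
                int[,] dif = new int[len1 + 1, len2 + 1];
                // Initial values: the distance from a prefix to the empty string is the prefix length.
                for (int a = 0; a <= len1; a++)
                {
                    dif[a, 0] = a;
                }
                for (int a = 0; a <= len2; a++)
                {
                    dif[0, a] = a;
                }
                for (int i = 1; i <= len1; i++)
                {
                    for (int j = 1; j <= len2; j++)
                    {
                        // Substitution cost added to the upper-left cell: 0 if the characters match, otherwise 1.
                        int temp = str1[i - 1] == str2[j - 1] ? 0 : 1;
                        // Take the smallest of substitution, insertion, and deletion.
                        dif[i, j] = Min(dif[i - 1, j - 1] + temp, dif[i, j - 1] + 1, dif[i - 1, j] + 1);
                    }
                }
                // Normalize the distance by the longer length and invert it into a similarity.
                return 1 - (float)dif[len1, len2] / Math.Max(len1, len2);
            }
            #endregion

            // Returns the smallest of three integers.
            private static int Min(int i, int j, int k)
            {
                return i < j ? (i < k ? i : k) : (j < k ? j : k);
            }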
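Each cell of the table only depends on the current row and the row directly above it, so the same distance can be computed with two rows instead of the full table. The following is a minimal, self-contained sketch of that idea, not the original author's code; the names `SimilaritySketch`, `Distance`, and `Similarity` are hypothetical and chosen only for illustration.

```csharp
using System;

public static class SimilaritySketch
{
    // Edit distance computed with two rolling rows instead of the full table.
    public static int Distance(string s, string t)
    {
        s = s ?? string.Empty;
        t = t ?? string.Empty;
        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];
        // Row 0: distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.Length; j++) prev[j] = j;
        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(prev[j - 1] + cost,
                                   Math.Min(curr[j - 1] + 1, prev[j] + 1));
            }
            // Swap the rows so that prev always holds the last completed row.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.Length];
    }

    // Similarity in [0, 1], normalized by the longer string as in the listing above.
    public static float Similarity(string s, string t)
    {
        int maxLen = Math.Max((s ?? string.Empty).Length, (t ?? string.Empty).Length);
        return maxLen == 0 ? 1f : 1f - (float)Distance(s, t) / maxLen;
    }

    public static void Main()
    {
        Console.WriteLine(Distance("kitten", "sitting"));   // 3
        Console.WriteLine(Similarity("kitten", "sitting")); // 1 - 3/7
    }
}
```

For "kitten" and "sitting" the distance is 3 (two substitutions and one insertion), so the similarity is 1 - 3/7, a little over 0.57.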

    SQL implementation:


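    CREATE function get_semblance_By_2words
    (
        @word1 nvarchar(50),
        @word2 nvarchar(50)
    )
    returns int  -- similarity as a percentage, 0 to 100
    as
    begin
        declare @re int
        declare @maxLenth int
        declare @i int, @l int
        -- Substrings (two characters or longer) of each word.
        declare @tb1 table(child nvarchar(50))
        declare @tb2 table(child nvarchar(50))
        -- Length of the longer word, used as the denominator.
        set @maxLenth = len(@word1)
        if len(@word1) < len(@word2)
        begin
            set @maxLenth = len(@word2)
        end
        -- Collect every substring of @word1 with length @l >= 2.
        set @i = 1
        set @l = 2
        while @l <= len(@word1)
        begin
            -- Valid start positions for length @l are 1 .. len - @l + 1.
            while @i <= len(@word1) - @l + 1
            begin
                insert @tb1 (child) values (SUBSTRING(@word1, @i, @l))
                set @i = @i + 1
            end
            set @i = 1
            set @l = @l + 1
        end
        -- Collect every substring of @word2 in the same way.
        set @i = 1
        set @l = 2
        while @l <= len(@word2)
        begin
            while @i <= len(@word2) - @l + 1
            begin
                insert @tb2 (child) values (SUBSTRING(@word2, @i, @l))
                set @i = @i + 1
            end
            set @i = 1
            set @l = @l + 1
        end
        -- Longest shared substring as a percentage of the longer word
        -- (integer division); 0 when nothing of length >= 2 is shared.
        select @re = isnull(max(len(a.child) * 100 / @maxLenth), 0)
        from @tb1 a
        inner join @tb2 b on a.child = b.child
        return @re
    end
    GO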
     
    -- Test
    -- select dbo.get_semblance_By_2words(N'我是谁', N'我是谁啊')
    -- Result: 75 (similarity, as a percentage)
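Note that the SQL function does not measure the same thing as the C# one: rather than an edit distance, it appears intended to score two words by their longest shared substring of at least two characters, as a percentage of the longer word. A minimal C# sketch of that metric follows, for comparison only; the names `SemblanceSketch`, `LongestCommonSubstringLength`, and `SemblancePercent` are hypothetical, and the comparison here is ordinal, whereas SQL Server compares according to the database collation.

```csharp
using System;

public static class SemblanceSketch
{
    // Length of the longest substring that occurs in both a and b.
    public static int LongestCommonSubstringLength(string a, string b)
    {
        int best = 0;
        int[] prev = new int[b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            int[] curr = new int[b.Length + 1];
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    // Extend the common run that ends at the previous characters.
                    curr[j] = prev[j - 1] + 1;
                    best = Math.Max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    // Percentage of the longer word covered by the longest shared substring.
    // Assumption: matches shorter than two characters score 0, mirroring the
    // SQL loops, which start at substring length 2.
    public static int SemblancePercent(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        int maxLen = Math.Max(a.Length, b.Length);
        int common = LongestCommonSubstringLength(a, b);
        if (maxLen == 0 || common < 2) return 0;
        return common * 100 / maxLen; // integer division, as in the SQL
    }

    public static void Main()
    {
        Console.WriteLine(SemblancePercent("我是谁", "我是谁啊")); // 75
    }
}
```

The dynamic-programming table avoids materializing every substring, so it needs no temporary tables and runs in time proportional to the product of the two lengths.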
  • Original article: https://www.cnblogs.com/shiningrise/p/4859033.html