zoukankan      html  css  js  c++  java
  • 2019-8-31-C#-对-byte-数组进行模式搜索

    title author date CreateTime categories
    C# 对 byte 数组进行模式搜索
    lindexi
    2019-08-31 16:55:58 +0800
    2018-07-19 11:16:46 +0800
    C#

    本文告诉大家几个方法从 byte 数组找到对应的相同序列的数组

    最简单的方法是进行数值判断,但是代码最少是使用Linq ,效率比较高是使用 Boyer-Moore 算法,下面就告诉大家几个算法的代码

    判断数值

        class ByteArrayRocks
        {
            public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern)
            {
                if (IsEmptyLocate(source, start, pattern))
                {
                    yield break;
                }
    
                for (int i = start; i < source.Length; i++)
                {
                    if (!IsMatch(source, i, pattern))
                    {
                        continue;
                    }
    
                    yield return i;
                }
            }
    
            private static readonly int[] Empty = new int[0];
    
            private static bool IsMatch(byte[] array, int position, byte[] candidate)
            {
                if (candidate.Length > (array.Length - position))
                {
                    return false;
                }
    
                for (int i = 0; i < candidate.Length; i++)
                {
                    if (array[position + i] != candidate[i])
                    {
                        return false;
                    }
                }
    
                return true;
            }
    
            private static bool IsEmptyLocate(byte[] array, int start, byte[] candidate)
            {
                return array == null
                       || candidate == null
                       || array.Length == 0
                       || array.Length < start
                       || candidate.Length == 0
                       || candidate.Length + start > array.Length;
            }
        }

    这是最简单的方法,参见 https://stackoverflow.com/a/283648/6116637

    linq

    这个方法的代码最少

        class LinqArraySearch
        {
            public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern)
            {
                for (int i = start; i < source.Length; i++)
                {
                    if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
                    {
                        yield return i;
                    }
                }
            }
        }

    Boyer-Moore-Horspool 搜索

    这是最快的方法

        class BoyerMooreHorspool
        {
            public static IEnumerable<long> IndexesOf(byte[] source, int start, byte[] pattern)
            {
                if (source == null)
                {
                    throw new ArgumentNullException(nameof(source));
                }
    
                if (pattern == null)
                {
                    throw new ArgumentNullException(nameof(pattern));
                }
    
                long valueLength = source.LongLength;
                long patternLength = pattern.LongLength;
    
                if ((valueLength == 0) || (patternLength == 0) || (patternLength > valueLength))
                {
                    yield break;
                }
    
                var badCharacters = new long[256];
    
                for (var i = 0; i < 256; i++)
                {
                    badCharacters[i] = patternLength;
                }
    
                var lastPatternByte = patternLength - 1;
    
                for (long i = 0; i < lastPatternByte; i++)
                {
                    badCharacters[pattern[i]] = lastPatternByte - i;
                }
    
                long index = start;
    
                while (index <= valueLength - patternLength)
                {
                    for (var i = lastPatternByte; source[index + i] == pattern[i]; i--)
                    {
                        if (i == 0)
                        {
                            yield return index;
                            break;
                        }
                    }
    
                    index += badCharacters[source[index + lastPatternByte]];
                }
            }
        }
    

    参见:https://stackoverflow.com/q/16252518/6116637

    测试了三个方法的性能,请看下面

    BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
    Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=2.1.202
      [Host]     : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
      DefaultJob : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
    
    
    Method Mean Error StdDev
    Linq 13,332.8 us 251.782 us 466.694 us
    ByteArrayRocks 371.3 us 7.327 us 14.462 us
    BoyerMooreHorspool 108.3 us 1.153 us 1.079 us

    其他方法请看下面

    使用不安全代码的 Boyer Moore 算法 C# High Performance Boyer Moore Byte Array Search Algorithm

  • 相关阅读:
    Visual Studio 中的Code Snippet(代码片段)功能介绍
    对 FID 的解释
    感性负载和二极管保护
    二极管信号检波器
    java如何自动设置数据库自增长编号
    JDK1.8遍历方式
    转载:windows查看进程端口号
    发布订阅:分发代理无法在C:Program FilesMicrosoft SQL Server120COM 目录中创建临时文件
    confluence搭建wiki
    Hadoop集群安装(centos7.5+ambari-2.7.4.0部署安装)
  • 原文地址:https://www.cnblogs.com/lindexi/p/12085992.html
Copyright © 2011-2022 走看看