zoukankan      html  css  js  c++  java
  • 2019-8-31-C#-对-byte-数组进行模式搜索

    title author date CreateTime categories
    C# 对 byte 数组进行模式搜索
    lindexi
    2019-08-31 16:55:58 +0800
    2018-07-19 11:16:46 +0800
    C#

    本文告诉大家几个方法从 byte 数组找到对应的相同序列的数组

    最简单的方法是进行数值判断,但是代码最少是使用Linq ,效率比较高是使用 Boyer-Moore 算法,下面就告诉大家几个算法的代码

    判断数值

        class ByteArrayRocks
        {
            public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern)
            {
                if (IsEmptyLocate(source, start, pattern))
                {
                    yield break;
                }
    
                for (int i = start; i < source.Length; i++)
                {
                    if (!IsMatch(source, i, pattern))
                    {
                        continue;
                    }
    
                    yield return i;
                }
            }
    
            private static readonly int[] Empty = new int[0];
    
            private static bool IsMatch(byte[] array, int position, byte[] candidate)
            {
                if (candidate.Length > (array.Length - position))
                {
                    return false;
                }
    
                for (int i = 0; i < candidate.Length; i++)
                {
                    if (array[position + i] != candidate[i])
                    {
                        return false;
                    }
                }
    
                return true;
            }
    
            private static bool IsEmptyLocate(byte[] array, int start, byte[] candidate)
            {
                return array == null
                       || candidate == null
                       || array.Length == 0
                       || array.Length < start
                       || candidate.Length == 0
                       || candidate.Length + start > array.Length;
            }
        }

    这是最简单的方法,参见 https://stackoverflow.com/a/283648/6116637

    linq

    这个方法的代码最少

        class LinqArraySearch
        {
            public static IEnumerable<int> IndexOf(byte[] source, int start, byte[] pattern)
            {
                for (int i = start; i < source.Length; i++)
                {
                    if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
                    {
                        yield return i;
                    }
                }
            }
        }

    Boyer-Moore-Horspool 搜索

    这是最快的方法

        class BoyerMooreHorspool
        {
            public static IEnumerable<long> IndexesOf(byte[] source, int start, byte[] pattern)
            {
                if (source == null)
                {
                    throw new ArgumentNullException(nameof(source));
                }
    
                if (pattern == null)
                {
                    throw new ArgumentNullException(nameof(pattern));
                }
    
                long valueLength = source.LongLength;
                long patternLength = pattern.LongLength;
    
                if ((valueLength == 0) || (patternLength == 0) || (patternLength > valueLength))
                {
                    yield break;
                }
    
                var badCharacters = new long[256];
    
                for (var i = 0; i < 256; i++)
                {
                    badCharacters[i] = patternLength;
                }
    
                var lastPatternByte = patternLength - 1;
    
                for (long i = 0; i < lastPatternByte; i++)
                {
                    badCharacters[pattern[i]] = lastPatternByte - i;
                }
    
                long index = start;
    
                while (index <= valueLength - patternLength)
                {
                    for (var i = lastPatternByte; source[index + i] == pattern[i]; i--)
                    {
                        if (i == 0)
                        {
                            yield return index;
                            break;
                        }
                    }
    
                    index += badCharacters[source[index + lastPatternByte]];
                }
            }
        }
    

    参见:https://stackoverflow.com/q/16252518/6116637

    测试了三个方法的性能,请看下面

    BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
    Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=2.1.202
      [Host]     : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
      DefaultJob : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
    
    
    Method Mean Error StdDev
    Linq 13,332.8 us 251.782 us 466.694 us
    ByteArrayRocks 371.3 us 7.327 us 14.462 us
    BoyerMooreHorspool 108.3 us 1.153 us 1.079 us

    其他方法请看下面

    使用不安全代码的 Boyer Moore 算法 C# High Performance Boyer Moore Byte Array Search Algorithm

  • 相关阅读:
    Iconfont——实现字体图标的反转
    HTTPS——https下的静态资源中如何使用http的地址
    TP5.x——initialize()中如何return
    vscode——tab转空格
    Chocolatey——windows下的包管理器
    head里两个重要标签base和meta
    js原生触发事件
    路径分隔符不一致,导致windows下不能开发
    HTML词法和语法
    chrome headless 无头浏览器 应用
  • 原文地址:https://www.cnblogs.com/lindexi/p/12085992.html
Copyright © 2011-2022 走看看