  • What is the difference between \d and [0-9] in regular expressions?

    Today I came across an interesting question on Stack Overflow: why is \d less efficient than [0-9] in regular expressions?

    The asker used the following code to test it:

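        using System;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.Text;
        using System.Text.RegularExpressions;

        class Program
        {
            static void Main(string[] args)
            {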
                var rand = new Random(1234);
                var strings = new List<string>();
                //10K random strings
                for (var i = 0; i < 10000; i++)
                {
                    //Generate random string
                    var sb = new StringBuilder();
                    for (var c = 0; c < 1000; c++)
                    {
                        //Add a-z randomly
                        sb.Append((char)('a' + rand.Next(26)));
                    }
                    //In roughly 50% of them, put a digit
                    if (rand.Next(2) == 0)
                    {
                        //Replace one character with a digit, 0-9
                        sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                    }
                    strings.Add(sb.ToString());
                }
    
                var baseTime = testPerfomance(strings, @"\d");
                Console.WriteLine();
                var testTime = testPerfomance(strings, "[0-9]");
                Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
                testTime = testPerfomance(strings, "[0123456789]");
                Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            }
    
            private static TimeSpan testPerfomance(List<string> strings, string regex)
            {
                var sw = new Stopwatch();
    
                int successes = 0;
    
                var rex = new Regex(regex);
    
                sw.Start();
                foreach (var str in strings)
                {
                    if (rex.Match(str).Success)
                    {
                        successes++;
                    }
                }
                sw.Stop();
    
                Console.Write("Regular expression {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);
    
                return sw.Elapsed;
            }
        }

    The output was:

    Regular expression \d           took 00:00:00.2141226 result: 5077/10000
    Regular expression [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
    Regular expression [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first

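    The test shows that \d is clearly slower than [0-9]: [0-9] needed only about 63% of the time that \d took, so \d was roughly 1.6 times as slow.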

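    The reason is that, by default, .NET's \d matches every Unicode decimal digit (the Nd category), not just the ASCII digits 0-9, so the engine has more to check for each character than with the plain range [0-9]. The digits it accepts include: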

    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789

    A fuller list of every character that Unicode classifies as a digit can be found here.
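To see the difference concretely, the sketch below scans every UTF-16 code unit in the Basic Multilingual Plane and collects the characters that a single-character pattern accepts. `DigitClassDemo`, `MatchingChars` and `NonAsciiMatches` are hypothetical names chosen for illustration; only `Regex` and `RegexOptions` come from .NET. It assumes default options, under which `\d` is the Unicode decimal-digit class.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only: finds which BMP characters
// a single-character pattern such as \d or [0-9] accepts.
static class DigitClassDemo
{
    public static List<char> MatchingChars(string pattern, RegexOptions options = RegexOptions.None)
    {
        var rex = new Regex(pattern, options);
        var matches = new List<char>();
        for (int code = 0; code <= 0xFFFF; code++)
        {
            // Each input is a one-character string, so a successful match
            // means this exact character belongs to the pattern's class.
            if (rex.IsMatch(((char)code).ToString()))
            {
                matches.Add((char)code);
            }
        }
        return matches;
    }

    // Returns the matched characters that lie outside ASCII '0'..'9'.
    public static List<char> NonAsciiMatches(string pattern)
    {
        var result = new List<char>();
        foreach (char ch in MatchingChars(pattern))
        {
            if (ch < '0' || ch > '9')
            {
                result.Add(ch);
            }
        }
        return result;
    }
}
```

For example, `DigitClassDemo.MatchingChars("[0-9]").Count` is 10, while `DigitClassDemo.MatchingChars(@"\d").Count` is much larger (the exact figure depends on the Unicode version shipped with the runtime), and `NonAsciiMatches(@"\d")` includes characters such as U+0663 (ARABIC-INDIC DIGIT THREE) and U+FF13 (FULLWIDTH DIGIT THREE).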

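    If you pass RegexOptions.ECMAScript when constructing the Regex, as shown below, \d matches only the ASCII digits 0-9 and becomes as fast as [0-9]. More Regex options are described here.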

    var rex = new Regex(regex, RegexOptions.ECMAScript);
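A minimal sketch of what the option changes, compiling the same pattern `\d` twice, once with default options and once with `RegexOptions.ECMAScript`. `EcmaDigitDemo`, `HasUnicodeDigit` and `HasAsciiDigit` are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class EcmaDigitDemo
{
    // Default .NET behavior: \d matches any Unicode decimal digit (category Nd).
    private static readonly Regex UnicodeDigit = new Regex(@"\d");

    // With RegexOptions.ECMAScript, \d is limited to the ASCII digits, like [0-9].
    private static readonly Regex EcmaDigit = new Regex(@"\d", RegexOptions.ECMAScript);

    public static bool HasUnicodeDigit(string input) => UnicodeDigit.IsMatch(input);

    public static bool HasAsciiDigit(string input) => EcmaDigit.IsMatch(input);
}
```

Here `HasUnicodeDigit("abc\u0663")` is true but `HasAsciiDigit("abc\u0663")` is false, while both return true for `"abc7"`. Note that ECMAScript mode also narrows `\w` and `\s` to their ASCII forms, and `RegexOptions.ECMAScript` may only be combined with `IgnoreCase`, `Multiline` and `Compiled`; if only the digit class matters, writing `[0-9]` directly avoids those side effects.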
  • Original article: https://www.cnblogs.com/fresky/p/3116921.html