zoukankan      html  css  js  c++  java
  • 正则表达式中\d和[00]有什么区别

    今天看到Stackoverflow上一个有趣的问题,为什么正则表达式在中\d比[0-0]低效?

    提问者用了如下的代码来做测试:

            static void Main(string[] args)
            {
                var rand = new Random(1234);
                var strings = new List<string>();
                //10K random strings
                for (var i = 0; i < 10000; i++)
                {
                    //Generate random string
                    var sb = new StringBuilder();
                    for (var c = 0; c < 1000; c++)
                    {
                        //Add a-z randomly
                        sb.Append((char)('a' + rand.Next(26)));
                    }
                    //In roughly 50% of them, put a digit
                    if (rand.Next(2) == 0)
                    {
                        //Replace one character with a digit, 0-9
                        sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                    }
                    strings.Add(sb.ToString());
                }
    
                var baseTime = testPerfomance(strings, @"\d");
                Console.WriteLine();
                var testTime = testPerfomance(strings, "[0-9]");
                Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
                testTime = testPerfomance(strings, "[0123456789]");
                Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            }
    
            private static TimeSpan testPerfomance(List<string> strings, string regex)
            {
                var sw = new Stopwatch();
    
                int successes = 0;
    
                var rex = new Regex(regex);
    
                sw.Start();
                foreach (var str in strings)
                {
                    if (rex.Match(str).Success)
                    {
                        successes++;
                    }
                }
                sw.Stop();
    
                Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);
    
                return sw.Elapsed;
            }
        }

    得到的输出结果是:

    Regular expression \d           took 00:00:00.2141226 result: 5077/10000
    Regular expression [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
    Regular expression [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first

    从这个测试中可以看出\d比[0-9]慢了一倍。

     

    原因在于,\d会比较所有的unicode的数字,包括

    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789

    可以从这里看到更全的列表,列出了所有Unicode中属于数字的字符。

     

    如果在生成Regex的时候传入一个参数RegexOptions.ECMAScript,如下所示,那么\d就和[0-9]的效率一样了。可以从这里找到更多的Regex的选项。

    var rex = new Regex(regex, RegexOptions.ECMAScript);
  • 相关阅读:
    css自适应宽高等腰梯形
    控制台屏蔽某console的输出
    js定时器的时间最小值-setTimeout、setInterval
    03_数字的字面量
    程序员-表情包
    程序员-趣图集
    js不是从上到下执行的吗?
    CSS样式重置
    系统程序名命令表
    js手风琴图片切换实现原理及函数分析
  • 原文地址:https://www.cnblogs.com/fresky/p/3116921.html
Copyright © 2011-2022 走看看