    Converting a MatchCollection to string array

    Is there a better way than this to convert a MatchCollection to a string array?

    MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
    string[] strArray = new string[mc.Count];
    for (int i = 0; i < mc.Count; i++)
    {
        strArray[i] = mc[i].Groups[0].Value;
    }
    

    P.S.: mc.CopyTo(strArray,0) throws an exception:

    At least one element in the source array could not be cast down to the destination array type.

    Tags: c#, arrays, regex

    asked Jul 10 '12 at 15:00 by Vil (edited Aug 20 '18 at 17:40 by ggorlen)
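
    The exception happens because a MatchCollection holds Match objects, not strings, so CopyTo cannot store its elements in a string[]. A minimal sketch, assuming you still want to go through CopyTo: copy into a Match[] (the element type the collection actually contains) and then read each Value. CopyToExample and MatchesToStrings are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: CopyTo works once the destination array's element type
// is Match, because Match objects are what a MatchCollection actually holds.
public static class CopyToExample
{
    public static string[] MatchesToStrings(string text, string pattern)
    {
        MatchCollection mc = Regex.Matches(text, pattern);

        // Copy the Match objects into an array of their own type.
        Match[] matchArray = new Match[mc.Count];
        mc.CopyTo(matchArray, 0);

        // Read the matched text out of each Match.
        string[] result = new string[matchArray.Length];
        for (int i = 0; i < matchArray.Length; i++)
        {
            result[i] = matchArray[i].Value;
        }
        return result;
    }
}
```

    This still needs a second pass to get strings, so it is no shorter than the loop in the question; it only shows why the original CopyTo call fails.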

    5 Answers

    Try:

    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();
    

    answered Jul 10 '12 at 15:02 by Dave Bish (edited Mar 6 '14 at 16:49)
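
    For reference, a self-contained version of the LINQ approach above, wrapped in a hypothetical helper class (RegexWords and ToStringArray are illustrative names, not part of the framework). m.Value gives the same text as m.Groups[0].Value, because group 0 always holds the entire match.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the Cast<Match>() + Select + ToArray chain.
public static class RegexWords
{
    public static string[] ToStringArray(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
            .Cast<Match>()          // non-generic MatchCollection -> IEnumerable<Match>
            .Select(m => m.Value)   // same text as m.Groups[0].Value
            .ToArray();
    }
}
```

    For example, RegexWords.ToStringArray("it's a well-known fact", @"\b[A-Za-z-']+\b") should return it's, a, well-known and fact, since the character class also admits hyphens and apostrophes.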

    • I would have used OfType<Match>() for this instead of Cast<Match>() ... Then again, the outcome would be the same. – Alex Jul 10 '12 at 15:05

    • @Alex You know that everything returned will be a Match, so there's no need to check it again at runtime. Cast makes more sense. – Servy Jul 10 '12 at 15:08

    • @DaveBish I posted some sort-of benchmarking code below; OfType<> turns out to be slightly faster. – Alex Jul 10 '12 at 15:29

    • @Frontenderman - Nope, I was just aligning it with the asker's question. – Dave Bish Mar 6 '14 at 16:48

    • You would think it would be a simple command to turn a MatchCollection into a string[], as it is for Match.ToString(). It's pretty obvious the final type needed in a lot of Regex uses would be a string, so it should have been easy to convert. – n00dles Jun 10 '17 at 16:19
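
    The Cast-versus-OfType exchange in the comments above comes down to what happens with elements of the wrong type: Cast<T>() throws an InvalidCastException when it reaches one, while OfType<T>() silently skips it (and skips nulls too). A MatchCollection only ever contains Match objects, so both produce the same sequence there. The sketch below uses a non-generic ArrayList purely to make the difference visible; CastVersusOfType is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helper showing how Cast<T>() and OfType<T>() differ when a
// non-generic collection contains elements of other types.
public static class CastVersusOfType
{
    // OfType<string>() skips elements that are not strings (including nulls).
    public static int CountWithOfType(IEnumerable source)
    {
        return source.OfType<string>().Count();
    }

    // Cast<string>() throws at the first element that cannot be cast to string
    // (a null reference casts fine). Cast is lazy, so the exception only
    // surfaces once the sequence is enumerated, here by ToList().
    public static bool CastThrows(IEnumerable source)
    {
        try
        {
            source.Cast<string>().ToList();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

    With new ArrayList { "a", 1, null, "b" }, OfType<string>() yields two elements, while Cast<string>() throws on the boxed int.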

    Dave Bish's answer is good and works properly.

    It's worth noting, though, that replacing Cast<Match>() with OfType<Match>() will speed things up.

    The code would become:

    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .OfType<Match>()
        .Select(m => m.Groups[0].Value)
        .ToArray();
    

    The result is exactly the same (and addresses the OP's issue in exactly the same way), but for huge strings it's faster.

    Test code:

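    // Put it in a console application; it needs these using directives:
    // using System; using System.Diagnostics; using System.Linq;
    // using System.Text; using System.Text.RegularExpressions;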
    static void Test()
    {
        Stopwatch sw = new Stopwatch();
        StringBuilder sb = new StringBuilder();
        string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
    
        Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
        strText = sb.ToString();
    
        sw.Start();
        var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
                  .OfType<Match>()
                  .Select(m => m.Groups[0].Value)
                  .ToArray();
        sw.Stop();
    
        Console.WriteLine("OfType: " + sw.ElapsedMilliseconds.ToString());
        sw.Reset();
    
        sw.Start();
        var arr2 = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
                  .Cast<Match>()
                  .Select(m => m.Groups[0].Value)
                  .ToArray();
        sw.Stop();
        Console.WriteLine("Cast: " + sw.ElapsedMilliseconds.ToString());
    }
    

    Output (elapsed milliseconds):

    OfType: 6540
    Cast: 8743
    

    For very long strings Cast() is therefore slower.

    answered Jul 10 '12 at 15:28 by Alex

    • Very surprising! Given that OfType must do an 'is' comparison somewhere inside and a cast (I'd have thought?) Any ideas on why Cast<> is slower? I've got nothing! – Dave Bish Jul 11 '12 at 8:51

    • I honestly don't have a clue, but it "feels" right to me (OfType<> is just a filter, Cast<> is ... well, is a cast) – Alex Jul 11 '12 at 9:55

    • stackoverflow.com/questions/11430570/… – Dave Bish Jul 11 '12 at 10:27

    • More benchmarks seem to show that this particular result is due to the regex rather than the specific LINQ extension used. – Alex Jul 11 '12 at 13:14
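
    Following up on the last comment, a sketch of a comparison that tries to time only the LINQ projection, not the regex work: the regex is evaluated once up front (reading MatchCollection.Count forces every match to be found), each projection gets one untimed warm-up run, and all three results are checked for equality. ProjectionBenchmark and its members are hypothetical names, and no timings are claimed here, since they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Sketch with hypothetical names: the regex is evaluated once up front, so only
// the OfType / Cast / for-loop projections are timed.
public static class ProjectionBenchmark
{
    // Builds the same kind of long input string as the benchmark above.
    public static string BuildInput(int repetitions)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < repetitions; i++)
        {
            sb.Append("this will become a very long string after my code has done appending it to the stringbuilder ");
        }
        return sb.ToString();
    }

    // Runs the projection once untimed (JIT warm-up), then once timed.
    private static long Time(Func<string[]> projection, out string[] result)
    {
        projection();
        Stopwatch sw = Stopwatch.StartNew();
        result = projection();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Prints the timings and returns true if all three projections agree.
    public static bool Run(string text)
    {
        MatchCollection mc = Regex.Matches(text, @"\b[A-Za-z-']+\b");
        int count = mc.Count; // reading Count makes the engine find every match now

        long ofType = Time(() => mc.OfType<Match>().Select(m => m.Value).ToArray(), out string[] a);
        long cast = Time(() => mc.Cast<Match>().Select(m => m.Value).ToArray(), out string[] b);
        long loop = Time(() =>
        {
            var arr = new string[count];
            for (int i = 0; i < count; i++)
            {
                arr[i] = mc[i].Value;
            }
            return arr;
        }, out string[] c);

        Console.WriteLine($"OfType: {ofType} ms, Cast: {cast} ms, For: {loop} ms");
        return a.SequenceEqual(b) && b.SequenceEqual(c);
    }
}
```

    Usage: ProjectionBenchmark.Run(ProjectionBenchmark.BuildInput(100000)) mirrors the input size of the original test. A single timed run is still noisy; a dedicated benchmarking harness would give more reliable numbers.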

    I ran the exact same benchmark that Alex posted and found that sometimes Cast was faster and sometimes OfType was faster, but the difference between them was negligible. However, the for loop, while ugly, was consistently faster than both.

    Stopwatch sw = new Stopwatch();
    StringBuilder sb = new StringBuilder();
    string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
    Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
    strText = sb.ToString();
    
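    // First two benchmarks (OfType and Cast, as in Alex's answer) omitted
    
    sw.Start();
    MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
    var matches = new string[mc.Count];
    for (int i = 0; i < matches.Length; i++)
    {
        matches[i] = mc[i].ToString(); // Match.ToString() returns the matched text (same as Value)
    }
    sw.Stop();
    Console.WriteLine("For: " + sw.ElapsedMilliseconds.ToString());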
    

    Results:

    OfType: 3462
    Cast: 3499
    For: 2650
    

    answered May 14 '14 at 13:55 by David DeMar (edited Aug 5 '14 at 20:58)

    • No surprise that LINQ is slower than a for loop. LINQ may be easier to write for some people and "increase" their productivity at the expense of execution time. That can be good sometimes. – gg89 Sep 23 '15 at 6:01

    • So the original post is the most efficient method really. – n00dles Jun 10 '17 at 16:21

    One could also make use of this extension method to deal with the annoyance of MatchCollection not being generic. Not that it's a big deal, but this is almost certainly more performant than OfType or Cast, because it's just enumerating, which both of those also have to do.

    (Side note: I wonder if it would be possible for the .NET team to make MatchCollection implement the generic versions of ICollection and IEnumerable in the future? Then we wouldn't need this extra step to have LINQ transforms immediately available.)

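    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    
    // Extension methods must live in a non-generic static class; the class name is arbitrary.
    public static class MatchCollectionExtensions
    {
        public static IEnumerable<Match> ToEnumerable(this MatchCollection mc)
        {
            if (mc != null) {
                foreach (Match m in mc)
                    yield return m;
            }
        }
    }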
    

    answered Feb 14 '18 at 18:23 by Nicholas Petersen (edited Jan 9 at 0:01 by Lauren Van Sloun)
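
    The side note above has since come true on newer runtimes: as far as I know, starting with .NET Core 2.0 (and in .NET 5+), MatchCollection implements IEnumerable<Match> among other generic interfaces, so LINQ operators apply to it directly. A minimal sketch, assuming such a target (it will not compile against .NET Framework, where the Cast, OfType or extension-method forms are still needed); GenericMatchCollectionExample is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Assumes a .NET Core 2.0+ / .NET 5+ target, where MatchCollection implements
// IEnumerable<Match>; against .NET Framework this does not compile.
public static class GenericMatchCollectionExample
{
    public static string[] Words(string text)
    {
        // No Cast<Match>(), OfType<Match>() or ToEnumerable() step needed.
        return Regex.Matches(text, @"\b[A-Za-z-']+\b")
            .Select(m => m.Value)
            .ToArray();
    }
}
```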

    Consider the following code:

    var emailAddress = "joe@sad.com; joe@happy.com; joe@elated.com";
    List<string> emails = Regex.Matches(emailAddress, @"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})")
                    .Cast<Match>()
                    .Select(m => m.Groups[0].Value)
                    .ToList();
    

    Good Luck!

    answered Nov 22 '13 at 2:01 by gpmurthy

    • Ugh... That regex is horrendous to look at. BTW, since there is no foolproof regex for validating email addresses, use the MailAddress class instead. stackoverflow.com/a/201378/2437521 – C. Tewalt Aug 5 '14 at 18:19
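
    Following the comment's suggestion, a minimal sketch that splits the semicolon-separated list from the answer above and lets System.Net.Mail.MailAddress do the parsing instead of a regex. EmailListParser is a hypothetical name; the MailAddress constructor throws FormatException for input it cannot parse, which the sketch uses to skip invalid entries.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

// Hypothetical helper: parses a semicolon-separated list with MailAddress
// instead of a hand-written regex, dropping entries it cannot parse.
public static class EmailListParser
{
    public static List<string> Parse(string emailList)
    {
        var result = new List<string>();
        foreach (string part in emailList.Split(';'))
        {
            string candidate = part.Trim();
            if (candidate.Length == 0)
            {
                continue; // MailAddress rejects empty strings with ArgumentException
            }
            try
            {
                // The MailAddress constructor throws FormatException for malformed input.
                result.Add(new MailAddress(candidate).Address);
            }
            catch (FormatException)
            {
                // Not a valid address: skip it.
            }
        }
        return result;
    }
}
```

    MailAddress only checks syntax; it says nothing about whether an address actually exists or can receive mail.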

  • Original article: https://www.cnblogs.com/grj001/p/12225112.html
Copyright © 2011-2022 走看看