  • Difference between [0-9], [[:digit:]] and \d

    Yes, it is [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).
    In most programming languages that support it, \d ≡ [[:digit:]] (identical).
    \d is less widely available than [[:digit:]]: it is not in POSIX, but it is supported by GNU grep -P.

    There are many digits in Unicode, for example:

    0123456789 # Hindu-Arabic numerals (ASCII)
    ٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
    ۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
    ߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
    ०१२३४५६७८९ # DEVANAGARI

    All of these may be included in [[:digit:]] or \d.

    By contrast, [0-9] generally covers only the ASCII digits 0123456789.


    In several languages (Perl, Java, Python, C), [[:digit:]] and \d can take on this extended meaning, depending on flags and on whether the input is treated as Unicode text. For example, this Perl code matches all the digits from above:

    $ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
    
    $ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
    

    This is equivalent to selecting all characters in the Unicode general category Nd (Number, decimal digit):

    $ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
    

    GNU grep -P can reproduce this (the PCRE version it is built against may have a different internal list of numeric code points than Perl):

    $ echo "$a" | grep -oP '\p{Nd}+'
    0123456789
    ٠١٢٣٤٥٦٧٨٩
    ۰۱۲۳۴۵۶۷۸۹
    ߀߁߂߃߄߅߆߇߈߉
    ०१२३४५६७८९
    

    Change it to [0-9] and only the ASCII digits match:

    $ echo "$a" | grep -o '[0-9]\+'
    0123456789
    

    POSIX

    For POSIX BRE or ERE specifically:
    \d is not supported (it is not in POSIX, though it is available in GNU grep -P). [[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] cannot be misinterpreted; [[:digit:]] is available in more utilities and commonly means only [0123456789]; \d is supported by few utilities.

    As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).
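
    A small sketch of the C-locale case, assuming a sed that honours LC_ALL=C (GNU and BSD sed are expected to); the helper name keep_class is hypothetical. Forcing the C locale for sed makes all three bracket expressions select exactly the ASCII digits:

```shell
a='0123456789 ٠١٢٣٤٥٦٧٨٩ ०१२३४५६७८९'

# keep_class CLASS: delete everything that is not matched by the
# bracket expression [CLASS], with the C locale forced for sed only.
keep_class() {
    printf '%s\n' "$a" | LC_ALL=C sed "s/[^$1]//g"
}

# A range, an explicit list, and the POSIX character class.
for class in '0-9' '0123456789' '[:digit:]'; do
    keep_class "$class"
done
```

    Each of the three calls should print 0123456789. Outside the C locale, only the explicit list [0123456789] carries that guarantee.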

  • Original article: https://www.cnblogs.com/kakaisgood/p/9645277.html