  • Difference between [0-9], [[:digit:]] and \d

    Yes, it is [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).
    In most programming languages (where it is supported) \d ≡ [[:digit:]] (identical).
    The \d is less common than [[:digit:]] (not in POSIX but it is in GNU grep -P).

    There are many digits in Unicode, for example:

    0123456789 # Hindu-Arabic numerals
    ٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
    ۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
    ߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
    ०१२३४५६७८९ # DEVANAGARI

    All of which may be included in [[:digit:]] or \d.

    By contrast, [0-9] generally matches only the ASCII digits 0123456789.
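
    As a rough illustration of why these characters can all count as digits, the following Python sketch looks up their Unicode properties with the standard unicodedata module. The names DIGIT_ZEROS and digit_block are hypothetical helpers introduced here, and the digit strings are rebuilt from code points rather than copied from the list above.

```python
import unicodedata

# Code point of the digit zero in each script listed above.
DIGIT_ZEROS = {
    "ASCII": 0x0030,
    "ARABIC-INDIC": 0x0660,
    "EXTENDED ARABIC-INDIC": 0x06F0,
    "NKO": 0x07C0,
    "DEVANAGARI": 0x0966,
}


def digit_block(zero):
    """Return the ten consecutive digits starting at code point `zero`."""
    return "".join(chr(zero + i) for i in range(10))


for name, zero in DIGIT_ZEROS.items():
    block = digit_block(zero)
    # General category of every character in the block (expected: only Nd).
    categories = sorted({unicodedata.category(ch) for ch in block})
    # Decimal value that Unicode assigns to each character.
    values = [unicodedata.decimal(ch) for ch in block]
    print(f"{name:<22} {block} {categories} {values}")
```

    Every character in these five blocks has the general category Nd (Number, decimal digit) and a decimal value from 0 to 9, which is the property a Unicode-aware digit class keys on.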


    In many languages (Perl, Java, Python, C), [[:digit:]] (and \d) can take on this extended meaning. For example, this Perl code matches all the digits from above:

    $ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
    
    $ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
    

    This is equivalent to selecting all characters that have the Unicode general category Nd (Number, decimal digit):

    $ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
    

    GNU grep can reproduce this (the specific version of PCRE may have a different internal list of numeric code points than Perl):

    $ echo "$a" | grep -oP '\p{Nd}+'
    0123456789
    ٠١٢٣٤٥٦٧٨٩
    ۰۱۲۳۴۵۶۷۸۹
    ߀߁߂߃߄߅߆߇߈߉
    ०१२३४५६७८९
    

    Change it to [0-9] to match only the ASCII digits:

    $ echo "$a" | grep -o '[0-9]\+'
    0123456789
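
    The same contrast can be sketched in Python, one of the languages mentioned above. This is a minimal sketch using the standard re module; the sample string is rebuilt from code points (an assumption made for robustness, it has the same content as $a), and the variable names are illustrative only.

```python
import re

# Rebuild the sample string: five space-separated runs of ten digits each.
zeros = (0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966)
a = " ".join("".join(chr(z + i) for i in range(10)) for z in zeros)

# For str patterns, \d matches any Unicode decimal digit.
unicode_runs = re.findall(r"\d+", a)

# An explicit range matches only the ASCII digits.
ascii_class = re.findall(r"[0-9]+", a)

# With re.ASCII, \d is restricted to the ASCII digits as well.
ascii_flag = re.findall(r"\d+", a, flags=re.ASCII)

print(len(unicode_runs))  # number of digit runs found by Unicode-aware \d
print(ascii_class)
print(ascii_flag)
```

    Here \d finds all five runs, while [0-9] and \d under re.ASCII find only the first one.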
    

    POSIX

    For the specific POSIX BRE or ERE:
    \d is not supported (it is not in POSIX, but it is in GNU grep -P). [[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] has no possible misinterpretation; [[:digit:]] is available in more utilities and commonly means only [0123456789]. \d is supported by few utilities.

    As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

  • Original source: https://www.cnblogs.com/kakaisgood/p/9645277.html