  • Difference between [0-9], [[:digit:]] and \d

    Yes, it is [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).
    In most programming languages (where it is supported) \d ≡ [[:digit:]] (identical).
    \d is less common than [[:digit:]] (it is not in POSIX, but it is available in GNU grep -P).

    There are many digits in Unicode, for example:

    0123456789 # Hindu-Arabic numerals (ASCII)
    ٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
    ۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
    ߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
    ०१२३४५६७८९ # DEVANAGARI

    All of these may be matched by [[:digit:]] or \d.

    By contrast, [0-9] generally matches only the ASCII digits 0123456789.
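    As a quick check of the list above, the following Python 3 sketch uses the standard unicodedata module to look up each row by code point. The starting code points, such as U+0660 for ARABIC-INDIC DIGIT ZERO, are hard-coded, and the helper name describe_row is hypothetical. The sketch shows that every row has General_Category Nd and that each character carries a digit value from 0 through 9. That is why Unicode-aware digit classes accept all of them.

```python
import unicodedata

# Code point of the digit zero in each row listed above; in each script the
# digits 0 through 9 occupy ten consecutive code points.
ZERO_CODE_POINTS = [0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966]


def describe_row(base: int) -> tuple[str, str, list[int]]:
    """Return (name of the zero, its General_Category, digit values of the row)."""
    digits = [chr(base + i) for i in range(10)]
    return (
        unicodedata.name(digits[0]),
        unicodedata.category(digits[0]),  # "Nd" means Number, decimal digit
        [unicodedata.digit(ch) for ch in digits],  # value 0..9 of each character
    )


for base in ZERO_CODE_POINTS:
    print(f"U+{base:04X}", *describe_row(base))
```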


    In several languages (Perl, Java, Python, C), [[:digit:]] (and \d) can take on this extended meaning, in some cases only when a Unicode matching mode is enabled. For example, this Perl code matches all the digits above:

    $ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
    
    $ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
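    For comparison, here is a minimal Python 3 sketch of the same filter. It assumes Python's built-in re module, where \d in a str pattern matches any Unicode decimal digit. The test string is rebuilt from code points so that the right-to-left digit rows cannot be mangled when copied. That construction is an illustrative choice, not part of the original.

```python
import re

# Rebuild the shell variable $a from above: five rows of ten digits each,
# separated by spaces, starting at the code point of each row's zero.
ZERO_CODE_POINTS = [0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966]
a = " ".join("".join(chr(z + i) for i in range(10)) for z in ZERO_CODE_POINTS)

# Same idea as Perl's s/[^\d]//g: delete every character that is not a digit.
# In Python 3, \d in a str pattern matches any Unicode decimal digit.
only_digits = re.sub(r"[^\d]", "", a)
print(only_digits)
print(len(only_digits), "digits kept")
```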
    

    This is equivalent to selecting all characters that have the Unicode General_Category Nd (Decimal_Number, i.e. decimal digits):

    $ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
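    Python's built-in re does not support \p{Nd} (the third-party regex module does). This sketch therefore uses a hypothetical helper, keep_nd, that filters on unicodedata.category() instead. It then checks that, for this test string, the result matches the \d filter.

```python
import re
import unicodedata

ZERO_CODE_POINTS = [0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966]
a = " ".join("".join(chr(z + i) for i in range(10)) for z in ZERO_CODE_POINTS)


def keep_nd(text: str) -> str:
    """Keep only the characters whose Unicode General_Category is Nd."""
    # Same idea as Perl's s/[^\p{Nd}]//g, using unicodedata instead of \p{...}.
    return "".join(ch for ch in text if unicodedata.category(ch) == "Nd")


# For this test string the category filter and the \d filter agree.
print(keep_nd(a) == re.sub(r"\D", "", a))
```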
    

    GNU grep -P can reproduce this, although the PCRE version it is built with may have a different internal list of numeric code points than Perl:

    $ echo "$a" | grep -oP '\p{Nd}+'
    0123456789
    ٠١٢٣٤٥٦٧٨٩
    ۰۱۲۳۴۵۶۷۸۹
    ߀߁߂߃߄߅߆߇߈߉
    ०१२३४५६७८९
    

    Change it to [0-9] to see the difference:

    $ echo "$a" | grep -oP '[0-9]+'
    0123456789
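    The grep -o behaviour (one output line per maximal match) corresponds roughly to re.findall() in Python. This sketch, again assuming Python 3's Unicode-aware re, contrasts \d+ with [0-9]+ on the same test string:

```python
import re

ZERO_CODE_POINTS = [0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966]
a = " ".join("".join(chr(z + i) for i in range(10)) for z in ZERO_CODE_POINTS)

# Like grep -o, findall() returns each maximal non-overlapping match.
unicode_runs = re.findall(r"\d+", a)   # one run per script
ascii_runs = re.findall(r"[0-9]+", a)  # only U+0030..U+0039

for run in unicode_runs:
    print(run)
print(ascii_runs)
```

    The last line printed is ['0123456789'], mirroring the grep output above.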
    

    POSIX

    For POSIX BRE or ERE specifically:
    \d is not supported (it is not in POSIX, although GNU grep -P supports it). POSIX requires [[:digit:]] to correspond to the digit character class, and ISO C in turn requires that class to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] cannot be misinterpreted. [[:digit:]] is available in more utilities and in practice usually means only [0123456789]. \d is supported by only a few utilities.
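    Python's re has no [[:digit:]] class and does not apply locale rules to str patterns. Its re.ASCII flag, however, produces a comparable situation: with it, \d is restricted to [0-9]. The sketch below is an analogy rather than a POSIX example. It checks that the three spellings then select the same characters:

```python
import re

ZERO_CODE_POINTS = [0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966]
a = " ".join("".join(chr(z + i) for i in range(10)) for z in ZERO_CODE_POINTS)

# With re.ASCII, \d means exactly [0-9]; without the flag it is Unicode-aware.
patterns = {
    r"\d+ (re.ASCII)": re.compile(r"\d+", re.ASCII),
    r"[0-9]+": re.compile(r"[0-9]+"),
    r"[0123456789]+": re.compile(r"[0123456789]+"),
}
results = {label: p.findall(a) for label, p in patterns.items()}
for label, found in results.items():
    print(label, found)

# All three spellings select the same characters under re.ASCII.
print(len({tuple(found) for found in results.values()}) == 1)
```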

    As for [0-9], POSIX defines the meaning of range expressions only in the C locale; in other locales it may differ (it might follow code point order, collation order, or something else).
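    Python's re, by contrast, interprets a bracket range such as [a-z] or [0-9] by code point, independent of locale. The following sketch compares regex matching with a plain ord() comparison; its helper names are hypothetical. A collation-based implementation could instead place letters such as é between a and z.

```python
import re


def in_codepoint_range(ch: str, lo: str, hi: str) -> bool:
    """True if ch lies between lo and hi in plain code point order."""
    return ord(lo) <= ord(ch) <= ord(hi)


def regex_range_matches(ch: str, lo: str, hi: str) -> bool:
    """True if Python's re accepts ch for the bracket expression [lo-hi]."""
    return re.fullmatch(f"[{re.escape(lo)}-{re.escape(hi)}]", ch) is not None


# "\u00e9" is e with acute accent; "\u0665" is ARABIC-INDIC DIGIT FIVE.
samples = ["a", "m", "z", "B", "\u00e9", "5", "\u0665"]
for lo, hi in [("a", "z"), ("0", "9")]:
    for ch in samples:
        matched = regex_range_matches(ch, lo, hi)
        # Python's ranges follow code point order, whatever the locale.
        assert matched == in_codepoint_range(ch, lo, hi)
        print(f"[{lo}-{hi}]", repr(ch), matched)
```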

  • Original article: https://www.cnblogs.com/kakaisgood/p/9645277.html
Copyright © 2011-2022 走看看