  • Difference between [0-9], [[:digit:]] and \d

    Yes, it is [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).
    In most programming languages (where it is supported) \d ≡ [[:digit:]] (identical).
    \d is less widely available than [[:digit:]] (it is not in POSIX, but GNU grep -P accepts it).

    There are many digits in UNICODE, for example:

    0123456789 # Hindu-Arabic numerals
    ٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
    ۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
    ߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
    ०१२३४५६७८९ # DEVANAGARI

    All of these may be included in [[:digit:]] or \d.

    By contrast, [0-9] generally covers only the ASCII digits 0123456789.
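
    The same contrast can be sketched in Python 3, where \d in a str pattern matches any Unicode decimal digit while [0-9] stays ASCII-only. This is only an illustration: the names ZEROS, digit_run and sample are made up here, and the sample is rebuilt from code points rather than pasted.

```python
import re

# Code point of the zero digit in each script listed above.
ZEROS = {
    "ASCII": 0x0030,
    "ARABIC-INDIC": 0x0660,
    "EXTENDED ARABIC-INDIC": 0x06F0,
    "NKO": 0x07C0,
    "DEVANAGARI": 0x0966,
}


def digit_run(zero):
    """Return the ten consecutive digits that start at code point `zero`."""
    return "".join(chr(zero + i) for i in range(10))


# Five space-separated runs of ten digits, one run per script.
sample = " ".join(digit_run(zero) for zero in ZEROS.values())

unicode_runs = re.findall(r"\d+", sample)   # \d: any Unicode decimal digit
ascii_runs = re.findall(r"[0-9]+", sample)  # explicit range: ASCII digits only

print(len(unicode_runs))  # one run per script
print(ascii_runs)         # only the ASCII run
```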


    In several languages, \d (and [[:digit:]] where it is supported) takes on an extended, Unicode-aware meaning: Perl and Python 3 by default, Java when UNICODE_CHARACTER_CLASS is enabled. For example, this Perl code matches all the digits from above:

    $ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
    
    $ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
    

    This is equivalent to selecting every character with the Unicode general category Nd (Number, decimal digit):

    $ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
    0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
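
    As a rough Python counterpart, the Nd filter can be written with the standard unicodedata module. The helper name keep_decimal_digits and the test strings are invented for this sketch. Other numeric characters, such as superscripts or Roman numerals, fall in the No and Nl categories and are dropped.

```python
import unicodedata


def keep_decimal_digits(text):
    """Drop every character whose Unicode general category is not Nd."""
    return "".join(ch for ch in text if unicodedata.category(ch) == "Nd")


# ASCII 1, ARABIC-INDIC 3, EXTENDED ARABIC-INDIC 5 and DEVANAGARI 3,
# mixed with letters, spaces and punctuation.
mixed = "a1 \u0663b \u06f5, \u0969!"
print(keep_decimal_digits(mixed))

# SUPERSCRIPT TWO (category No) and ROMAN NUMERAL EIGHT (category Nl)
# are numeric but not decimal digits, so nothing survives here.
print(repr(keep_decimal_digits("x\u00b2\u2167")))
```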
    

    grep can reproduce this (although the specific version of PCRE may have a different internal list of numeric code points than Perl):

    $ echo "$a" | grep -oP '\p{Nd}+'
    0123456789
    ٠١٢٣٤٥٦٧٨٩
    ۰۱۲۳۴۵۶۷۸۹
    ߀߁߂߃߄߅߆߇߈߉
    ०१२३४५६७८९
    

    Change it to [0-9] to see the difference (\+ is the GNU BRE spelling of "one or more"):

    $ echo "$a" | grep -o '[0-9]\+'
    0123456789
    

    POSIX

    For the specific POSIX BRE or ERE:
    \d is not supported (it is not in POSIX, though GNU grep -P accepts it). POSIX requires [[:digit:]] to correspond to the digit character class, which ISO C in turn requires to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] cannot be misinterpreted; [[:digit:]] is available in more utilities and commonly means only [0123456789]; \d is supported by few utilities.

    As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).
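
    When only the ten ASCII digits should be accepted, the unambiguous spelling carries over to other regex engines. The sketch below uses Python's re module, which is not a POSIX BRE/ERE implementation; the names ASCII_DIGITS, ASCII_FLAG and is_ascii_number are hypothetical. It shows the spelled-out class and the re.ASCII flag, which restricts \d to [0-9].

```python
import re

ASCII_DIGITS = re.compile(r"[0123456789]+")  # spelled out, no range involved
ASCII_FLAG = re.compile(r"\d+", re.ASCII)    # re.ASCII limits \d to [0-9]


def is_ascii_number(text):
    """Return True only if `text` consists entirely of the digits 0-9."""
    return ASCII_DIGITS.fullmatch(text) is not None


arabic_indic = "\u0662\u0660\u0662\u0664"  # ARABIC-INDIC digits 2, 0, 2, 4

print(is_ascii_number("2024"))                           # ASCII digits only
print(is_ascii_number(arabic_indic))                     # non-ASCII digits
print(ASCII_FLAG.fullmatch(arabic_indic) is None)        # flag narrows \d
print(re.fullmatch(r"\d+", arabic_indic) is not None)    # default \d is wider
```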

  • Original article: https://www.cnblogs.com/kakaisgood/p/9645277.html