  • Locales in libc

     

    1 Locales

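    Internationalizing software means making it follow the user's conventions. ISO C achieves this through locales.

    A machine can support multiple locales, and the user selects the locale a program will use through environment variables.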

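    1.1 What a Locale Does

    Each locale consists of a set of conventions, each defined for a different purpose. These conventions cover:

    • Which multibyte character sequences are valid, and how they are interpreted.
    • How characters are classified.
    • The collating sequence for the local language and character set.
    • How numbers and currency amounts are formatted.
    • Which language is used for output, including error messages.
    • Which language is used for answers to yes-or-no questions.
    • Which language is used for more complex user input.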

    1.2 Choosing a Locale

    The simplest way to choose (set) a locale is to set the environment variable LANG, which selects all of that locale's conventions. For example:

    [yyc@localhost ~]$ locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    

    We can also set a single category of a locale on its own. For example, early versions of fcitx (a Chinese input method for Linux) required LC_CTYPE to use GB2312, which can be set as follows:

    [yyc@localhost ~]$ export LC_CTYPE="zh_CN.GB2312"
    [yyc@localhost ~]$ locale
    LANG=en_US.UTF-8
    LC_CTYPE=zh_CN.GB2312
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    
    

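    A system does not necessarily support every locale, but every system must support one standard locale, called "C" or "POSIX".

    1.3 Categories of Activities That Locales Affect

    The conventions a locale defines fall into the categories listed below. The name of each category serves both as the name of an environment variable and as a macro that can be passed as an argument to setlocale.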

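    • LC_COLLATE

      Affects the collation (sort order) of strings.

    • LC_CTYPE

      Affects character classification and the conversion between multibyte and wide characters.

    • LC_MONETARY

      Affects the formatting of monetary values.

    • LC_NUMERIC

      Affects the formatting of numbers.

    • LC_TIME

      Affects the formatting of dates and times.

    • LC_MESSAGES

      Affects the language of messages in the user interface and the regular expressions used to match answers to yes-or-no questions.

    • LC_ALL

      This is not a category. As a macro passed to setlocale() it sets all of the categories above at once; as an environment variable it overrides the other LC_* variables and LANG.

    • LANG

      If this environment variable is set, its value applies to all of the categories above, unless the user has explicitly set one of them through its own variable.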

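    1.4 Setting the Locale

    A C program automatically inherits the locale environment variables when it starts. These variables do not, however, automatically control the locale used by library functions: ISO C requires every program to start in the standard "C" locale.

    We can call setlocale() to tell the library functions to use the locale specified by the environment variables: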

    setlocale(LC_ALL, "");
    

    setlocale() can also set a single category of the locale:

    char * setlocale (int CATEGORY, const char *LOCALE);
    

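    This function sets category CATEGORY of the current locale to LOCALE.

    • If LOCALE is NULL, nothing is changed and the name of the locale currently in use for CATEGORY is returned;
    • If LOCALE is not NULL and names a valid locale, the locale is set and the name of the locale now in use is returned;
    • If LOCALE is not NULL but is not valid, the current locale is left unchanged and the function returns NULL.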

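    1.5 Standard Locales

    As mentioned earlier, not every system supports every locale, but every system must support a few standard ones. These standard locales are:

    • C:
      The locale specified by standard C; its attributes and behavior follow the ISO C standard.
    • POSIX:
      The POSIX locale. On Linux it is currently identical to C.
    • ""
      The empty name. A program that passes it selects the locale specified by the environment variables.

    Locales are usually defined and installed by the system administrator.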

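    1.6 Obtaining Locale Information

    There are several ways to obtain locale information. The simplest is to let the C library fetch it itself, which many library functions do. Take strftime() as an example: the same code produces different output depending on the locale.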

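    Sometimes, though, the library cannot do this for us, and we have to fetch the information ourselves. Two functions serve this purpose: localeconv() and nl_langinfo(). The former is provided by standard C and is highly portable, but its interface is awful. The latter is a Unix interface, available on any system that follows the Unix standard.

    1.6.1 The Clumsy localeconv

    Like setlocale(), localeconv() is provided by standard C and is portable, but it is expensive to use and hard to extend. It also exposes only the LC_MONETARY and LC_NUMERIC information of a locale, so it is not general-purpose.

    The prototype of localeconv() is: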

    struct lconv * localeconv (void);
    

    This function returns a pointer to an lconv structure whose members describe how numbers and monetary amounts are formatted in the current locale. Glibc defines it as follows:

    /* Structure giving information about numeric and monetary notation.  */
    struct lconv
    {
      /* Numeric (non-monetary) information.  */
    
      char *decimal_point;      /* Decimal point character.  */
      char *thousands_sep;      /* Thousands separator.  */
      /* Each element is the number of digits in each group;
         elements with higher indices are farther left.
         An element with value CHAR_MAX means that no further grouping is done.
         An element with value 0 means that the previous element is used
         for all groups farther left.  */
      char *grouping;
    
      /* Monetary information.  */
    
      /* First three chars are a currency symbol from ISO 4217.
         Fourth char is the separator.  Fifth char is '\0'.  */
      char *int_curr_symbol;
      char *currency_symbol;    /* Local currency symbol.  */
      char *mon_decimal_point;  /* Decimal point character.  */
      char *mon_thousands_sep;  /* Thousands separator.  */
      char *mon_grouping;       /* Like `grouping' element (above).  */
      char *positive_sign;      /* Sign for positive values.  */
      char *negative_sign;      /* Sign for negative values.  */
      char int_frac_digits;     /* Int'l fractional digits.  */
      char frac_digits;     /* Local fractional digits.  */
      /* 1 if currency_symbol precedes a positive value, 0 if succeeds.  */
      char p_cs_precedes;
      /* 1 iff a space separates currency_symbol from a positive value.  */
      char p_sep_by_space;
      /* 1 if currency_symbol precedes a negative value, 0 if succeeds.  */
      char n_cs_precedes;
      /* 1 iff a space separates currency_symbol from a negative value.  */
      char n_sep_by_space;
      /* Positive and negative sign positions:
         0 Parentheses surround the quantity and currency_symbol.
         1 The sign string precedes the quantity and currency_symbol.
         2 The sign string follows the quantity and currency_symbol.
         3 The sign string immediately precedes the currency_symbol.
         4 The sign string immediately follows the currency_symbol.  */
      char p_sign_posn;
      char n_sign_posn;
    #ifdef __USE_ISOC99
      /* 1 if int_curr_symbol precedes a positive value, 0 if succeeds.  */
      char int_p_cs_precedes;
      /* 1 iff a space separates int_curr_symbol from a positive value.  */
      char int_p_sep_by_space;
      /* 1 if int_curr_symbol precedes a negative value, 0 if succeeds.  */
      char int_n_cs_precedes;
      /* 1 iff a space separates int_curr_symbol from a negative value.  */
      char int_n_sep_by_space;
      /* Positive and negative sign positions:
         0 Parentheses surround the quantity and int_curr_symbol.
         1 The sign string precedes the quantity and int_curr_symbol.
         2 The sign string follows the quantity and int_curr_symbol.
         3 The sign string immediately precedes the int_curr_symbol.
         4 The sign string immediately follows the int_curr_symbol.  */
      char int_p_sign_posn;
      char int_n_sign_posn;
    #else
      char __int_p_cs_precedes;
      char __int_p_sep_by_space;
      char __int_n_cs_precedes;
      char __int_n_sep_by_space;
      char __int_p_sign_posn;
      char __int_n_sign_posn;
    #endif
    };
    

    See the comments in the structure for what each member means.

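    1.6.2 The Elegant, Fast nl_langinfo

    char *nl_langinfo(nl_item ITEM);
    

    nl_langinfo() gives access to individual pieces of locale information; it is fine-grained and fast. ITEM is one of the constants defined in the header langinfo.h, described below: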

    `CODESET'
          `nl_langinfo' returns a string with the name of the coded
          character set used in the selected locale.
    
    `ABDAY_1'
    `ABDAY_2'
    `ABDAY_3'
    `ABDAY_4'
    `ABDAY_5'
    `ABDAY_6'
    `ABDAY_7'
          `nl_langinfo' returns the abbreviated weekday name.  `ABDAY_1'
          corresponds to Sunday.
    
    `DAY_1'
    `DAY_2'
    `DAY_3'
    `DAY_4'
    `DAY_5'
    `DAY_6'
    `DAY_7'
          Similar to `ABDAY_1' etc., but here the return value is the
          unabbreviated weekday name.
    
    `ABMON_1'
    `ABMON_2'
    `ABMON_3'
    `ABMON_4'
    `ABMON_5'
    `ABMON_6'
    `ABMON_7'
    `ABMON_8'
    `ABMON_9'
    `ABMON_10'
    `ABMON_11'
    `ABMON_12'
          The return value is abbreviated name of the month.  `ABMON_1'
          corresponds to January.
    
    `MON_1'
    `MON_2'
    `MON_3'
    `MON_4'
    `MON_5'
    `MON_6'
    `MON_7'
    `MON_8'
    `MON_9'
    `MON_10'
    `MON_11'
    `MON_12'
          Similar to `ABMON_1' etc., but here the month names are not
          abbreviated.  Here the first value `MON_1' also corresponds
          to January.
    
    `AM_STR'
    `PM_STR'
          The return values are strings which can be used in the
          representation of time as an hour from 1 to 12 plus an am/pm
          specifier.
    
          Note that in locales which do not use this time representation
          these strings might be empty, in which case the am/pm format
          cannot be used at all.
    
    `D_T_FMT'
          The return value can be used as a format string for
          `strftime' to represent time and date in a locale-specific
          way.
    
    `D_FMT'
          The return value can be used as a format string for
          `strftime' to represent a date in a locale-specific way.
    
    `T_FMT'
          The return value can be used as a format string for
          `strftime' to represent time in a locale-specific way.
    
    `T_FMT_AMPM'
          The return value can be used as a format string for
          `strftime' to represent time in the am/pm format.
    
          Note that if the am/pm format does not make any sense for the
          selected locale, the return value might be the same as the
          one for `T_FMT'.
    
    `ERA'
          The return value represents the era used in the current
          locale.
    
          Most locales do not define this value.  An example of a
          locale which does define this value is the Japanese one.  In
          Japan, the traditional representation of dates includes the
          name of the era corresponding to the then-emperor's reign.
    
          Normally it should not be necessary to use this value
          directly.  Specifying the `E' modifier in their format
          strings causes the `strftime' functions to use this
          information.  The format of the returned string is not
          specified, and therefore you should not assume knowledge of
          it on different systems.
    
    `ERA_YEAR'
          The return value gives the year in the relevant era of the
          locale.  As for `ERA' it should not be necessary to use this
          value directly.
    
    `ERA_D_T_FMT'
          This return value can be used as a format string for
          `strftime' to represent dates and times in a locale-specific
          era-based way.
    
    `ERA_D_FMT'
          This return value can be used as a format string for
          `strftime' to represent a date in a locale-specific era-based
          way.
    
    `ERA_T_FMT'
          This return value can be used as a format string for
          `strftime' to represent time in a locale-specific era-based
          way.
    
    `ALT_DIGITS'
          The return value is a representation of up to 100 values used
          to represent the values 0 to 99.  As for `ERA' this value is
          not intended to be used directly, but instead indirectly
          through the `strftime' function.  When the modifier `O' is
          used in a format which would otherwise use numerals to
          represent hours, minutes, seconds, weekdays, months, or
          weeks, the appropriate value for the locale is used instead.
    
    `INT_CURR_SYMBOL'
          The same as the value returned by `localeconv' in the
          `int_curr_symbol' element of the `struct lconv'.
    
    `CURRENCY_SYMBOL'
    `CRNCYSTR'
          The same as the value returned by `localeconv' in the
          `currency_symbol' element of the `struct lconv'.
    
          `CRNCYSTR' is a deprecated alias still required by Unix98.
    
    `MON_DECIMAL_POINT'
          The same as the value returned by `localeconv' in the
          `mon_decimal_point' element of the `struct lconv'.
    
    `MON_THOUSANDS_SEP'
          The same as the value returned by `localeconv' in the
          `mon_thousands_sep' element of the `struct lconv'.
    
    `MON_GROUPING'
          The same as the value returned by `localeconv' in the
          `mon_grouping' element of the `struct lconv'.
    
    `POSITIVE_SIGN'
          The same as the value returned by `localeconv' in the
          `positive_sign' element of the `struct lconv'.
    
    `NEGATIVE_SIGN'
          The same as the value returned by `localeconv' in the
          `negative_sign' element of the `struct lconv'.
    
    `INT_FRAC_DIGITS'
          The same as the value returned by `localeconv' in the
          `int_frac_digits' element of the `struct lconv'.
    
    `FRAC_DIGITS'
          The same as the value returned by `localeconv' in the
          `frac_digits' element of the `struct lconv'.
    
    `P_CS_PRECEDES'
          The same as the value returned by `localeconv' in the
          `p_cs_precedes' element of the `struct lconv'.
    
    `P_SEP_BY_SPACE'
          The same as the value returned by `localeconv' in the
          `p_sep_by_space' element of the `struct lconv'.
    
    `N_CS_PRECEDES'
          The same as the value returned by `localeconv' in the
          `n_cs_precedes' element of the `struct lconv'.
    
    `N_SEP_BY_SPACE'
          The same as the value returned by `localeconv' in the
          `n_sep_by_space' element of the `struct lconv'.
    
    `P_SIGN_POSN'
          The same as the value returned by `localeconv' in the
          `p_sign_posn' element of the `struct lconv'.
    
    `N_SIGN_POSN'
          The same as the value returned by `localeconv' in the
          `n_sign_posn' element of the `struct lconv'.
    
    `INT_P_CS_PRECEDES'
          The same as the value returned by `localeconv' in the
          `int_p_cs_precedes' element of the `struct lconv'.
    
    `INT_P_SEP_BY_SPACE'
          The same as the value returned by `localeconv' in the
          `int_p_sep_by_space' element of the `struct lconv'.
    
    `INT_N_CS_PRECEDES'
          The same as the value returned by `localeconv' in the
          `int_n_cs_precedes' element of the `struct lconv'.
    
    `INT_N_SEP_BY_SPACE'
          The same as the value returned by `localeconv' in the
          `int_n_sep_by_space' element of the `struct lconv'.
    
    `INT_P_SIGN_POSN'
          The same as the value returned by `localeconv' in the
          `int_p_sign_posn' element of the `struct lconv'.
    
    `INT_N_SIGN_POSN'
          The same as the value returned by `localeconv' in the
          `int_n_sign_posn' element of the `struct lconv'.
    
    `DECIMAL_POINT'
    `RADIXCHAR'
          The same as the value returned by `localeconv' in the
          `decimal_point' element of the `struct lconv'.
    
          The name `RADIXCHAR' is a deprecated alias still used in
          Unix98.
    
    `THOUSANDS_SEP'
    `THOUSEP'
          The same as the value returned by `localeconv' in the
          `thousands_sep' element of the `struct lconv'.
    
          The name `THOUSEP' is a deprecated alias still used in Unix98.
    
    `GROUPING'
          The same as the value returned by `localeconv' in the
          `grouping' element of the `struct lconv'.
    
    `YESEXPR'
          The return value is a regular expression which can be used
          with the `regex' function to recognize a positive response to
          a yes/no question.  The GNU C library provides the `rpmatch'
          function for easier handling in applications.
    
    `NOEXPR'
          The return value is a regular expression which can be used
          with the `regex' function to recognize a negative response to
          a yes/no question.
    
    `YESSTR'
          The return value is a locale-specific translation of the
          positive response to a yes/no question.
    
          Using this value is deprecated since it is a very special
          case of message translation, and is better handled by the
          message translation functions (*note Message Translation::).
    
          The use of this symbol is deprecated.  Instead message
          translation should be used.
    
    `NOSTR'
          The return value is a locale-specific translation of the
          negative response to a yes/no question.  What is said for
          `YESSTR' is also true here.
    
          The use of this symbol is deprecated.  Instead message
          translation should be used.
    
    
  • Original article: https://www.cnblogs.com/yangyingchao/p/2234297.html