  • Unicode Locale

    What Is a Locale?

    A key concept for application programs is that of a program's locale. The locale is an explicit model and definition of a native-language environment. The notion of a locale is explicitly defined in the POSIX standard, which can be accessed through http://opengroup.org.

    A locale consists of a number of categories for which country-dependent formatting or other specifications exist. A program's locale defines its code sets, date and time formatting conventions, monetary conventions, decimal formatting conventions, and collation (sort) order.

    A locale name can be composed of a base language, country (territory) of use, and codeset. For example, the German language is de, an abbreviation for Deutsch, while Swiss German is de_CH, CH being an abbreviation for Confoederatio Helvetica. This convention allows for specific differences by country, such as currency unit notation. In Oracle Solaris 11 the default locale codeset is UTF-8, an ASCII-compatible 8-bit encoding form of Unicode. The fully defined locale name for Swiss German would thus be de_CH.UTF-8.

    More than one locale can be associated with a particular language, which allows for regional differences. For example, an English-speaking user in the United States can select the en_US.UTF-8 locale (English for the United States), while an English-speaking user in Great Britain can select en_GB.UTF-8 (English for Great Britain).

    Generally the locale name is specified by the LANG environment variable. Locale categories are subordinate to LANG but can be set separately, in which case they override LANG. If the LC_ALL environment variable is set, it overrides LANG and all the separate locale categories.
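
    The precedence just described can be sketched as a small lookup. This is an illustrative Python sketch, not how any particular C library implements it: the function name effective_locale and the plain dictionary standing in for the environment are assumptions made for the example. Empty values are treated as unset, and falling back to C when nothing is set is also an assumption.

```python
def effective_locale(category, env):
    """Return the locale name that applies to one LC_* category.

    Lookup order: LC_ALL, then the category's own variable, then LANG.
    Empty strings count as unset. Falling back to "C" when nothing is
    set is an assumption of this sketch.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


env = {"LANG": "en_US.UTF-8", "LC_TIME": "de_CH.UTF-8"}
print(effective_locale("LC_TIME", env))     # de_CH.UTF-8
print(effective_locale("LC_NUMERIC", env))  # en_US.UTF-8
env["LC_ALL"] = "C"
print(effective_locale("LC_TIME", env))     # C
```

    Passing os.environ instead of the sample dictionary would apply the same lookup to the real environment of the process.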

    The locale naming convention is:

    language[_territory][.codeset][@modifier]

    where a two-letter language code is from ISO 639, a two-letter territory code is from ISO 3166, codeset is the name of the codeset that is being used in the locale, and modifier is the name of the characteristics that differentiate the locale from the locale without the modifier.
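
    As a rough illustration of the naming convention, the following Python sketch splits a locale name into its four parts with a regular expression. The pattern and the function name parse_locale_name are assumptions made for this example; real systems may accept names that this simplified pattern rejects.

```python
import re

# language[_territory][.codeset][@modifier]
LOCALE_NAME = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale_name(name):
    """Split a locale name into its four parts; missing parts are None."""
    match = LOCALE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a language[_territory][.codeset][@modifier] name: {name!r}")
    return match.groupdict()


print(parse_locale_name("de_CH.UTF-8"))
# {'language': 'de', 'territory': 'CH', 'codeset': 'UTF-8', 'modifier': None}
```

    Special names such as C and POSIX do not follow the convention; this sketch would simply report them as a language part with nothing else set.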

    All Oracle Solaris product locales preserve the Portable Character Set characters with US-ASCII code values.

    For more information about the portable character set, refer to X/Open CAE Specification: System Interface Definitions, Issue 5 (ISBN 1-85912-186-1).

    A single locale can have more than one locale name. For example, POSIX is the same locale as C.

    C Locale

    The C locale, also known as the POSIX locale, is the POSIX system default locale for all POSIX-compliant systems. The Oracle Solaris operating system is a POSIX system. The Single UNIX Specification, Version 3, defines the C locale. You can register at http://www.unix.org/version3/online.html to read and download the specification.

    You can run your internationalized programs in the C locale in either of the following two ways:

    • Unset all locale environment variables. The application then runs in the C locale.

      $ unset LC_ALL LANG LC_CTYPE LC_COLLATE LC_NUMERIC LC_TIME LC_MONETARY LC_MESSAGES
    • Explicitly set the locale to C or POSIX.

      $ export LC_ALL=C
      $ export LANG=C

      Some applications check the LANG environment variable without actually calling setlocale(3C) to reference the current locale. In this case, the shell is explicitly set to the C locale by specifying both the LC_ALL and LANG environment variables. For the precedence relationship among locale environment variables, see the setlocale(3C) man page.

    To check the current locale settings in a terminal environment, run the locale(1) command.

    $ locale
    LANG=C
    LC_CTYPE="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_COLLATE="C"
    LC_MONETARY="C"
    LC_MESSAGES="C"
    LC_ALL=
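
    A program can select and inspect the C locale itself rather than relying on the environment. The sketch below uses Python's standard locale module, which wraps the C library's setlocale(); it is offered as an illustration only and is not taken from the Oracle Solaris documentation.

```python
import locale

# Select the C locale explicitly for every category, regardless of the
# environment variables of the process.
current = locale.setlocale(locale.LC_ALL, "C")
print(current)  # C

# Passing None queries the current setting without changing it.
print(locale.setlocale(locale.LC_NUMERIC, None))  # C

# In the C locale the radix character is "." and there is no
# thousands separator.
conv = locale.localeconv()
print(conv["decimal_point"], repr(conv["thousands_sep"]))  # . ''
```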

    Locale Categories

    The types of locale categories are as follows:

    LC_CTYPE

    Specifies character classification and case conversion.

    LC_TIME

    Specifies date and time formats, including month names, days of the week, and common full and abbreviated representations.

    LC_MONETARY

    Specifies monetary formats, including the currency symbol for the locale, thousands separator, sign position, the number of fractional digits, and so forth.

    LC_NUMERIC

    Specifies the decimal delimiter (or radix character), the thousands separator, and the grouping.

    LC_COLLATE

    Specifies a collation order and regular expression definition for the locale.

    LC_MESSAGES

    Specifies the language in which the localized messages are written, and affirmative and negative responses of the locale (yes and no strings and expressions).

    LO_LTYPE

    Specifies the layout engine that provides information about language rendering. Language rendering (or text rendering) depends on the shape and direction attributes of a script.

    Core Locales

    The following table lists Oracle Solaris 11 core locales:

    Table 1-1 Languages and Core locales

    Language                  Core locale
    Chinese - Simplified      zh_CN.UTF-8
    Chinese - Traditional     zh_TW.UTF-8
    English                   en_US.UTF-8
    French                    fr_FR.UTF-8
    German                    de_DE.UTF-8
    Italian                   it_IT.UTF-8
    Japanese                  ja_JP.UTF-8
    Korean                    ko_KR.UTF-8
    Portuguese - Brazilian    pt_BR.UTF-8
    Spanish                   es_ES.UTF-8

    Core locales have better coverage of localized messages than the locales available for additional installation. Oracle Solaris OS components such as Installer or Package Manager are localized only in core locales, while localized messages for third-party software such as GNOME or Firefox are often available in more locales.

    All locales in the Oracle Solaris environment are capable of displaying localized messages, provided that the localized messages for the relevant language and application are present. Additional locales including all their available localized messages can be added to the system from the installation repository by modification of pkg facet properties. For more information, see Installing Additional Locales.

    ISO-3166 Country Codes and ISO-639 Language Codes

    The following tables list the ISO codes. Table 20-1 lists the ISO-3166 country codes and Table 20-2 lists the ISO-639 language codes.


    ISO-3166 Country Codes

    Table 20-1  ISO-3166 Country Codes
    Country
    ISO-3166 Country Code
    AFGHANISTAN
    AF
    ALBANIA
    AL
    ALGERIA
    DZ
    AMERICAN SAMOA
    AS
    ANDORRA
    AD
    ANGOLA
    AO
    ANTARCTICA
    AQ
    ANTIGUA AND BARBUDA
    AG
    ARGENTINA
    AR
    ARMENIA
    AM
    ARUBA
    AW
    AUSTRALIA
    AU
    AUSTRIA
    AT
    AZERBAIJAN
    AZ
    BAHAMAS
    BS
    BAHRAIN
    BH
    BANGLADESH
    BD
    BARBADOS
    BB
    BELARUS
    BY
    BELGIUM
    BE
    BELIZE
    BZ
    BENIN
    BJ
    BERMUDA
    BM
    BHUTAN
    BT
    BOLIVIA
    BO
    BOSNIA AND HERZEGOVINA
    BA
    BOTSWANA
    BW
    BOUVET ISLAND
    BV
    BRAZIL
    BR
    BRITISH INDIAN OCEAN TERRITORY
    IO
    BRUNEI DARUSSALAM
    BN
    BULGARIA
    BG
    BURKINA FASO
    BF
    BURUNDI
    BI
    CAMBODIA
    KH
    CAMEROON
    CM
    CANADA
    CA
    CAPE VERDE
    CV
    CAYMAN ISLANDS
    KY
    CENTRAL AFRICAN REPUBLIC
    CF
    CHAD
    TD
    CHILE
    CL
    CHINA
    CN
    CHRISTMAS ISLAND
    CX
    COCOS (KEELING) ISLANDS
    CC
    COLOMBIA
    CO
    COMOROS
    KM
    CONGO
    CG
    CONGO, THE DEMOCRATIC REPUBLIC OF THE
    CD
    COOK ISLANDS
    CK
    COSTA RICA
    CR
    CÔTE D'IVOIRE
    CI
    CROATIA
    HR
    CUBA
    CU
    CYPRUS
    CY
    CZECH REPUBLIC
    CZ
    DENMARK
    DK
    DJIBOUTI
    DJ
    DOMINICA
    DM
    DOMINICAN REPUBLIC
    DO
    ECUADOR
    EC
    EGYPT
    EG
    EL SALVADOR
    SV
    EQUATORIAL GUINEA
    GQ
    ERITREA
    ER
    ESTONIA
    EE
    ETHIOPIA
    ET
    FALKLAND ISLANDS (MALVINAS)
    FK
    FAROE ISLANDS
    FO
    FIJI
    FJ
    FINLAND
    FI
    FRANCE
    FR
    FRENCH GUIANA
    GF
    FRENCH POLYNESIA
    PF
    FRENCH SOUTHERN TERRITORIES
    TF
    GABON
    GA
    GAMBIA
    GM
    GEORGIA
    GE
    GERMANY
    DE
    GHANA
    GH
    GIBRALTAR
    GI
    GREECE
    GR
    GREENLAND
    GL
    GRENADA
    GD
    GUADELOUPE
    GP
    GUAM
    GU
    GUATEMALA
    GT
    GUINEA
    GN
    GUINEA-BISSAU
    GW
    GUYANA
    GY
    HAITI
    HT
    HEARD ISLAND AND MCDONALD ISLANDS
    HM
    HONDURAS
    HN
    HONG KONG
    HK
    HUNGARY
    HU
    ICELAND
    IS
    INDIA
    IN
    INDONESIA
    ID
    IRAN, ISLAMIC REPUBLIC OF
    IR
    IRAQ
    IQ
    IRELAND
    IE
    ISRAEL
    IL
    ITALY
    IT
    JAMAICA
    JM
    JAPAN
    JP
    JORDAN
    JO
    KAZAKHSTAN
    KZ
    KENYA
    KE
    KIRIBATI
    KI
    KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF
    KP
    KOREA, REPUBLIC OF
    KR
    KUWAIT
    KW
    KYRGYZSTAN
    KG
    LAO PEOPLE'S DEMOCRATIC REPUBLIC
    LA
    LATVIA
    LV
    LEBANON
    LB
    LESOTHO
    LS
    LIBERIA
    LR
    LIBYAN ARAB JAMAHIRIYA
    LY
    LIECHTENSTEIN
    LI
    LITHUANIA
    LT
    LUXEMBOURG
    LU
    MACAO
    MO
    MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF
    MK
    MADAGASCAR
    MG
    MALAWI
    MW
    MALAYSIA
    MY
    MALDIVES
    MV
    MALI
    ML
    MALTA
    MT
    MARSHALL ISLANDS
    MH
    MARTINIQUE
    MQ
    MAURITANIA
    MR
    MAURITIUS
    MU
    MAYOTTE
    YT
    MEXICO
    MX
    MICRONESIA, FEDERATED STATES OF
    FM
    MOLDOVA, REPUBLIC OF
    MD
    MONACO
    MC
    MONGOLIA
    MN
    MONTSERRAT
    MS
    MOROCCO
    MA
    MOZAMBIQUE
    MZ
    MYANMAR
    MM
    NAMIBIA
    NA
    NAURU
    NR
    NEPAL
    NP
    NETHERLANDS
    NL
    NETHERLANDS ANTILLES
    AN
    NEW CALEDONIA
    NC
    NEW ZEALAND
    NZ
    NICARAGUA
    NI
    NIGER
    NE
    NIGERIA
    NG
    NIUE
    NU
    NORFOLK ISLAND
    NF
    NORTHERN MARIANA ISLANDS
    MP
    NORWAY
    NO
    OMAN
    OM
    PAKISTAN
    PK
    PALAU
    PW
    PALESTINIAN TERRITORY, OCCUPIED
    PS
    PANAMA
    PA
    PAPUA NEW GUINEA
    PG
    PARAGUAY
    PY
    PERU
    PE
    PHILIPPINES
    PH
    PITCAIRN
    PN
    POLAND
    PL
    PUERTO RICO
    PR
    QATAR
    QA
    RÉUNION
    RE
    ROMANIA
    RO
    RUSSIAN FEDERATION
    RU
    RWANDA
    RW
    SAINT HELENA
    SH
    SAINT KITTS AND NEVIS
    KN
    SAINT LUCIA
    LC
    SAINT PIERRE AND MIQUELON
    PM
    SAINT VINCENT AND THE GRENADINES
    VC
    SAMOA
    WS
    SAN MARINO
    SM
    SAO TOME AND PRINCIPE
    ST
    SAUDI ARABIA
    SA
    SENEGAL
    SN
    SERBIA AND MONTENEGRO
    CS
    SEYCHELLES
    SC
    SIERRA LEONE
    SL
    SINGAPORE
    SG
    SLOVAKIA
    SK
    SLOVENIA
    SI
    SOLOMON ISLANDS
    SB
    SOMALIA
    SO
    SOUTH AFRICA
    ZA
    SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS
    GS
    SPAIN
    ES
    SRI LANKA
    LK
    SUDAN
    SD
    SURINAME
    SR
    SVALBARD AND JAN MAYEN
    SJ
    SWAZILAND
    SZ
    SWEDEN
    SE
    SWITZERLAND
    CH
    SYRIAN ARAB REPUBLIC
    SY
    TAIWAN, PROVINCE OF CHINA
    TW
    TAJIKISTAN
    TJ
    TANZANIA, UNITED REPUBLIC OF
    TZ
    THAILAND
    TH
    TIMOR-LESTE
    TL
    TOGO
    TG
    TOKELAU
    TK
    TONGA
    TO
    TRINIDAD AND TOBAGO
    TT
    TUNISIA
    TN
    TURKEY
    TR
    TURKMENISTAN
    TM
    TURKS AND CAICOS ISLANDS
    TC
    TUVALU
    TV
    UGANDA
    UG
    UKRAINE
    UA
    UNITED ARAB EMIRATES
    AE
    UNITED KINGDOM
    GB
    UNITED STATES
    US
    UNITED STATES MINOR OUTLYING ISLANDS
    UM
    URUGUAY
    UY
    UZBEKISTAN
    UZ
    VANUATU
    VU
    VENEZUELA
    VE
    VIET NAM
    VN
    VIRGIN ISLANDS, BRITISH
    VG
    VIRGIN ISLANDS, U.S.
    VI
    WALLIS AND FUTUNA
    WF
    WESTERN SAHARA
    EH
    YEMEN
    YE
    ZAMBIA
    ZM
    ZIMBABWE
    ZW

     

     ISO-639 Language Codes
    Table 20-2 ISO-639 Language Codes
    Language
    ISO-639 Language Code
    Abkhazian
    ab
    Afar
    aa
    Afrikaans
    af
    Albanian
    sq
    Amharic
    am
    Arabic
    ar
    Armenian
    hy
    Assamese
    as
    Aymara
    ay
    Azerbaijani
    az
    Bashkir
    ba
    Basque
    eu
    Bengali (Bangla)
    bn
    Bhutani
    dz
    Bihari
    bh
    Bislama
    bi
    Breton
    br
    Bulgarian
    bg
    Burmese
    my
    Byelorussian (Belarusian)
    be
    Cambodian
    km
    Catalan
    ca
    Chinese (Simplified)
    zh
    Chinese (Traditional)
    zh
    Corsican
    co
    Croatian
    hr
    Czech
    cs
    Danish
    da
    Dutch
    nl
    English
    en
    Esperanto
    eo
    Estonian
    et
    Faeroese
    fo
    Farsi
    fa
    Fiji
    fj
    Finnish
    fi
    French
    fr
    Frisian
    fy
    Galician
    gl
    Gaelic (Scottish)
    gd
    Gaelic (Manx)
    gv
    Georgian
    ka
    German
    de
    Greek
    el
    Greenlandic
    kl
    Guarani
    gn
    Gujarati
    gu
    Hausa
    ha
    Hebrew
    he
    Hindi
    hi
    Hungarian
    hu
    Icelandic
    is
    Indonesian
    id
    Interlingua
    ia
    Interlingue
    ie
    Inuktitut
    iu
    Inupiak
    ik
    Irish
    ga
    Italian
    it
    Japanese
    ja
    Javanese
    jv
    Kannada
    kn
    Kashmiri
    ks
    Kazakh
    kk
    Kinyarwanda (Ruanda)
    rw
    Kirghiz
    ky
    Kirundi (Rundi)
    rn
    Korean
    ko
    Kurdish
    ku
    Laothian
    lo
    Latin
    la
    Latvian (Lettish)
    lv
    Limburgish (Limburger)
    li
    Lingala
    ln
    Lithuanian
    lt
    Macedonian
    mk
    Malagasy
    mg
    Malay
    ms
    Malayalam
    ml
    Maltese
    mt
    Maori
    mi
    Marathi
    mr
    Moldavian
    mo
    Mongolian
    mn
    Nauru
    na
    Nepali
    ne
    Norwegian
    no
    Occitan
    oc
    Oriya
    or
    Oromo (Afan, Galla)
    om
    Pashto (Pushto)
    ps
    Polish
    pl
    Portuguese
    pt
    Punjabi
    pa
    Quechua
    qu
    Rhaeto-Romance
    rm
    Romanian
    ro
    Russian
    ru
    Samoan
    sm
    Sango
    sg
    Sanskrit
    sa
    Serbian
    sr
    Serbo-Croatian
    sh
    Sesotho
    st
    Setswana
    tn
    Shona
    sn
    Sindhi
    sd
    Sinhalese
    si
    Siswati
    ss
    Slovak
    sk
    Slovenian
    sl
    Somali
    so
    Spanish
    es
    Sundanese
    su
    Swahili (Kiswahili)
    sw
    Swedish
    sv
    Tagalog
    tl
    Tajik
    tg
    Tamil
    ta
    Tatar
    tt
    Telugu
    te
    Thai
    th
    Tibetan
    bo
    Tigrinya
    ti
    Tonga
    to
    Tsonga
    ts
    Turkish
    tr
    Turkmen
    tk
    Twi
    tw
    Uighur
    ug
    Ukrainian
    uk
    Urdu
    ur
    Uzbek
    uz
    Vietnamese
    vi
    Volapük
    vo
    Welsh
    cy
    Wolof
    wo
    Xhosa
    xh
    Yiddish
    yi
    Yoruba
    yo
    Zulu
    zu

     

    Related Topics

    An important note for developers of UTF-8 decoding routines: For security reasons, a UTF-8 decoder must not accept UTF-8 sequences that are longer than necessary to encode a character. For example, the character U+000A (line feed) must be accepted from a UTF-8 stream only in the form 0x0A, but not in any of the following five possible overlong forms:

      0xC0 0x8A
      0xE0 0x80 0x8A
      0xF0 0x80 0x80 0x8A
      0xF8 0x80 0x80 0x80 0x8A
      0xFC 0x80 0x80 0x80 0x80 0x8A
    

    Any overlong UTF-8 sequence could be abused to bypass UTF-8 substring tests that look only for the shortest possible encoding. All overlong UTF-8 sequences start with one of the following byte patterns:

    1100000x (10xxxxxx)
    11100000 100xxxxx (10xxxxxx)
    11110000 1000xxxx (10xxxxxx 10xxxxxx)
    11111000 10000xxx (10xxxxxx 10xxxxxx 10xxxxxx)
    11111100 100000xx (10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx)
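
    The byte patterns above translate directly into a check on the first one or two bytes of a sequence. The following Python sketch is one way to write that check; the function name starts_overlong is an assumption made for this example, and the function only detects the overlong prefixes listed above, it is not a complete UTF-8 validator.

```python
def starts_overlong(seq):
    """Return True if a byte sequence begins with an overlong UTF-8 pattern."""
    if not seq:
        return False
    first = seq[0]
    # 1100000x: lead bytes 0xC0 and 0xC1 only encode values that already
    # fit in a single byte.
    if first & 0xFE == 0xC0:
        return True
    if len(seq) < 2:
        return False
    second = seq[1]
    # For the longer forms the lead byte is all ones followed by zeros
    # (11100000, 11110000, ...), and the second byte must be 10 followed
    # by zeros over the same bit positions. The lead byte value therefore
    # doubles as the mask for the second byte.
    if first in (0xE0, 0xF0, 0xF8, 0xFC):
        return second & first == 0x80
    return False


print(starts_overlong(b"\xc0\x8a"))      # True  (overlong line feed)
print(starts_overlong(b"\xe0\x80\x8a"))  # True
print(starts_overlong(b"\xe0\xa0\x80"))  # False (U+0800, shortest form)
print(starts_overlong(b"\x0a"))          # False
```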

    Also note that the code positions U+D800 to U+DFFF (UTF-16 surrogates) as well as U+FFFE and U+FFFF must not occur in normal UTF-8 or UCS-4 data. UTF-8 decoders should treat them like malformed or overlong sequences for safety reasons.
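
    A minimal sketch of such a check in Python follows. It relies on the built-in strict UTF-8 decoder, which already rejects malformed and overlong sequences and encoded surrogates, and adds an explicit test for U+FFFE and U+FFFF, which the built-in decoder accepts. The function name decode_utf8_checked is an assumption made for this example.

```python
def decode_utf8_checked(data):
    """Decode UTF-8 strictly and reject U+FFFE and U+FFFF as well."""
    # The strict decoder raises UnicodeDecodeError (a ValueError subclass)
    # for malformed or overlong input and for encoded surrogates
    # U+D800..U+DFFF.
    text = data.decode("utf-8", errors="strict")
    for ch in text:
        if ord(ch) in (0xFFFE, 0xFFFF):
            raise ValueError(f"forbidden code point U+{ord(ch):04X}")
    return text


print(repr(decode_utf8_checked(b"\x0a")))  # '\n'

for bad in (b"\xc0\x8a", b"\xed\xa0\x80", b"\xef\xbf\xbe"):
    try:
        decode_utf8_checked(bad)
    except ValueError as exc:
        print("rejected:", exc)
```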

  • Original article: https://www.cnblogs.com/Searchor/p/11365549.html