
    LanguageTag


    This is a memo on RFC 5646, which together with RFC 4647 makes up BCP 47.

    1 The Language Tag

    Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.

    1.1 Syntax

    • A tag is composed of a sequence of one or more subtags.
    • Each subtag is a sequence of alphanumeric characters that narrows the range of language identified by the tag.
    • Subtags are joined with hyphens ("-").
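    Before applying the full grammar, a tag can be split on "-" into its subtags. The sketch below does that and also enforces the one constraint every production in the ABNF shares: each subtag is one to eight ASCII letters or digits. It does not check which shape is allowed in which position; `split_subtags` is a hypothetical helper name, not part of any library.

```python
from typing import List


def split_subtags(tag: str) -> List[str]:
    """Split a tag on "-" and check the basic shape of each subtag."""
    parts = tag.split("-")
    for part in parts:
        # No production in the ABNF allows an empty subtag, a subtag longer
        # than 8 characters, or anything other than ASCII letters and digits.
        if not (1 <= len(part) <= 8 and part.isascii() and part.isalnum()):
            raise ValueError(f"malformed subtag {part!r} in {tag!r}")
    return parts
```

    For example, `split_subtags("zh-Hant-TW")` returns `["zh", "Hant", "TW"]`, while `split_subtags("en--US")` raises `ValueError` because of the empty subtag.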

    The syntax of the language tag in ABNF [RFC5234] is:

    Language-Tag  = langtag             ; normal language tags
                  / privateuse          ; private use tag
                  / grandfathered       ; grandfathered tags
    
    langtag       = language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse]
    
    language      = 2*3ALPHA            ; shortest ISO 639 code
                    ["-" extlang]       ; sometimes followed by
                                        ; extended language subtags
                  / 4ALPHA              ; or reserved for future use
                  / 5*8ALPHA            ; or registered language subtag
    
    extlang       = 3ALPHA              ; selected ISO 639 codes
                    *2("-" 3ALPHA)      ; permanently reserved
    
    script        = 4ALPHA              ; ISO 15924 code
    
    region        = 2ALPHA              ; ISO 3166-1 code
                  / 3DIGIT              ; UN M.49 code
    
    variant       = 5*8alphanum         ; registered variants
                  / (DIGIT 3alphanum)
    
    extension     = singleton 1*("-" (2*8alphanum))
    
                                        ; Single alphanumerics
                                        ; "x" reserved for private use
    singleton     = DIGIT               ; 0 - 9
                  / %x41-57             ; A - W
                  / %x59-5A             ; Y - Z
                  / %x61-77             ; a - w
                  / %x79-7A             ; y - z
    
    privateuse    = "x" 1*("-" (1*8alphanum))
    
    grandfathered = irregular           ; non-redundant tags registered
                  / regular             ; during the RFC 3066 era
    
    irregular     = "en-GB-oed"         ; irregular tags do not match
                  / "i-ami"             ; the 'langtag' production and
                  / "i-bnn"             ; would not otherwise be
                  / "i-default"         ; considered 'well-formed'
                  / "i-enochian"        ; These tags are all valid,
                  / "i-hak"             ; but most are deprecated
                  / "i-klingon"         ; in favor of more modern
                  / "i-lux"             ; subtags or subtag
                  / "i-mingo"           ; combination
                  / "i-navajo"
                  / "i-pwn"
                  / "i-tao"
                  / "i-tay"
                  / "i-tsu"
                  / "sgn-BE-FR"
                  / "sgn-BE-NL"
                  / "sgn-CH-DE"
    
    regular       = "art-lojban"        ; these tags match the 'langtag'
                  / "cel-gaulish"       ; production, but their subtags
                  / "no-bok"            ; are not extended language
                  / "no-nyn"            ; or variant subtags: their meaning
                  / "zh-guoyu"          ; is defined by their registration
                  / "zh-hakka"          ; and all of these are deprecated
                  / "zh-min"            ; in favor of a more modern
                  / "zh-min-nan"        ; subtag or sequence of subtags
                  / "zh-xiang"
    
    alphanum      = (ALPHA / DIGIT)     ; letters and numbers
    

    Figure 1: Language Tag ABNF
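    The grammar in Figure 1 has no recursion, so it can be transcribed into one regular expression. The sketch below does that, assuming a "well-formed" check is all that is needed: it does not look subtags up in the IANA registry and does not reject repeated variants or singletons, which RFC 5646 treats as part of the separate "valid" check. The names `is_well_formed`, `TAG_RE` and the pattern constants are hypothetical.

```python
import re

# Pieces of the `langtag` production from Figure 1 (compiled case-insensitively).
ALNUM = "[a-z0-9]"
LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # up to 3 extlang
SCRIPT = r"(?:-[a-z]{4})?"
REGION = r"(?:-(?:[a-z]{2}|[0-9]{3}))?"
VARIANT = f"(?:-(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}}))*"
EXTENSION = f"(?:-[0-9a-wy-z](?:-{ALNUM}{{2,8}})+)*"  # any singleton except "x"
PRIVATEUSE = f"x(?:-{ALNUM}{{1,8}})+"
LANGTAG = LANGUAGE + SCRIPT + REGION + VARIANT + EXTENSION + f"(?:-{PRIVATEUSE})?"

# re.ASCII keeps [a-z] from matching non-ASCII letters under IGNORECASE.
TAG_RE = re.compile(f"(?:{LANGTAG}|{PRIVATEUSE})", re.IGNORECASE | re.ASCII)

# Irregular and regular grandfathered tags from Figure 1, lowercased.
GRANDFATHERED = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}


def is_well_formed(tag: str) -> bool:
    """True if `tag` matches the Language-Tag production (case-insensitive)."""
    if not tag.isascii():
        return False
    return tag.lower() in GRANDFATHERED or TAG_RE.fullmatch(tag) is not None
```

    For example, `is_well_formed("de-DE-u-co-phonebk")` and `is_well_formed("i-klingon")` are true, while `is_well_formed("en--US")` is false. The `re.ASCII` flag matters: without it, Python's `[a-z]` under `re.IGNORECASE` also matches a few non-ASCII letters such as the Kelvin sign.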


    1.1.1 Formatting of Language Tags

    Although tags are case-insensitive, there are formatting conventions:

    • ISO 639 recommends that language codes be written in lowercase ('mn' Mongolian).
    • ISO 15924 recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
    • ISO 3166-1 recommends that country codes be capitalized ('MN' Mongolia).
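    Section 2.1.1 of RFC 5646 generalizes these conventions: all subtags are lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they neither start the tag nor follow a singleton (so "en-CA-x-ca" keeps its private-use "ca" lowercase). A minimal sketch of that rule, assuming the input is already well-formed; `format_tag` is a hypothetical name. Since tags are case-insensitive, this changes only presentation, never meaning.

```python
def format_tag(tag: str) -> str:
    """Apply the RFC 5646 Section 2.1.1 casing convention to a well-formed tag."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.lower().split("-")):
        if len(sub) == 1:
            after_singleton = True          # everything from here on stays lowercase
        if i == 0 or after_singleton or not sub.isalpha():
            out.append(sub)                 # language, digits, extensions, private use
        elif len(sub) == 2:
            out.append(sub.upper())         # region, e.g. "MN"
        elif len(sub) == 4:
            out.append(sub.capitalize())    # script, e.g. "Cyrl"
        else:
            out.append(sub)
    return "-".join(out)
```

    For example, `format_tag("MN-CYRL-MN")` yields `"mn-Cyrl-MN"`.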

    1.2 Language Subtag Sources and Interpretation

    The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA) according to the rules in Section 5 of RFC 5646. The Language Subtag Registry maintained by IANA is the source for valid subtags: other standards referenced in this section provide the source material for that registry.
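    The registry file is plain text in the "record-jar" layout described in Section 3.1 of RFC 5646: records are separated by lines containing only "%%", each field is written as "Name: value", and long values continue on indented lines. Assuming that layout, a minimal sketch of loading it and checking whether a subtag is registered might look like this. The excerpt and the names `parse_registry`, `REGISTERED` and `is_registered` are made up for illustration; a real check would read the full file published by IANA.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, abbreviated excerpt in the registry's record-jar layout.
# The real file starts with a File-Date record and carries more fields
# per entry; none of that is reproduced here.
SAMPLE_REGISTRY = """\
Type: language
Subtag: en
Description: English
%%
Type: script
Subtag: Latn
Description: Latin
%%
Type: region
Subtag: US
Description: United States
"""


def parse_registry(text: str) -> List[Dict[str, List[str]]]:
    """Split record-jar text into records; each field maps to its list of values."""
    records: List[Dict[str, List[str]]] = []
    fields: Dict[str, List[str]] = {}
    last = None
    for line in text.splitlines():
        if line.strip() == "%%":                      # record separator
            if fields:
                records.append(fields)
            fields, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            fields[last][-1] += " " + line.strip()    # folded continuation line
        elif ":" in line:
            name, value = line.split(":", 1)
            last = name.strip()
            fields.setdefault(last, []).append(value.strip())
    if fields:
        records.append(fields)
    return records


# Index of (type, lowercased subtag); records without a Subtag field
# (such as a File-Date header) are skipped.
REGISTERED: Set[Tuple[str, str]] = {
    (r["Type"][0], r["Subtag"][0].lower())
    for r in parse_registry(SAMPLE_REGISTRY)
    if "Type" in r and "Subtag" in r
}


def is_registered(kind: str, subtag: str) -> bool:
    """Case-insensitive lookup of a subtag of the given type."""
    return (kind, subtag.lower()) in REGISTERED
```

    With this excerpt, `is_registered("script", "latn")` is true and `is_registered("region", "GB")` is false only because "GB" was left out of the sample.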

    1.2.1 Primary Language Subtag

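    The primary language subtag is the first subtag of a tag and cannot be omitted, except in private-use tags (which begin with "x") and in grandfathered tags beginning with "i". It is usually two or three letters (an ISO 639 code); per the ABNF above, four letters are reserved for future use and five to eight letters denote a registered language subtag.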
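    The length rules for the first subtag can be read straight off the `language` production in Figure 1. A sketch, assuming only the first subtag's shape matters (the rest of the tag is not checked); `primary_subtag_kind` and its category labels are hypothetical.

```python
def primary_subtag_kind(tag: str) -> str:
    """Classify the first subtag by the shape rules of the `language` production."""
    first = tag.split("-", 1)[0].lower()
    if first == "x":
        return "privateuse"        # the whole tag is private use
    if first == "i":
        return "grandfathered-i"   # only legal inside a grandfathered tag
    if not (first.isascii() and first.isalpha()):
        return "invalid"
    if len(first) in (2, 3):
        return "iso639"            # shortest ISO 639 code
    if len(first) == 4:
        return "reserved"          # reserved for future use
    if 5 <= len(first) <= 8:
        return "registered"        # registered language subtag
    return "invalid"
```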

    Original post: https://www.cnblogs.com/yangyingchao/p/3794436.html