  • 【IRA/GSM/UCS2】The difference between the IRA, GSM, and UCS2 character sets

    3GPP TS 27.007

    5.5       Select TE character set +CSCS

    Table 6: +CSCS parameter command syntax

    Command              Possible response(s)
    +CSCS=[<chset>]
    +CSCS?               +CSCS: <chset>
    +CSCS=?              +CSCS: (list of supported <chset>s)

    Description

    Set command informs TA which character set <chset> is used by the TE. TA is then able to convert character strings correctly between TE and MT character sets.

    When the TA-TE interface is set to 8-bit operation and the TE alphabet in use is 7-bit, the highest bit shall be set to zero.

    NOTE:      It is manufacturer specific how the internal alphabet of MT is converted to/from the TE alphabet.

    Read command shows current setting and test command displays conversion schemes implemented in the TA.
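As a rough illustration of the three command forms in Table 6, the Python sketch below builds the set command and parses read and test responses. It assumes the usual `AT` command prefix and a double-quoted string parameter; the helper names (`cscs_set`, `parse_cscs_read`, `parse_cscs_test`) and the sample response lines are hypothetical and not captured from any particular modem.

```python
import re


def cscs_set(chset: str) -> str:
    """Build the set command; <chset> is a string type, so it is quoted."""
    return f'AT+CSCS="{chset}"'


def parse_cscs_read(line: str) -> str:
    """Parse a read-command response such as '+CSCS: "GSM"'."""
    m = re.fullmatch(r'\+CSCS:\s*"([^"]*)"', line.strip())
    if m is None:
        raise ValueError(f"not a +CSCS read response: {line!r}")
    return m.group(1)


def parse_cscs_test(line: str) -> list[str]:
    """Parse a test-command response such as '+CSCS: ("IRA","GSM","UCS2")'."""
    m = re.fullmatch(r'\+CSCS:\s*\((.*)\)', line.strip())
    if m is None:
        raise ValueError(f"not a +CSCS test response: {line!r}")
    # Collect every quoted entry inside the parenthesised list.
    return re.findall(r'"([^"]*)"', m.group(1))
```

For example, `cscs_set("UCS2")` yields `AT+CSCS="UCS2"`, and checking the test-command list first lets a host avoid selecting a character set the TA does not implement.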

    Defined values

    <chset>: character set as a string type (conversion schemes not listed here can be defined by manufacturers)

    "GSM"     GSM 7 bit default alphabet (3GPP TS 23.038 [25]); this setting easily causes software flow control (XON/XOFF) problems.

    "HEX"           Character strings consist only of hexadecimal numbers from 00 to FF; e.g. "032FE6" equals three 8-bit characters with decimal values 3, 47 and 230; no conversions to the original MT character set shall be done.

    If MT is using GSM 7 bit default alphabet, its characters shall be padded with 8th bit (zero) before converting them to hexadecimal numbers (i.e. no SMS‑style packing of 7‑bit alphabet).
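A minimal sketch of the "HEX" rule, assuming Python's built-in `bytes.hex()` and `bytes.fromhex()` do the conversion; the helper names are hypothetical, and uppercase output is chosen only to match the "032FE6" example above. The last helper shows GSM 7-bit values sent one per octet with the top bit zero, rather than SMS-packed.

```python
def to_hex_chset(data: bytes) -> str:
    """Encode raw octets as a "HEX" string: two hex digits per octet."""
    return data.hex().upper()


def from_hex_chset(text: str) -> bytes:
    """Decode a "HEX" string back to octets; no character-set conversion."""
    if len(text) % 2:
        raise ValueError("HEX string must have an even number of digits")
    return bytes.fromhex(text)


def gsm_septets_to_hex(septets: list[int]) -> str:
    """GSM 7-bit values padded to 8 bits (8th bit zero), not SMS-packed."""
    if any(not 0 <= s <= 0x7F for s in septets):
        raise ValueError("GSM default alphabet values are 7-bit (0x00-0x7F)")
    return to_hex_chset(bytes(septets))
```

`from_hex_chset("032FE6")` returns the three octets 3, 47 and 230, matching the example in the definition.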

    "IRA"           International reference alphabet (see ITU‑T Recommendation T.50 [13]).

    "PCCPxxx" PC character set Code Page xxx

    "PCDN"    PC Danish/Norwegian character set

    "UCS2"         16-bit universal multiple-octet coded character set (see ISO/IEC 10646 [32]); UCS2 character strings are converted to hexadecimal numbers from 0000 to FFFF; e.g. "004100620063" equals three 16-bit characters with decimal values 65, 98 and 99.
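The "UCS2" rule can be sketched as four uppercase hex digits per 16-bit character. This is an illustration with hypothetical helper names; it assumes that characters above U+FFFF are simply rejected, since UCS-2 has no surrogate pairs.

```python
def to_ucs2_hex(text: str) -> str:
    """Encode text as a "UCS2" string: four hex digits per character."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 is fixed 16-bit, so code points beyond the BMP cannot be sent.
            raise ValueError(f"U+{cp:04X} is outside the 16-bit UCS-2 range")
        out.append(f"{cp:04X}")
    return "".join(out)


def from_ucs2_hex(hexstr: str) -> str:
    """Decode a "UCS2" hex string, four digits at a time."""
    if len(hexstr) % 4:
        raise ValueError("UCS2 string length must be a multiple of 4 hex digits")
    return "".join(chr(int(hexstr[i:i + 4], 16)) for i in range(0, len(hexstr), 4))
```

`to_ucs2_hex("Abc")` gives `004100620063`, the spec's example, and a CJK character such as 中 becomes `4E2D`.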

    "UTF-8"      Octet (8-bit) lossless encoding of UCS characters (see RFC 3629 [69]); UTF-8 encodes each UCS character as a variable number of octets, where the number of octets depends on the integer value assigned to the UCS character. The input format shall be a stream of octets. It shall not be converted to hexadecimal numbers as in "HEX" or "UCS2". This character set requires an 8-bit TA-TE interface.
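To make "a variable number of octets" concrete, the sketch below maps each code-point range to its UTF-8 length (per RFC 3629) and shows that the payload is the raw octet stream, not a hex string. The function names are hypothetical; lone surrogates (U+D800 to U+DFFF) are not valid UTF-8 input and are not handled.

```python
def utf8_octets(ch: str) -> int:
    """Octets UTF-8 needs for one character, by code-point range."""
    cp = ord(ch)
    if cp < 0x80:
        return 1  # U+0000..U+007F: same values as 7-bit IRA/ASCII
    if cp < 0x800:
        return 2  # U+0080..U+07FF, e.g. accented Latin, Greek, Cyrillic
    if cp < 0x10000:
        return 3  # remainder of the 16-bit range, e.g. CJK
    return 4      # U+10000..U+10FFFF, e.g. emoji


def utf8_payload(text: str) -> bytes:
    """Raw octets to write to an 8-bit TA-TE link; no hex conversion."""
    return text.encode("utf-8")
```

So "Aé" goes over the interface as the three octets `41 C3 A9`, whereas in "UCS2" mode the same text would be the eight hex digits `004100E9`.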

    "8859-n"  ISO 8859 Latin n (1‑6) character set

    "8859-C"  ISO 8859 Latin/Cyrillic character set

    "8859-A"  ISO 8859 Latin/Arabic character set

    "8859-G"  ISO 8859 Latin/Greek character set

    "8859-H"  ISO 8859 Latin/Hebrew character set

    Implementation

    Mandatory when a command using the setting of this command is implemented.

    ======================================================================================

    IRA

    http://mercury.webster.edu/aleshunas/COSC%205130/Q-IRA.pdf

    A familiar example of data is text or character strings. While textual data are most convenient for human beings, they cannot, in character form, be easily stored or transmitted by data processing and communications systems. Such systems are designed for binary data. Thus a number of codes have been devised by which characters are represented by a sequence of bits. Perhaps the earliest common example of this is the Morse code. Today, the most commonly used text code is the International Reference Alphabet (IRA). Each character in this code is represented by a unique 7-bit binary code; thus, 128 different characters can be represented. Table Q.1 lists all of the code values. In the table, the bits of each character are labeled from b7, which is the most significant bit, to b1, the least significant bit. Characters are of two types: printable and control (Table Q.2). Printable characters are the alphabetic, numeric, and special characters that can be printed on paper or displayed on a screen. For example, the bit representation of the character "K" is b7b6b5b4b3b2b1 = 1001011. Some of the control characters have to do with controlling the printing or displaying of characters; an example is carriage return. Other control characters are concerned with communications procedures.

    IRA-encoded characters are almost always stored and transmitted using 8 bits per character. The eighth bit is a parity bit used for error detection. The parity bit is the most significant bit and is therefore labeled b8. This bit is set such that the total number of binary 1s in each octet is always odd (odd parity) or always even (even parity). Thus a transmission error that changes a single bit, or any odd number of bits, can be detected.
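The parity scheme described above can be sketched in a few lines: count the 1 bits in the 7-bit code and set b8 so the octet's total is even (or odd). The function names are hypothetical. The check also shows the limitation the text implies: a two-bit error leaves the parity unchanged and goes undetected.

```python
def with_parity(code: int, odd: bool = False) -> int:
    """Return the octet for a 7-bit IRA code with parity bit b8 set."""
    if not 0 <= code <= 0x7F:
        raise ValueError("IRA codes are 7-bit")
    ones = bin(code).count("1")
    # Even parity: b8 = 1 when the 7 data bits hold an odd number of 1s.
    # Odd parity is the inverse of that.
    b8 = (ones % 2) ^ (1 if odd else 0)
    return code | (b8 << 7)


def parity_ok(octet: int, odd: bool = False) -> bool:
    """True if the octet's total count of 1s matches the parity mode."""
    return bin(octet & 0xFF).count("1") % 2 == (1 if odd else 0)
```

"K" (1001011) already has four 1s, so with even parity b8 = 0 and the octet is 0x4B; with odd parity b8 = 1 and the octet is 0xCB.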

    GSM

    https://en.wikipedia.org/wiki/GSM_03.38

    GSM 7-bit default alphabet and extension table of 3GPP TS 23.038 / GSM 03.38

    The standard encoding for GSM messages is the 7-bit default alphabet as defined in the 23.038 recommendation.

    Seven-bit characters must be encoded into octets following one of three packing modes:

    • CBS: using this encoding, it is possible to send up to 93 characters (packed in up to 82 octets) in one Cell Broadcast Service (CBS) message.
    • SMS: using this encoding, it is possible to send up to 160 characters (packed in up to 140 octets) in one SMS message in the GSM network.
    • USSD: using this encoding, it is possible to send up to 182 characters (packed in up to 160 octets) in one Unstructured Supplementary Service Data (USSD) message.
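The octet counts above follow from packing septets back to back, least significant bit first, so 8 characters fit in 7 octets. The sketch below shows that packing. As a simplification it only accepts letters, digits, and space, whose GSM 7-bit codes happen to equal their ASCII codes; the full 23.038 table, and the padding rules for trailing spare bits in CBS/USSD, are deliberately left out. Function names are hypothetical.

```python
import string

# Simplification: characters whose GSM 7-bit code equals their ASCII code.
_SAME_AS_ASCII = set(string.ascii_letters + string.digits + " ")


def to_septets(text: str) -> list[int]:
    bad = sorted({c for c in text if c not in _SAME_AS_ASCII})
    if bad:
        raise ValueError(f"not handled by this sketch: {bad}")
    return [ord(c) for c in text]


def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values into octets, LSB first, as in SMS user data."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits in acc
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # leftover bits, zero-filled at the top
    return bytes(out)


def packed_length(n_chars: int) -> int:
    """Octets needed for n septets: ceil(7n / 8)."""
    return (n_chars * 7 + 7) // 8
```

`packed_length` reproduces the three limits listed above: 93 characters need 82 octets, 160 need 140, and 182 need 160. Packing "hellohello" gives the ten septets in nine octets, `E8 32 9B FD 46 97 D9 EC 37`.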

    GSM 8-bit data encoding

    8-bit data encoding mode treats the information as raw data. According to the standard, the alphabet for this encoding is user-specific.

    UCS-2 Encoding

    This encoding allows use of a greater range of characters and languages. UCS-2 can represent the most commonly used Latin and Eastern characters at the cost of more space: every character takes two octets. In practice, some cell phones (e.g. iPhones) use UTF-16 instead of UCS-2 so that they can display emoticons in short messages.[4]

    A single SMS GSM message using this encoding can have at most 70 characters (140 octets).

    Note that on many GSM cell phones, there's no specific preselection of the UCS-2 encoding. The default is to use the 7-bit encoding described above, until one enters a character that is not present in the GSM 7-bit table (for example the lowercase 'a' with acute: 'á'). In that case, the whole message gets re-encoded using the UCS-2 encoding, and the maximum length of a message sent as a single SMS immediately drops from 160 to 70 characters. On smartphones the message encoding depends on the SMS application used and its settings, as well as on the length of the message. Some smartphones even send longer messages as a multimedia message (MMS).

    To avoid unexpected costs for senders that have a subscription for a limited pack of sent SMS, smartphones should display the number of characters used and the maximum number of characters in the composed SMS. When a message exceeds this maximum, it is sent as multiple successive SMS, each containing part of the message along with a sequence number (which also takes up a few leading characters in each part); the recipient's phone later reassembles the parts.

    Some GSM smartphones will alert the user about the number of SMS messages needed to send the message, when it requires more than one.

  • Original article: https://www.cnblogs.com/yueyuechen/p/6520266.html