  • (C#) Encoding.

    Encoding.GetEncoding(936)).Contains(@"这是简体中文")

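    In the .NET world a string is always Unicode (stored as UTF-16). So when you read a TXT file line by line and then check its content, the file's bytes have to be decoded first, using the encoding the file was actually saved in.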

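    using System;
    using System.IO;
    using System.Text;
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // required on .NET Core/.NET 5+ before using code page 936
    foreach (string line in File.ReadAllLines("D:\\test.txt", Encoding.GetEncoding(936))) // decode as GBK instead of the UTF-8 default
    {
        Console.WriteLine("{0}", line);
    }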
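    A minimal sketch of why the decoding step matters, assuming .NET Core 3.0 or later (where code page 936 is only available after registering `CodePagesEncodingProvider`). Instead of a real file, it encodes the sample text `这是简体中文` into GBK bytes in memory, standing in for a TXT file saved as GBK; with a real file you would pass the same `Encoding` object to `File.ReadAllText` or `File.ReadAllLines`. The helper name `DecodedContains` is hypothetical.

```csharp
using System;
using System.Text;

// Code page 936 (GBK) is not built into .NET Core/.NET 5+; register the provider first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding gbk = Encoding.GetEncoding(936);

const string sample = "这是简体中文";

// Simulate the raw bytes of a TXT file saved as GBK (each of these characters takes 2 bytes).
byte[] fileBytes = gbk.GetBytes(sample);

// Hypothetical helper: decode the bytes with the given encoding, then search the resulting UTF-16 string.
static bool DecodedContains(byte[] bytes, Encoding encoding, string text) =>
    encoding.GetString(bytes).Contains(text);

bool withGbk = DecodedContains(fileBytes, gbk, sample);            // correct decoding
bool withUtf8 = DecodedContains(fileBytes, Encoding.UTF8, sample); // wrong decoding: invalid UTF-8 sequences become U+FFFD

Console.WriteLine($"GBK bytes: {fileBytes.Length}, found with GBK: {withGbk}, found with UTF-8: {withUtf8}");
```

    The search only succeeds when the bytes are decoded with the encoding they were written in; decoding GBK bytes as UTF-8 produces replacement characters, so the comparison against the UTF-16 string fails.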

    For the individual encodings, see the MSDN reference for the Encoding class:

    http://msdn.microsoft.com/zh-cn/library/system.text.encoding(v=vs.100).aspx

    Windows Locale Codes Sorted by Codepage 

    As defined by Microsoft, a locale is either a language or a language in combination with a country. See Microsoft definitions of locale.


    Columns: Language (Locale), LCID (decimal), LCID (hexadecimal), Codepage, Country code. A codepage of 0 means the locale is Unicode-only and has no ANSI codepage.
    Telugu 1098 044a 0 IND
    Gujarati 1095 0447 0 IND
    Punjabi 1094 0446 0 IND
    Sanskrit 1103 044f 0 IND
    Konkani 1111 0457 0 IND
    Syriac 1114 045a 0 SYR
    Kannada 1099 044b 0 IND
    Marathi 1102 044e 0 IND
    Divehi 1125 0465 0 MDV
    Armenian 1067 042b 0 ARM
    Hindi 1081 0439 0 IND
    Georgian 1079 0437 0 GEO
    Tamil 1097 0449 0 IND
    Thai 1054 041e 874 THA
    Japanese 1041 0411 932 JPN
    Chinese (PRC) 2052 0804 936 CHN
    Chinese (Singapore) 4100 1004 936 SGP
    Korean 1042 0412 949 KOR
    Chinese (Macau S.A.R.) 5124 1404 950 MCO
    Chinese (Hong Kong S.A.R.) 3076 0c04 950 HKG
    Chinese (Taiwan) 1028 0404 950 TWN
    Romanian 1048 0418 1250 ROM
    Slovenian 1060 0424 1250 SVN
    Hungarian 1038 040e 1250 HUN
    Slovak 1051 041b 1250 SVK
    Polish 1045 0415 1250 POL
    Albanian 1052 041c 1250 ALB
    Serbian (Latin) 2074 081a 1250 SPB
    Croatian 1050 041a 1250 HRV
    Czech 1029 0405 1250 CZE
    Mongolian (Cyrillic) 1104 0450 1251 MNG
    FYRO Macedonian 1071 042f 1251 MKD
    Uzbek (Cyrillic) 2115 0843 1251 UZB
    Ukrainian 1058 0422 1251 UKR
    Azeri (Cyrillic) 2092 082c 1251 AZE
    Tatar 1092 0444 1251 RUS
    Kazakh 1087 043f 1251 KAZ
    Belarusian 1059 0423 1251 BLR
    Kyrgyz (Cyrillic) 1088 0440 1251 KGZ
    Bulgarian 1026 0402 1251 BGR
    Serbian (Cyrillic) 3098 0c1a 1251 SPB
    Russian 1049 0419 1251 RUS
    English (Jamaica) 8201 2009 1252 JAM
    French (Canada) 3084 0c0c 1252 CAN
    French (France) 1036 040c 1252 FRA
    French (Luxembourg) 5132 140c 1252 LUX
    English (New Zealand) 5129 1409 1252 NZL
    English (Ireland) 6153 1809 1252 IRL
    Dutch (Netherlands) 1043 0413 1252 NLD
    English (Caribbean) 9225 2409 1252 CAR
    French (Switzerland) 4108 100c 1252 CHE
    English (Canada) 4105 1009 1252 CAN
    Galician 1110 0456 1252 ESP
    English (Belize) 10249 2809 1252 BLZ
    German (Austria) 3079 0c07 1252 AUT
    French (Monaco) 6156 180c 1252 MCO
    English (Zimbabwe) 12297 3009 1252 ZWE
    Basque 1069 042d 1252 ESP
    Dutch (Belgium) 2067 0813 1252 BEL
    French (Belgium) 2060 080c 1252 BEL
    Finnish 1035 040b 1252 FIN
    Faroese 1080 0438 1252 FRO
    German (Germany) 1031 0407 1252 DEU
    English (Australia) 3081 0c09 1252 AUS
    English (United States) 1033 0409 1252 USA
    English (United Kingdom) 2057 0809 1252 GBR
    Catalan 1027 0403 1252 ESP
    English (Trinidad) 11273 2c09 1252 TTO
    English (South Africa) 7177 1c09 1252 ZAF
    Danish 1030 0406 1252 DNK
    English (Philippines) 13321 3409 1252 PHL
    Spanish (Paraguay) 15370 3c0a 1252 PRY
    Spanish (Colombia) 9226 240a 1252 COL
    Spanish (Costa Rica) 5130 140a 1252 CRI
    Spanish (Dominican Republic) 7178 1c0a 1252 DOM
    Spanish (Ecuador) 12298 300a 1252 ECU
    Spanish (El Salvador) 17418 440a 1252 SLV
    Spanish (Guatemala) 4106 100a 1252 GTM
    Spanish (Honduras) 18442 480a 1252 HND
    Spanish (International Sort) 3082 0c0a 1252 ESP
    Spanish (Chile) 13322 340a 1252 CHL
    Spanish (Nicaragua) 19466 4c0a 1252 NIC
    Spanish (Mexico) 2058 080a 1252 MEX
    Spanish (Peru) 10250 280a 1252 PER
    Spanish (Puerto Rico) 20490 500a 1252 PRI
    Spanish (Traditional Sort) 1034 040a 1252 ESP
    Spanish (Uruguay) 14346 380a 1252 URY
    Spanish (Venezuela) 8202 200a 1252 VEN
    Swahili 1089 0441 1252 KEN
    Swedish 1053 041d 1252 SWE
    Swedish (Finland) 2077 081d 1252 FIN
    German (Liechtenstein) 5127 1407 1252 LIE
    Afrikaans 1078 0436 1252 ZAF
    Spanish (Panama) 6154 180a 1252 PAN
    German (Luxembourg) 4103 1007 1252 LUX
    Spanish (Bolivia) 16394 400a 1252 BOL
    German (Switzerland) 2055 0807 1252 CHE
    Icelandic 1039 040f 1252 ISL
    Indonesian 1057 0421 1252 IDN
    Italian (Italy) 1040 0410 1252 ITA
    Italian (Switzerland) 2064 0810 1252 CHE
    Norwegian (Nynorsk) 2068 0814 1252 NOR
    Spanish (Argentina) 11274 2c0a 1252 ARG
    Portuguese (Brazil) 1046 0416 1252 BRA
    Norwegian (Bokmal) 1044 0414 1252 NOR
    Malay (Malaysia) 1086 043e 1252 MYS
    Malay (Brunei Darussalam) 2110 083e 1252 BRN
    Portuguese (Portugal) 2070 0816 1252 PRT
    Greek 1032 0408 1253 GRC
    Uzbek (Latin) 1091 0443 1254 UZB
    Azeri (Latin) 1068 042c 1254 AZE
    Turkish 1055 041f 1254 TUR
    Hebrew 1037 040d 1255 ISR
    Arabic (Algeria) 5121 1401 1256 DZA
    Arabic (Bahrain) 15361 3c01 1256 BHR
    Arabic (Yemen) 9217 2401 1256 YEM
    Arabic (Egypt) 3073 0c01 1256 EGY
    Arabic (Iraq) 2049 0801 1256 IRQ
    Arabic (Jordan) 11265 2c01 1256 JOR
    Arabic (Kuwait) 13313 3401 1256 KWT
    Arabic (Lebanon) 12289 3001 1256 LBN
    Arabic (Libya) 4097 1001 1256 LBY
    Arabic (Morocco) 6145 1801 1256 MAR
    Arabic (Oman) 8193 2001 1256 OMN
    Arabic (Qatar) 16385 4001 1256 QAT
    Arabic (Saudi Arabia) 1025 0401 1256 SAU
    Arabic (Syria) 10241 2801 1256 SYR
    Arabic (U.A.E.) 14337 3801 1256 ARE
    Farsi 1065 0429 1256 IRN
    Urdu 1056 0420 1256 PAK
    Arabic (Tunisia) 7169 1c01 1256 TUN
    Estonian 1061 0425 1257 EST
    Latvian 1062 0426 1257 LVA
    Lithuanian 1063 0427 1257 LTU
    Vietnamese 1066 042a 1258 VNM
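    The table can also be used programmatically. A minimal sketch, assuming rows in exactly the whitespace-separated layout above (language name first, then decimal LCID, hexadecimal LCID, codepage, country code): it parses a few rows copied from the table, checks that the two LCID columns agree, and reports codepage 0 as "Unicode only". The helper `ParseRow` is a hypothetical name used for illustration.

```csharp
using System;
using System.Globalization;

// Rows copied verbatim from the table above.
string[] rows =
{
    "Chinese (PRC) 2052 0804 936 CHN",
    "Chinese (Taiwan) 1028 0404 950 TWN",
    "Spanish (International Sort) 3082 0c0a 1252 ESP",
    "Telugu 1098 044a 0 IND",
};

// Hypothetical helper: parse from the right, because the language name itself may contain spaces.
static (string Name, int Lcid, int LcidHex, int CodePage, string Country) ParseRow(string row)
{
    string[] parts = row.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int n = parts.Length;
    string name = string.Join(" ", parts, 0, n - 4);
    int lcid = int.Parse(parts[n - 4], CultureInfo.InvariantCulture);
    int lcidHex = int.Parse(parts[n - 3], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    int codePage = int.Parse(parts[n - 2], CultureInfo.InvariantCulture);
    return (name, lcid, lcidHex, codePage, parts[n - 1]);
}

var parsed = Array.ConvertAll(rows, row => ParseRow(row));
foreach (var r in parsed)
{
    // The two LCID columns are the same number written in base 10 and base 16.
    string cp = r.CodePage == 0 ? "Unicode only" : r.CodePage.ToString(CultureInfo.InvariantCulture);
    Console.WriteLine($"{r.Name}: LCID {r.Lcid} (0x{r.LcidHex:x4}), columns agree: {r.Lcid == r.LcidHex}, codepage {cp}, {r.Country}");
}
```

    If the mapping is needed at run time rather than from this table, `CultureInfo.GetCultureInfo(2052).TextInfo.ANSICodePage` returns the ANSI codepage .NET associates with a culture. On Windows it should agree with the table (936 for Chinese (PRC)); on other platforms the culture data comes from ICU, so verify it before relying on it.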

    This table was generated from information at List of Locale IDs and Language Groups for Microsoft Windows 2000

    Definitions

    Locale: A collection of language-related, user-preference information represented as a list of values. (Reference)

    Locale ID (LCID): A 32-bit value defined by Microsoft Windows that consists of a language ID, sort ID, and reserved bits that identify a particular language.
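    The LCID layout can be unpacked with shifts and masks. A sketch, assuming the standard Windows layout that the Win32 `MAKELANGID` and `MAKELCID` macros build: bits 0-15 hold the language ID, bits 16-19 the sort ID, bits 20-31 are reserved, and within the language ID bits 0-9 are the primary language and bits 10-15 the sublanguage. `SplitLcid` and `MakeLcid` are hypothetical helper names.

```csharp
using System;

// Hypothetical helper mirroring the Win32 LANGIDFROMLCID / SORTIDFROMLCID / PRIMARYLANGID / SUBLANGID macros.
static (int Primary, int Sub, int Sort) SplitLcid(int lcid)
{
    int langId = lcid & 0xFFFF;        // bits 0-15: language ID
    int sortId = (lcid >> 16) & 0xF;   // bits 16-19: sort ID
    int primary = langId & 0x3FF;      // bits 0-9 of the language ID: primary language
    int sub = langId >> 10;            // bits 10-15 of the language ID: sublanguage
    return (primary, sub, sortId);
}

// Hypothetical inverse: rebuild an LCID from its parts.
static int MakeLcid(int primary, int sub, int sort) => (sort << 16) | (sub << 10) | primary;

// 0x0804 Chinese (PRC), 0x0c0a Spanish (International Sort), 0x040a Spanish (Traditional Sort) come from the table;
// 0x10407 is German with the phone-book sort order (sort ID 1).
foreach (int lcid in new[] { 0x0804, 0x0c0a, 0x040a, 0x10407 })
{
    var (primary, sub, sort) = SplitLcid(lcid);
    Console.WriteLine($"LCID 0x{lcid:x5} ({lcid}): primary 0x{primary:x2}, sublanguage {sub}, sort {sort}, round trip ok: {MakeLcid(primary, sub, sort) == lcid}");
}
```

    The two Spanish rows in the table share primary language 0x0a and differ only in the sublanguage bits, which is how one language can appear with several LCIDs.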

    Codepage: "An ordered set of characters in which a numeric index (code point values) is associated with each character. The first 128 characters of each codepage are functionally the same and include all characters needed to type English text. The upper 128 characters of OEM and ANSI codepages contain characters used in a language or group of languages (Taken from Related resources below)".
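    A small check of the definition above for single-byte ANSI codepages, assuming .NET Core 3.0 or later with `CodePagesEncodingProvider` registered. windows-1250, windows-1251 and windows-1252 decode bytes 0x00-0x7F to the same characters, while the same upper-half byte 0xC0 decodes to a different letter in each. Double-byte codepages such as 936 use the upper range for lead bytes instead, so the "upper 128 characters" part applies only to the single-byte ones.

```csharp
using System;
using System.Text;

// windows-125x code pages are not built into .NET Core/.NET 5+; register the provider first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

int[] codePages = { 1250, 1251, 1252 };
Encoding[] encodings = Array.ConvertAll(codePages, cp => Encoding.GetEncoding(cp));

// Lower half: every byte 0x00-0x7F decodes to the same character in all three code pages.
bool lowerHalfIdentical = true;
for (int b = 0; b < 0x80; b++)
{
    foreach (Encoding e in encodings)
    {
        string s = e.GetString(new[] { (byte)b });
        if (s != ((char)b).ToString()) lowerHalfIdentical = false;
    }
}

// Upper half: the same byte 0xC0 means a different letter in each code page.
string[] upper = Array.ConvertAll(encodings, enc => enc.GetString(new byte[] { 0xC0 }));
Console.WriteLine($"0x00-0x7F identical: {lowerHalfIdentical}");
for (int i = 0; i < codePages.Length; i++)
    Console.WriteLine($"0xC0 in windows-{codePages[i]}: U+{(int)upper[i][0]:X4} '{upper[i]}'");
```

    This is why a file decoded with the wrong single-byte codepage still shows its English text correctly while accented or non-Latin letters turn into other characters.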

    Character Encoding Recommendation for Language

    Columns: IANA encoding name, Java canonical name, Language, Comment
    UTF-8 UTF8 8bit Universal character set  
    UTF-16 UTF-16 16bit Universal character set  
    US-ASCII ASCII American Standard Code for Information Interchange  
    windows-1250 Cp1250 Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) Windows encoding
    windows-1251 Cp1251 Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian) Windows encoding
    windows-1252 Cp1252 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) Windows encoding
    windows-1253 Cp1253 Greek Windows encoding
    windows-1254 Cp1254 Turkish Windows encoding
    windows-1255 Cp1255 Hebrew Windows encoding
    windows-1256 Cp1256 Arabic Windows encoding
    windows-1257 Cp1257 Baltic Windows encoding
    windows-1258 Cp1258 Vietnamese Windows encoding
    ISO-8859-1 ISO8859_1 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) Euro Symbol is not supported
    ISO-8859-2 ISO8859_2 Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian) Euro Symbol is not supported
    ISO-8859-3 ISO8859_3 Southeastern European (Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian, Maltese, Spanish, Turkish)  
    ISO-8859-4 ISO8859_4 Northern European (Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi, Slovenian, Swedish)
    ISO-8859-5 ISO8859_5 Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian)  
    ISO-8859-6 ISO8859_6 Arabic  
    ISO-8859-7 ISO8859_7 Greek  
    ISO-8859-8 ISO8859_8 Hebrew  
    ISO-8859-9 ISO8859_9 Western European (Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Finnish, French, Frisian, Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, Turkish)  
    ISO-8859-13 ISO8859_13 Baltic Rim (English, Estonian, Finnish, Latin, Latvian, Norwegian)  
    ISO-8859-15 ISO8859_15 Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish) ISO-8859-1 with Euro symbol support
    windows-31j MS932 Japanese Windows encoding
    EUC-JP EUC_JP Japanese EUC encoding used on Unix platform
    Shift_JIS SJIS Japanese Shift JIS, does not support the Microsoft-specific extended characters
    ISO-2022-JP ISO2022JP Japanese JIS X 0201, 0208, in ISO 2022 form, this is used for e-mail
    x-mswin-936 MS936 Simplified Chinese Windows encoding, this is not registered in IANA.
    GB18030 GB18030 Simplified Chinese PRC standard
    x-EUC-CN EUC_CN Simplified Chinese GB2312, EUC encoding
    GBK GBK Simplified Chinese  
    x-windows-949 MS949 Korean Windows encoding, this is not registered in IANA.
    EUC-KR EUC_KR Korean KS C 5601, EUC encoding
    x-windows-950 MS950 Traditional Chinese Windows encoding, this is not registered in IANA
    x-MS950-HKSCS MS950_HKSCS Traditional Chinese with Hong Kong extensions Windows encoding, this is not registered in IANA
    x-EUC-TW EUC_TW Traditional Chinese CNS11643 (Plane 1-3), EUC encoding, this is not registered in IANA
    Big5 Big5 Traditional Chinese  
    Big5-HKSCS Big5_HKSCS Traditional Chinese Big5 with Hong Kong extensions
    TIS-620 TIS620 Thai
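    The first column holds IANA names; the second holds the names Java APIs use. In .NET, `Encoding.GetEncoding(string)` takes IANA-style names and each one resolves to a numeric code page. A sketch, assuming .NET Core 3.0 or later with `CodePagesEncodingProvider` registered; the names are a selection from the table above and `codePageByName` is just an illustrative variable.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Legacy code pages (Shift_JIS, Big5, GB18030, windows-125x) need the provider on .NET Core/.NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// IANA names taken from the first column of the table above.
string[] ianaNames = { "UTF-8", "US-ASCII", "ISO-8859-1", "windows-1252", "Shift_JIS", "GB18030", "Big5" };

var codePageByName = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
foreach (string name in ianaNames)
{
    // GetEncoding(string) matches names case-insensitively and throws ArgumentException for unknown names.
    Encoding e = Encoding.GetEncoding(name);
    codePageByName[name] = e.CodePage;
    Console.WriteLine($"{name,-14} -> code page {e.CodePage}");
}
```

    Code page 936 from the first example in this post can likewise be reached by number (`Encoding.GetEncoding(936)`) or by the name `gb2312`, which .NET maps to 936.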
  • Original post: https://www.cnblogs.com/fdyang/p/3032171.html