  • NSStringEncoding

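    Today I came across a great blog. I can't follow it, so I'm reposting a few of its most useful posts instead.

    Reposted from: http://hi.baidu.com/may2150209/blog/item/198976ace7e583054b36d6f1.html

    PS: It turns out that blogger reposted it too. Anyway, what matters is that it's useful.

    The original post follows.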

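    While trying to scrape the home page of Qidian (起点中文网) today, I ran into a problem: if you don't use the right encoding, you can't read anything at all.

    This is a bad habit I picked up from using C# too much; I had hardly ever thought about encodings before. In C#, even if the encoding is wrong, you still get something back, just a screen full of garbage. Cocoa is harsher: the read fails outright (in the code below, the call returns nil and fills in an error).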

    Here is the code I first wrote, which downloads the Qidian home page into a string.

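    NSURL *url = [NSURL URLWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    // Try to decode the downloaded bytes as UTF-8.
    NSString *xml = [NSString stringWithContentsOfURL:url
                                             encoding:NSUTF8StringEncoding
                                                error:&error];
    if (xml == nil) {
        // The download or the decoding failed; error says why.
        NSLog(@"Error reading url at %@", [error localizedFailureReason]);
    } else {
        // result is presumably an NSMutableString declared elsewhere.
        [result setString:xml];
    }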

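    The download failed no matter what I did, and the error message said the encoding was wrong. Fine, so I opened the documentation and looked through all the encodings: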

    enum {
        NSASCIIStringEncoding = 1,
        NSNEXTSTEPStringEncoding = 2,
        NSJapaneseEUCStringEncoding = 3,
        NSUTF8StringEncoding = 4,
        NSISOLatin1StringEncoding = 5,
        NSSymbolStringEncoding = 6,
        NSNonLossyASCIIStringEncoding = 7,
        NSShiftJISStringEncoding = 8,
        NSISOLatin2StringEncoding = 9,
        NSUnicodeStringEncoding = 10,
        NSWindowsCP1251StringEncoding = 11,
        NSWindowsCP1252StringEncoding = 12,
        NSWindowsCP1253StringEncoding = 13,
        NSWindowsCP1254StringEncoding = 14,
        NSWindowsCP1250StringEncoding = 15,
        NSISO2022JPStringEncoding = 21,
        NSMacOSRomanStringEncoding = 30,
        NSUTF16StringEncoding = NSUnicodeStringEncoding,
        NSUTF16BigEndianStringEncoding = 0x90000100,
        NSUTF16LittleEndianStringEncoding = 0x94000100,
        NSUTF32StringEncoding = 0x8c000100,
        NSUTF32BigEndianStringEncoding = 0x98000100,
        NSUTF32LittleEndianStringEncoding = 0x9c000100,
    };

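    I tried them one by one, and not a single one worked! I was about to lose it. In this day and age, could Cocoa really not support Chinese? Impossible.

    My guess was that the documentation above lists only the most commonly used encodings (the ones Apple considers most common, which apparently leaves China out entirely; shame on them!), so I wrote the following code to print every supported encoding: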

    // availableStringEncodings returns a zero-terminated C array.
    const NSStringEncoding *encodings = [NSString availableStringEncodings];
    NSMutableString *str = [NSMutableString string];
    NSStringEncoding encoding;
    while ((encoding = *encodings++) != 0) {
        // Cast to int so the values print as signed 32-bit numbers.
        [str appendFormat:@"%@ === %i\n",
            [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
    }
    [result setString:str];

    Sure enough, I had guessed right. Here is the full list of supported encodings:

    Western (Mac OS Roman) === 30

    Japanese (Mac OS) === -2147483647

    Traditional Chinese (Mac OS) === -2147483646

    Korean (Mac OS) === -2147483645

    Arabic (Mac OS) === -2147483644

    Hebrew (Mac OS) === -2147483643

    Greek (Mac OS) === -2147483642

    Cyrillic (Mac OS) === -2147483641

    Devanagari (Mac OS) === -2147483639

    Gurmukhi (Mac OS) === -2147483638

    Gujarati (Mac OS) === -2147483637

    Thai (Mac OS) === -2147483627

    Simplified Chinese (Mac OS) === -2147483623

    Tibetan (Mac OS) === -2147483622

    Central European (Mac OS) === -2147483619

    Symbol (Mac OS) === 6

    Dingbats (Mac OS) === -2147483614

    Turkish (Mac OS) === -2147483613

    Croatian (Mac OS) === -2147483612

    Icelandic (Mac OS) === -2147483611

    Romanian (Mac OS) === -2147483610

    Celtic (Mac OS) === -2147483609

    Gaelic (Mac OS) === -2147483608

    Keyboard Symbols (Mac OS) === -2147483607

    Farsi (Mac OS) === -2147483508

    Cyrillic (Mac OS Ukrainian) === -2147483496

    Inuit (Mac OS) === -2147483412

    Unicode (UTF-32LE) === -1677721344

    Unicode (UTF-8) === 4

    Unicode (UTF-16) === 10

    Unicode (UTF-16BE) === -1879047936

    Unicode (UTF-16LE) === -1811939072

    Unicode (UTF-32) === -1946156800

    Unicode (UTF-32BE) === -1744830208

    Western (ISO Latin 1) === 5

    Central European (ISO Latin 2) === 9

    Western (ISO Latin 3) === -2147483133

    Central European (ISO Latin 4) === -2147483132

    Cyrillic (ISO 8859-5) === -2147483131

    Arabic (ISO 8859-6) === -2147483130

    Greek (ISO 8859-7) === -2147483129

    Hebrew (ISO 8859-8) === -2147483128

    Turkish (ISO Latin 5) === -2147483127

    Nordic (ISO Latin 6) === -2147483126

    Thai (ISO 8859-11) === -2147483125

    Baltic Rim (ISO Latin 7) === -2147483123

    Celtic (ISO Latin 8) === -2147483122

    Western (ISO Latin 9) === -2147483121

    Romanian (ISO Latin 10) === -2147483120

    Latin-US (DOS) === -2147482624

    Greek (DOS) === -2147482619

    Baltic Rim (DOS) === -2147482618

    Western (DOS Latin 1) === -2147482608

    Greek (DOS Greek 1) === -2147482607

    Central European (DOS Latin 2) === -2147482606

    Cyrillic (DOS) === -2147482605

    Turkish (DOS) === -2147482604

    Portuguese (DOS) === -2147482603

    Icelandic (DOS) === -2147482602

    Hebrew (DOS) === -2147482601

    Canadian French (DOS) === -2147482600

    Arabic (DOS) === -2147482599

    Nordic (DOS) === -2147482598

    Cyrillic (DOS) === -2147482597

    Greek (DOS Greek 2) === -2147482596

    Thai (Windows, DOS) === -2147482595

    Japanese (Windows, DOS) === 8

    Simplified Chinese (Windows, DOS) === -2147482591

    Korean (Windows, DOS) === -2147482590

    Traditional Chinese (Windows, DOS) === -2147482589

    Western (Windows Latin 1) === 12

    Central European (Windows Latin 2) === 15

    Cyrillic (Windows) === 11

    Greek (Windows) === 13

    Turkish (Windows Latin 5) === 14

    Hebrew (Windows) === -2147482363

    Arabic (Windows) === -2147482362

    Baltic Rim (Windows) === -2147482361

    Vietnamese (Windows) === -2147482360

    Western (ASCII) === 1

    Japanese (Shift JIS X0213) === -2147482072

    Chinese (GBK) === -2147482063

    Chinese (GB 18030) === -2147482062

    Japanese (ISO 2022-JP) === 21

    Korean (ISO 2022-KR) === -2147481536

    Japanese (EUC) === 3

    Simplified Chinese (EUC) === -2147481296

    Traditional Chinese (EUC) === -2147481295

    Korean (EUC) === -2147481280

    Japanese (Shift JIS) === -2147481087

    Cyrillic (KOI8-R) === -2147481086

    Traditional Chinese (Big 5) === -2147481085

    Western (Mac Mail) === -2147481084

    Simplified Chinese (HZ GB 2312) === -2147481083

    Traditional Chinese (Big 5 HKSCS) === -2147481082

    Ukrainian (KOI8-U) === -2147481080

    Traditional Chinese (Big 5-E) === -2147481079

    Western (NextStep) === 2

    Non-lossy ASCII === 7

    Western (EBCDIC Latin 1) === -2147480574

    At last, the familiar GBK encoding; its code is -2147482063. OK, let's change the code from the beginning:

    NSURL *url = [NSURL URLWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    // GBK: -2147482063 in the list above, written here as the unsigned
    // value 0x80000631 so it also works where NSStringEncoding is 64-bit.
    NSStringEncoding gbkEncoding = 0x80000631;
    NSString *xml = [NSString stringWithContentsOfURL:url
                                             encoding:gbkEncoding
                                                error:&error];
    if (xml == nil) {
        NSLog(@"Error reading url at %@", [error localizedFailureReason]);
    } else {
        [result setString:xml];
    }

    Finally got it working! Seeing familiar Chinese text was genuinely exciting.

    Note: this is a repost.

  • Original post: https://www.cnblogs.com/zhwl/p/2840746.html