Unicode ranges, and how to generate all Unicode characters in Python

Unicode ranges and the languages they represent

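Unicode is a universal character set. Its code space runs from U+0000 to U+10FFFF (1,114,112 code points); the often-quoted figure of 65,536 covers only the Basic Multilingual Plane (U+0000 to U+FFFF). When a computer handles special characters (anything outside the ASCII table), it stores the Unicode code points using some encoding, such as UTF-8 or UTF-16. Unifying everything under Unicode took a great deal of effort, and some incompatibilities between encodings persist to this day, but for everyday code a grasp of the basics is enough.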
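To make the distinction between a code point and its stored encoding concrete, here is a minimal Python 3 sketch. The helper name `describe` is hypothetical, and UTF-8 and UTF-16 are chosen only as illustrative encodings.

```python
import sys

def describe(ch):
    """Return (code point, UTF-8 bytes, UTF-16-BE bytes) for one character."""
    return ord(ch), ch.encode('utf-8'), ch.encode('utf-16-be')

# Highest code point Python 3 can represent (0x10ffff)
print(hex(sys.maxunicode))

# 'A' is ASCII: one byte in UTF-8, two bytes in UTF-16
print(describe('A'))

# '在' is U+5728: three bytes in UTF-8, two bytes in UTF-16
print(describe('在'))
```

The same code point yields different byte sequences depending on the encoding, which is why the encoding has to be stated whenever text is written to a file.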

For the ranges Unicode assigns to each language, see the following article:

http://www.cnblogs.com/chenwenbiao/archive/2011/08/17/2142718.html

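The range for Chinese (shared with Japanese and Korean) is the CJK Unified Ideographs block, U+4E00 to U+9FFF. Note that the test code in this article stops at 0x9fbf.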
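As a rough sketch of how such a range can be used, the check below tests whether a character falls inside the CJK Unified Ideographs block. The function name `is_cjk_unified` and the two constants are hypothetical; the bounds cover only the main block, not the CJK extension blocks.

```python
CJK_START = 0x4E00  # first code point of the CJK Unified Ideographs block
CJK_END = 0x9FFF    # last code point of the block

def is_cjk_unified(ch):
    """True if the single character ch lies in U+4E00..U+9FFF."""
    return CJK_START <= ord(ch) <= CJK_END

print(is_cjk_unified('在'))   # U+5728, inside the block
print(is_cjk_unified('A'))    # ASCII letter, outside the block
print(is_cjk_unified('あ'))   # hiragana lives in a different block
```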

Generating all Unicode characters with Python

Python 2 version:

import traceback

def print_unicode(start, end):
    # Write one line per code point: index, hex code point, character (UTF-8)
    with open('unicode_set.txt', 'w') as f:
        loc_start = start
        while loc_start <= end:
            try:
                ustr = hex(loc_start)[2:]
                od = (4 - len(ustr)) * '0' + ustr  # left-pad with zeros to 4 hex digits
                ustr = unichr(loc_start)
                index = loc_start - start + 1
                f.write(str(index) + '\t' + '0x' + od + '\t' + ustr.encode('utf-8', 'ignore') + '\n')
            except Exception:
                traceback.print_exc()
                print(loc_start)  # report the code point that failed
            loc_start += 1

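Because Python 3 changed how it handles encodings (str and unicode were merged, so the separate unicode type and unichr() are gone; bytes takes the place of Python 2's str), the code above does not work in Python 3.

The Python 3 version is as follows: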
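The difference can be illustrated with a short Python 3 sketch; the helper name `roundtrip` is hypothetical. In Python 3, `str` holds code points and `bytes` holds encoded data, and converting between the two is always explicit.

```python
def roundtrip(text, encoding='utf-8'):
    """Encode a str to bytes and decode it back, returning both."""
    data = text.encode(encoding)   # str -> bytes
    back = data.decode(encoding)   # bytes -> str
    return data, back

data, back = roundtrip('在')
print(type('在').__name__, type(data).__name__)  # str bytes
print(len('在'), len(data))                      # 1 code point, 3 UTF-8 bytes
print(back == '在')

# chr() takes over the role of Python 2's unichr()
print(chr(0x5728) == '在')
```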

import traceback

def print_unicode3(start, end):
    # Open in binary mode ('wb') because each line is encoded to bytes before writing;
    # writing bytes to a text-mode file raises TypeError
    with open('unicode_set.txt', 'wb') as f:
        loc_start = start
        while loc_start <= end:
            try:
                tmpstr = hex(loc_start)[2:]
                od = (4 - len(tmpstr)) * '0' + tmpstr  # left-pad with zeros to 4 hex digits
                ustr = chr(loc_start)
                index = loc_start - start + 1
                line = (str(index) + '\t' + '0x' + od + '\t' + ustr + '\n').encode('utf-8')
                f.write(line)
            except Exception:
                # e.g. surrogates (0xD800-0xDFFF) cannot be encoded as UTF-8
                traceback.print_exc()
                print(loc_start)  # report the code point that failed
            loc_start += 1

def expect_test(expected, actual):
    if expected != actual:
        print('expected ', expected, 'actual', actual)

# Test:
print_unicode3(0x4e00, 0x9fbf)
expect_test('在', '\u5728')

Generated output

[Image: output for the Chinese range]

As the output shows, some of the characters cannot be displayed.
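One likely reason a character does not display is that its code point is unassigned in the Unicode version at hand; another is that the font simply lacks a glyph. The first case can be checked with the standard `unicodedata` module, as in this sketch. The function name `unassigned_in_range` is hypothetical, and the result depends on the Unicode database bundled with the Python interpreter.

```python
import unicodedata

def unassigned_in_range(start, end):
    """Return code points in [start, end] whose general category is 'Cn'."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == 'Cn']

missing = unassigned_in_range(0x4e00, 0x9fbf)
print(len(missing), 'unassigned code points; Unicode', unicodedata.unidata_version)
print([hex(cp) for cp in missing[:5]])
```

Category `Cn` covers both reserved code points and noncharacters such as U+FFFF, so an empty list here points to a font issue rather than a gap in Unicode.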


Original article: https://www.cnblogs.com/wangzming/p/7772091.html
