What is string interning, and how Python's string intern mechanism works

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool. -- quoted from Wikipedia

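In other words, string objects with the same value are stored only once and shared. That sharing is also why strings must be immutable: if one holder could change the shared object, every other holder would see the change. Think of it like numeric values: the same number only needs to be stored once, and there is no need for separate objects to tell equal values apart.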
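To make the idea of a pool concrete, here is a minimal sketch of an intern pool built on a plain dict. The class name `InternPool` and its `intern` method are made up for illustration; this is not how CPython implements interning internally, only the same idea in a few lines.

```python
class InternPool:
    """Toy intern pool: keeps one canonical object per distinct string value."""

    def __init__(self):
        self._pool = {}

    def intern(self, s):
        # setdefault returns the stored object if an equal string is already
        # in the pool; otherwise it stores s and returns s itself.
        return self._pool.setdefault(s, s)


pool = InternPool()
parts = ["hello", " ", "world"]

# Each join() builds a fresh string object at run time.
a = pool.intern("".join(parts))
b = pool.intern("".join(parts))

same_object = a is b  # both names now refer to the pooled object
```

Once both strings have gone through the pool, comparing them can be an identity check instead of a character-by-character comparison, which is where the time savings come from.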

Python strings use interning, and CPython interns some strings automatically.

>>> a = 'kzc'
>>> b = 'k'+'zc'
>>> id(a)
55704656
>>> id(b)
55704656

As you can see, they are the same object.

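The advantage of interning is that when a string with the same value is needed (an identifier, for example), it is taken straight from the pool. This avoids repeatedly creating and destroying objects, improves efficiency, and saves memory. The downside, which comes from the immutability that interning relies on, is that concatenating or modifying strings costs performance.
Because strings are immutable, modifying a string is not an in-place operation; a new object has to be created.
This is also why + is discouraged for concatenating many strings and join() is recommended instead: join() first computes the total length of all the strings, then copies them in one by one, so the result object is allocated only once.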
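The two concatenation styles, and the fact that a "modification" hands back a new object, can be sketched as follows. The variable names are just for illustration, and the comment about intermediate strings describes the conceptual model; CPython may optimize some cases internally.

```python
words = ["string", "interning", "in", "python"]

# Repeated +: conceptually, every step produces a new intermediate string.
joined_with_plus = ""
for w in words:
    joined_with_plus = joined_with_plus + w

# join(): the result is sized once and every piece is copied into it.
joined_with_join = "".join(words)

# "Modifying" a string returns a new object and leaves the original untouched.
s = "kzc"
t = s.replace("k", "K")
```

Both approaches give the same value; the difference is only in how many temporary objects are created along the way.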

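A pitfall to watch for: not every string is interned. In CPython, only string constants made up solely of underscores, digits, and letters are interned automatically.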

>>> a = 'hello world'
>>> b = 'hello world'
>>> id(a)
56400384
>>> id(b)
56398336

Because these strings contain a space, they were not interned.

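But why is it done this way? The built-in intern() function (sys.intern() in Python 3) can explicitly intern any string, so implementation difficulty is not the reason.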

The answer can be found in a comment in the source file stringobject.h:

/* ... ... This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string ... ... */

In other words, only strings that look like Python identifiers are interned automatically.
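Forcing a string with a space into the pool might look like the sketch below. It assumes Python 3, where the function lives at sys.intern(); the strings are built with join() at run time so that the compiler cannot simply share one constant between them.

```python
import sys

# Build both strings at run time so they start out as separate objects.
a = " ".join(["hello", "world"])
b = " ".join(["hello", "world"])

equal_value = (a == b)   # same characters
before = a is b          # separate objects in CPython before interning

# sys.intern() returns the pooled object for this value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
after = a_interned is b_interned
```

Note that sys.intern() returns the canonical object; you need to keep and use that return value, since the original names may still point at the non-pooled copies.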

Now for another pitfall.

Example 1.

>>> 'kz'+'c' is 'kzc'
True

Example 2.
>>> s1 = 'kz'
>>> s2 = 'kzc'
>>> s1+'c' is 'kzc'
False

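Why is the second example False? The string contains only letters, so shouldn't it be interned automatically?

The reason is that in the first example, 'kz'+'c' is evaluated at compile time and replaced with the constant 'kzc'.

In the second example, s1+'c' is concatenated at run time, so the result is not interned automatically.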
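One way to see the compile-time versus run-time difference is to look at the constants stored in the compiled code object. The sketch below assumes CPython and Python 3 (for sys.intern()); the "<demo>" filename and the variable names are arbitrary, and the exact contents of co_consts are an implementation detail that may vary between versions.

```python
import sys

# Compile-time: both operands are literals, so the compiler can fold
# 'kz' + 'c' into a single constant.
folded = compile("'kz' + 'c'", "<demo>", "eval")
folded_has_kzc = "kzc" in folded.co_consts

# Run-time: the left operand is a name, so nothing can be folded in advance.
unfolded = compile("s1 + 'c'", "<demo>", "eval")
unfolded_has_kzc = "kzc" in unfolded.co_consts

# The run-time result can still be interned explicitly.
s1 = "kz"
runtime = s1 + "c"
same_after_intern = sys.intern(runtime) is sys.intern("kzc")
```

In practice this is a reminder to compare strings with == rather than is; identity only tells you whether two names happen to share an object.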


Original article: https://www.cnblogs.com/brucemengbm/p/6952822.html