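String Concatenation
There are two main ways to implement string concatenation.
Python and Go are used here to illustrate the two approaches. String objects are immutable in both languages.
Go code: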
package main

import "fmt"

func main() {
	t := ""
	s := "123"
	for i := 0; i < 100000; i++ {
		t += s
	}
	fmt.Println("len:", len(t))
}
Output of time:
real 0m1.706s
user 0m1.703s
sys 0m0.108s
Python code (Python 2):
t = ""
s = "123"
for _ in range(100000):
    t += s
print len(t)
Output of time:
real 0m0.053s
user 0m0.033s
sys 0m0.014s
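Isn't Go supposed to be nearly as fast as C? Isn't Python the snail-paced scripting language?
Let's look at how Go and Python each implement string concatenation.
Python source code: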
static PyObject *
string_concatenate(PyObject *v, PyObject *w,
                   PyFrameObject *f, unsigned char *next_instr)
{
    /* This function implements 'variable += expr' when both arguments
       are strings. */
    Py_ssize_t v_len = PyString_GET_SIZE(v);
    Py_ssize_t w_len = PyString_GET_SIZE(w);
    Py_ssize_t new_len = v_len + w_len;
    if (new_len < 0) {
        PyErr_SetString(PyExc_OverflowError,
                        "strings are too large to concat");
        return NULL;
    }
    if (v->ob_refcnt == 2) {
        /* In the common case, there are 2 references to the value
         * stored in 'variable' when the += is performed: one on the
         * value stack (in 'v') and one still stored in the
         * 'variable'.  We try to delete the variable now to reduce
         * the refcnt to 1.
         */
        switch (*next_instr) {
        case STORE_FAST:
        {
            int oparg = PEEKARG();
            PyObject **fastlocals = f->f_localsplus;
            if (GETLOCAL(oparg) == v)
                SETLOCAL(oparg, NULL);
            break;
        }
        case STORE_DEREF:
        {
            PyObject **freevars = (f->f_localsplus +
                                   f->f_code->co_nlocals);
            PyObject *c = freevars[PEEKARG()];
            if (PyCell_GET(c) == v)
                PyCell_Set(c, NULL);
            break;
        }
        case STORE_NAME:
        {
            PyObject *names = f->f_code->co_names;
            PyObject *name = GETITEM(names, PEEKARG());
            PyObject *locals = f->f_locals;
            if (PyDict_CheckExact(locals) &&
                PyDict_GetItem(locals, name) == v) {
                if (PyDict_DelItem(locals, name) != 0) {
                    PyErr_Clear();
                }
            }
            break;
        }
        }
    }
    if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
        /* Now we own the last reference to 'v', so we can resize it
         * in-place.
         */
        if (_PyString_Resize(&v, new_len) != 0) {
            /* XXX if _PyString_Resize() fails, 'v' has been
             * deallocated so it cannot be put back into
             * 'variable'.  The MemoryError is raised when there
             * is no value in 'variable', which might (very
             * remotely) be a cause of incompatibilities.
             */
            return NULL;
        }
        /* copy 'w' into the newly allocated area of 'v' */
        memcpy(PyString_AS_STRING(v) + v_len,
               PyString_AS_STRING(w), w_len);
        return v;
    }
    else {
        /* When in-place resizing is not an option. */
        PyString_Concat(&v, w);
        return v;
    }
}
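As the code shows, Python optimizes string concatenation with +=: when no other reference to the left operand remains, it resizes the left operand in place and copies only the other string's contents into the newly extended memory. Each += then costs time proportional to the appended string (as long as the resize does not have to move the buffer), so the whole loop runs in roughly O(n) time.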
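The resize-then-copy path can be modeled in a few lines of Go. This is only an analogy under stated assumptions, not CPython's implementation: appendInPlace is a hypothetical helper, a byte slice with spare capacity stands in for a string object that can be resized without moving, and the returned count stands in for the work done by memcpy.

```go
package main

import "fmt"

// appendInPlace models "resize the left operand, then copy only the right
// operand". It returns the extended buffer and the number of bytes copied.
// Hypothetical helper for illustration; CPython relies on realloc instead.
func appendInPlace(buf []byte, s string) ([]byte, int) {
	oldLen := len(buf)
	newLen := oldLen + len(s)
	copied := 0
	if newLen > cap(buf) {
		// Fallback: the buffer has to move, so the old contents are copied too.
		grown := make([]byte, oldLen, 2*newLen)
		copied += copy(grown, buf)
		buf = grown
	}
	buf = buf[:newLen]              // "resize" without touching existing bytes
	copied += copy(buf[oldLen:], s) // copy only the appended part
	return buf, copied
}

func main() {
	const n = 100000
	s := "123"
	// Assumption: enough capacity up front, so the buffer never moves.
	buf := make([]byte, 0, n*len(s))
	total := 0
	for i := 0; i < n; i++ {
		var c int
		buf, c = appendInPlace(buf, s)
		total += c
	}
	fmt.Println("len:", len(buf), "bytes copied:", total)
}
```

Under the no-move assumption every round copies exactly len(s) bytes, so the total grows linearly with the number of rounds. When the buffer does have to move, that round also pays for the existing contents, which is why the O(n) figure is best read as the typical case rather than a guarantee.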
Go source code:
String
runtime·catstring(String s1, String s2)
{
	String s3;

	if(s1.len == 0)
		return s2;
	if(s2.len == 0)
		return s1;
	s3 = gostringsize(s1.len + s2.len);
	runtime·memmove(s3.str, s1.str, s1.len);
	runtime·memmove(s3.str+s1.len, s2.str, s2.len);
	return s3;
}
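Go does not optimize string concatenation at all: every concatenation allocates a new block of memory and copies the contents of both strings into the new object. Each += therefore costs time proportional to the current length of the result, and the whole loop takes O(n^2) time.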
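To make the quadratic growth concrete, here is a rough cost model in Go. It is a sketch under assumptions: naiveCopyCost is a hypothetical helper, not part of the Go runtime, and it counts only the bytes moved by the two memmove calls in catstring, ignoring allocation and garbage collection.

```go
package main

import "fmt"

// naiveCopyCost returns the total number of bytes copied by n rounds of
// t += s, where len(s) == k and every round allocates a fresh string and
// copies both operands, as catstring does.
// Hypothetical helper for illustration only.
func naiveCopyCost(n, k int64) int64 {
	var total, tLen int64
	for i := int64(0); i < n; i++ {
		// catstring returns early, without copying, if either operand is empty.
		if tLen != 0 && k != 0 {
			total += tLen + k
		}
		tLen += k
	}
	return total
}

func main() {
	const n, k = 100000, 3
	naive := naiveCopyCost(n, k)
	linear := int64(n * k) // bytes copied if only the appended part were copied
	fmt.Println("copy-everything strategy:", naive, "bytes")
	fmt.Println("append-only strategy:    ", linear, "bytes")
}
```

For the benchmark above (100000 rounds of a 3-byte string) the model gives 15000149997 bytes copied, against 300000 bytes if each round copied only the appended part. That gap of roughly 50000x in copied bytes is the O(n^2) versus O(n) difference, although measured run times also depend on allocation and memory bandwidth.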