
Reading the Source of String, StringBuffer, and StringBuilder

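Preface

It has been a long time since I wrote a proper blog post; the past year was busy. I was responsible for databases, the production operations environment, writing code, code review, and quite a lot more.
I learned a good deal across many areas, but I still need to get back to practice and strengthen the fundamentals before I can carry real weight.

Last year, while studying MySQL, I drew up an outline and wrote part of it. Then work took over and I stopped. Much of the unwritten material has in fact been summarized on the company's internal wiki, or has mostly been applied on the job, so I never got around to writing it down here. When the mood strikes again I will revisit and summarize it.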

Next study plan

Data structures, algorithms, source-code reading, multithreading (sigh, there is no end to learning)

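Why start with String?

In most business development, String is the most frequently used class. When analyzing memory with a tool such as JProfiler, char[] often accounts for more than 70% of the heap (why it is char[] will become clear in the source reading below).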

Class diagram

Quick comparison

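Difference	String	StringBuffer	StringBuilder
Constant / variable	Constant (immutable)	Variable (mutable)	Variable (mutable)
Thread-safe	Yes	Yes	No
Memory area	String constant pool for literals and interned strings, heap otherwise	heap	heap
Can be subclassed	No	No	No
Lines of source	3157	718	448
When to use	Strings that rarely change	Frequent string operations (concatenation, replacement, deletion, and so on) in a multithreaded environment	Frequent string operations (concatenation, replacement, deletion, and so on) in a single-threaded environment
Example scenarios	Declaring constants, a small amount of variable manipulation	XML parsing, HTTP parameter parsing and wrapping	Assembling SQL statements, building JSON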
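
The "memory area" row is easiest to see by comparing references. The following is a minimal sketch; the class name StringPoolDemo and the helper same are made up for illustration. It assumes nothing beyond the standard library: literals are served from the pool, an explicit new String is a separate heap object, and intern() maps it back to the pooled instance.

```java
class StringPoolDemo {
    // True when both references point at the very same object.
    static boolean same(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String another = "hello";             // same literal, same pooled object
        String onHeap = new String("hello");  // explicit allocation, a distinct object

        System.out.println(same(literal, another));          // true
        System.out.println(same(literal, onHeap));           // false
        System.out.println(same(literal, onHeap.intern()));  // true
        System.out.println(literal.equals(onHeap));          // true: same content
    }
}
```

This is also why content comparisons should use equals rather than ==.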

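Judging by line count, String is the largest class; its many overloaded methods account for much of the length. The Javadoc is also thorough. A recurring pattern in the comments is a short definition followed immediately by an attributive clause introduced by "that" or "the" (in English an attributive clause generally follows the noun or pronoun it modifies).
Example: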

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     */


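AbstractStringBuilder: both StringBuffer and StringBuilder extend AbstractStringBuilder, and the abstract parent implements every method except toString.
StringBuilder: its overrides simply delegate to the parent through super calls and add no behavior of their own beyond narrowing the return type. It does not declare substring itself; it inherits that method from AbstractStringBuilder.
StringBuffer: overrides the same methods and adds the synchronized keyword to almost all of them.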
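
These relationships can be checked at run time with reflection. This is a small sketch rather than anything from the JDK source; BuilderShapeDemo and appendIsSynchronized are names I made up. It only relies on Class.getSuperclass, Class.getMethod, and Modifier.isSynchronized.

```java
class BuilderShapeDemo {
    // True if the public append(String) of the given class is declared synchronized.
    static boolean appendIsSynchronized(Class<?> type) {
        try {
            java.lang.reflect.Method append = type.getMethod("append", String.class);
            return java.lang.reflect.Modifier.isSynchronized(append.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(type + " has no append(String)", e);
        }
    }

    public static void main(String[] args) {
        // Both classes share the same package-private parent.
        System.out.println(StringBuilder.class.getSuperclass().getSimpleName()); // AbstractStringBuilder
        System.out.println(StringBuffer.class.getSuperclass().getSimpleName());  // AbstractStringBuilder

        System.out.println(appendIsSynchronized(StringBuffer.class));  // true
        System.out.println(appendIsSynchronized(StringBuilder.class)); // false

        // substring is inherited from the parent, so a StringBuilder can call it.
        System.out.println(new StringBuilder("builder").substring(0, 5)); // build
    }
}
```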

Constant versus variable, explained

String is backed by a private final char array:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];


StringBuffer and StringBuilder both extend AbstractStringBuilder, which is backed by a non-final char array:

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /**
     * The value is used for character storage.
     */
    char[] value;
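
The practical consequence of final versus non-final storage shows up in what the "modifying" methods return. A minimal sketch, with MutabilityDemo and the two shout helpers invented for illustration: the String version hands back a new object, the StringBuilder version changes the receiver and returns it.

```java
class MutabilityDemo {
    // String: concat cannot touch the receiver, so it returns a new String.
    static String shout(String s) {
        return s.concat("!");
    }

    // StringBuilder: append writes into the receiver's own array and returns the receiver.
    static StringBuilder shout(StringBuilder sb) {
        return sb.append("!");
    }

    public static void main(String[] args) {
        String s = "hi";
        String louder = shout(s);
        System.out.println(s);            // hi
        System.out.println(louder);       // hi!
        System.out.println(louder == s);  // false

        StringBuilder sb = new StringBuilder("hi");
        StringBuilder result = shout(sb);
        System.out.println(sb);            // hi!
        System.out.println(result == sb);  // true
    }
}
```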


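Thread-safety analysis

Why is String thread-safe?
First, String is backed by a final char array that is never modified after construction.
Second, every String method that appears to modify the string returns a new String object and leaves the original untouched.
Example: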

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }


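Why do these two points make String thread-safe? Because no thread can change a String once it has been constructed, every thread always sees the same contents and there is nothing to synchronize. A mutable class gives no such guarantee.

Take the append method of StringBuilder as an example. The System.arraycopy call on the last line of getChars copies the characters being appended into dst, which is the builder's own value array.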

 AbstractStringBuilder append(AbstractStringBuilder asb) {
        if (asb == null)
            return appendNull();
        int len = asb.length();
        ensureCapacityInternal(count + len);
        asb.getChars(0, len, value, count); // value is the char[] field; StringBuilder is backed by this mutable array
        count += len;
        return this;
    }
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
    {
        if (srcBegin < 0)
            throw new StringIndexOutOfBoundsException(srcBegin);
        if ((srcEnd < 0) || (srcEnd > count))
            throw new StringIndexOutOfBoundsException(srcEnd);
        if (srcBegin > srcEnd)
            throw new StringIndexOutOfBoundsException("srcBegin > srcEnd");
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }


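Scenario:

Suppose there are two threads, A and B, and a StringBuilder whose initial content is "1".
Thread A executes append("2").
Thread B executes append("3").

Analysis:
The CPU runs part of thread A's logic and pauses it just before the System.arraycopy call; in the meantime thread B runs to completion.
When thread A began append("2") the StringBuilder held "1", so A has already fixed index 1 as its destination.
While A is paused, B turns the StringBuilder into "13" and sets count to 2.
When A resumes it writes "2" at index 1, overwriting the "3", and then increments count to 3. One possible result is therefore neither "123" nor "132": B's character is lost and the last position holds an unset char.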

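Sigh, I don't think I picked a good example to explain this. Readers who already understand the mechanism will find it too shallow, and those who don't may still be puzzled. I will cover thread safety in detail in a later post.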
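
A quick experiment makes the difference visible without reasoning about interleavings by hand. The sketch below is my own illustration, not code from the JDK: AppendRaceDemo, hammer, and the two constants are made-up names. It starts several threads that append one char at a time to a shared object. With StringBuffer the final length is always exact; with StringBuilder some appends usually get lost, and an append may even throw, which is why the loop swallows RuntimeException.

```java
class AppendRaceDemo {
    static final int THREADS = 8;
    static final int APPENDS_PER_THREAD = 10_000;

    // Starts THREADS threads that each call appendOnce APPENDS_PER_THREAD times,
    // then waits for all of them to finish.
    static void hammer(Runnable appendOnce) {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < APPENDS_PER_THREAD; i++) {
                    try {
                        appendOnce.run();
                    } catch (RuntimeException e) {
                        // An unsynchronized builder can throw here, for example when
                        // one thread writes while another is resizing the array.
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer();
        hammer(() -> buffer.append('x'));
        System.out.println("StringBuffer:  " + buffer.length()); // always 80000

        StringBuilder builder = new StringBuilder();
        hammer(() -> builder.append('x'));
        // Often below 80000, because concurrent appends overwrite each other.
        System.out.println("StringBuilder: " + builder.length());
    }
}
```

The StringBuilder figure varies from run to run, so the program prints it without claiming a specific value.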

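Performance analysis

The common impression is that concatenating with String is inefficient. Many classic performance-tuning references even warn against string concatenation inside a loop.
Why is that?
The usual explanation is that String is a constant, so changing it necessarily means allocating new memory to hold the newly generated content. Concatenating in a loop therefore allocates a great deal of memory needlessly: each iteration creates a new String object and then discards the previous one, which has lost its last reference.

But is that really the case?

Test code: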

import java.util.function.Supplier;

/**
 * @author snifferhu
 * @date 16/9/24 18:50
 */
public class StrTest {
    private static final int TIMES = 30000; // number of loop iterations

    // Concatenation of compile-time constants; javac folds this into "abc".
    private static final Supplier<CharSequence> singleStringAppend = () -> {
        String tmp = "a" + "b" + "c";
        return tmp;
    };

    // String concatenation inside a loop: a new String per iteration.
    private static final Supplier<CharSequence> stringAppend = () -> {
        String tmp = "1";
        for (int i = 0; i < TIMES; i++) {
            tmp += "add";
        }
        return tmp;
    };

    // Synchronized, mutable buffer.
    private static final Supplier<CharSequence> stringBufferAppend = () -> {
        StringBuffer tmp = new StringBuffer("1");
        for (int i = 0; i < TIMES; i++) {
            tmp.append("add");
        }
        return tmp;
    };

    // Unsynchronized, mutable buffer.
    private static final Supplier<CharSequence> stringBuilderAppend = () -> {
        StringBuilder tmp = new StringBuilder("1");
        for (int i = 0; i < TIMES; i++) {
            tmp.append("add");
        }
        return tmp;
    };

    public static void main(String[] args) {
        timerWrapper(singleStringAppend);
        timerWrapper(stringAppend);
        timerWrapper(stringBufferAppend);
        timerWrapper(stringBuilderAppend);
    }

    // Runs the supplier once and prints the elapsed wall-clock time in milliseconds.
    public static void timerWrapper(Supplier<CharSequence> supplier) {
        long start = System.currentTimeMillis();
        supplier.get();
        System.out.println(String.format("function [%s] time cost is %s",
                supplier.getClass().getCanonicalName(),
                (System.currentTimeMillis() - start)));
    }
}


Results:

function [com.string.StrTest$$Lambda$1/1198108795] time cost is 0
function [com.string.StrTest$$Lambda$2/1706234378] time cost is 2339
function [com.string.StrTest$$Lambda$3/1867750575] time cost is 1
function [com.string.StrTest$$Lambda$4/2046562095] time cost is 1

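The results show that the simple constant concatenation finishes in under 1 millisecond, StringBuffer and StringBuilder each take about 1 millisecond, and String concatenation inside the loop takes a very long 2339 milliseconds.
The conclusion: concatenating with String inside a loop really is an extremely time-consuming operation. Every tmp += "add" builds a new String by copying everything accumulated so far, so the total work grows quadratically with the number of iterations, whereas the two builders append into a buffer that is resized only occasionally. These are single, un-warmed timings taken with System.currentTimeMillis, so read them as orders of magnitude rather than precise measurements.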
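
To see where the time goes, it helps to write out what the loop body amounts to. The sketch below is an illustration under an assumption: with a JDK 8 compiler, tmp += "add" is translated into roughly the explicit StringBuilder chain shown in concatInLoop (newer compilers emit an invokedynamic call instead, but still produce a new String on every iteration). ConcatLoopDemo and its method names are made up.

```java
class ConcatLoopDemo {
    // Roughly what `tmp += "add"` does per iteration on JDK 8:
    // a fresh StringBuilder copies all of tmp, then toString copies it again.
    static String concatInLoop(int times) {
        String tmp = "1";
        for (int i = 0; i < times; i++) {
            tmp = new StringBuilder().append(tmp).append("add").toString();
        }
        return tmp;
    }

    // One builder reused for the whole loop; only the new characters are copied each time.
    static String builderInLoop(int times) {
        StringBuilder tmp = new StringBuilder("1");
        for (int i = 0; i < times; i++) {
            tmp.append("add");
        }
        return tmp.toString();
    }

    public static void main(String[] args) {
        String a = concatInLoop(1000);
        String b = builderInLoop(1000);
        System.out.println(a.equals(b)); // true: same result, very different amount of copying
        System.out.println(a.length());  // 3001
    }
}
```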

hashCode

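String caches its hash code in the private field hash, which is a performance optimization. Because a String is a constant that never changes, there is no risk of the cached hashCode becoming wrong.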

    /** Cache the hash code for the string */
    private int hash; // Default to 0
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }


StringBuffer and StringBuilder do not override hashCode.
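
The loop in String.hashCode is a base-31 polynomial over the characters, which is easy to reproduce by hand. A minimal sketch (HashCodeDemo and polynomialHash are illustrative names):

```java
class HashCodeDemo {
    // Recomputes h = 31 * h + c over every char, the same recurrence as String.hashCode.
    static int polynomialHash(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97 * 31 * 31 + 98 * 31 + 99 = 96354
        System.out.println(polynomialHash("abc")); // 96354
        System.out.println("abc".hashCode());      // 96354

        // A builder inherits Object.hashCode, so hash its String content instead.
        StringBuilder sb = new StringBuilder("abc");
        System.out.println(sb.toString().hashCode() == "abc".hashCode()); // true
    }
}
```

Since the builders keep the identity-based hashCode from Object, they are poor keys for a HashMap or HashSet; convert to String first.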

equals

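String first compares references with "==", then checks whether the argument is a String, then compares lengths, and finally compares the contents one char at a time.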

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }


StringBuffer and StringBuilder do not override equals.
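
In practice this means two builders with identical text are not equal to each other, and a String is never equal to a builder. A short sketch of the ways to compare content instead (BuilderEqualsDemo and sameContent are made-up names; String.contentEquals is the standard method):

```java
class BuilderEqualsDemo {
    // Compares two character sequences by content rather than by identity.
    static boolean sameContent(CharSequence a, CharSequence b) {
        return a.toString().contentEquals(b);
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder("abc");
        StringBuilder b = new StringBuilder("abc");

        System.out.println(a.equals(b));                       // false: Object.equals compares references
        System.out.println(a.toString().equals(b.toString())); // true
        System.out.println("abc".equals(a));                   // false: a is not a String
        System.out.println("abc".contentEquals(a));            // true
        System.out.println(sameContent(a, b));                 // true
    }
}
```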

toString

Each class implements toString in its own way. String simply returns itself.

 /**
     * This object (which is already a string!) is itself returned.
     *
     * @return  the string itself.
     */
    public String toString() {
        return this;
    }


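StringBuffer adds the synchronized keyword to guarantee thread safety.

To improve performance it also caches the content in a private field; the cache is marked transient, so it is not serialized.
Every modification of the StringBuffer sets toStringCache to null.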

/**
     * A cache of the last value returned by toString. Cleared
     * whenever the StringBuffer is modified.
     */
    private transient char[] toStringCache;
    @Override
    public synchronized String toString() {
        if (toStringCache == null) {
            toStringCache = Arrays.copyOfRange(value, 0, count);
        }
        return new String(toStringCache, true);
    }
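
The observable behavior is that toString on a buffer yields a snapshot: later changes to the buffer do not leak into a String obtained earlier. A minimal sketch, where ToStringSnapshotDemo and snapshotThenAppend are names invented for this example:

```java
class ToStringSnapshotDemo {
    // Takes a String snapshot of the buffer, then keeps modifying the buffer.
    static String snapshotThenAppend(StringBuffer sb, String suffix) {
        String snapshot = sb.toString(); // copies the current characters
        sb.append(suffix);               // mutates the buffer only
        return snapshot;
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s.toString() == s); // true: String returns itself

        StringBuffer sb = new StringBuffer("abc");
        String snapshot = snapshotThenAppend(sb, "def");
        System.out.println(snapshot); // abc
        System.out.println(sb);       // abcdef
    }
}
```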


valueOf

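Why single out this method?
It is a static method. Many classes have a toString method that achieves a similar effect,
but valueOf adds a null check for robustness, so it never throws on a null argument.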

    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }


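StringBuffer and StringBuilder have no valueOf method, but their insert method calls String.valueOf.
There is a pitfall here: when the value passed in is null, the literal text "null" is inserted. Keep this in mind.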

    public synchronized StringBuffer insert(int offset, Object obj) {
        toStringCache = null;
        super.insert(offset, String.valueOf(obj));
        return this;
    }
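
Here is the pitfall in runnable form, together with one possible guard. NullAppendDemo and describe are illustrative names, and replacing null with an empty string is just one choice a caller might make.

```java
class NullAppendDemo {
    // Builds "value=<obj>"; a null argument ends up as the four characters "null".
    static String describe(Object obj) {
        return new StringBuilder("value=").append(obj).toString();
    }

    public static void main(String[] args) {
        Object missing = null;

        System.out.println(String.valueOf(missing));                   // null
        System.out.println(describe(missing));                         // value=null
        System.out.println(new StringBuffer("ab").insert(1, missing)); // anullb

        // One possible guard: substitute an empty string before appending.
        System.out.println(describe(missing == null ? "" : missing));  // value=
    }
}
```

Note that the variable is typed as Object on purpose. Calling String.valueOf(null) with a bare null literal resolves to the char[] overload and throws a NullPointerException.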


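substring

StringBuffer and StringBuilder both inherit the implementation from AbstractStringBuilder; the only difference is that StringBuffer adds the synchronized keyword. The exception handling is worth a close look: each failed check throws an exception that carries the offending value, which helps pinpoint the problem.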

public String substring(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            throw new StringIndexOutOfBoundsException(end);
        if (start > end)
            throw new StringIndexOutOfBoundsException(end - start);
        return new String(value, start, end - start);
    }


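By comparison, the String implementation checks at the final return whether a new object is really needed, a performance optimization worth borrowing.
Because the return type is String, AbstractStringBuilder cannot return this the way String does.
It has to create a new String object either way, so it skips that check.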

   public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
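
Both behaviors can be observed from the outside. The sketch below uses made-up names (SubstringDemo, rejects). It catches only the exception type, because the message text differs between JDK versions, and it treats the "whole range returns this" result as an implementation detail matching the source shown above rather than a documented guarantee.

```java
class SubstringDemo {
    // True if substring(begin, end) is rejected with a StringIndexOutOfBoundsException.
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.substring(1, 3));               // el
        System.out.println(s.substring(0, s.length()) == s); // true with the source shown above

        StringBuilder sb = new StringBuilder("hello");
        System.out.println(sb.substring(1, 3));              // el, always a newly created String

        System.out.println(rejects(s, -1, 3)); // true: negative begin
        System.out.println(rejects(s, 0, 6));  // true: end past the length
        System.out.println(rejects(s, 3, 1));  // true: begin greater than end
    }
}
```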


replace

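String's replace first checks whether the old and new characters are the same, which avoids needless work. Nice!
The while loop finds the index of the first character equal to oldChar.
For performance it copies the field into the local variable val and builds the result in a new array buf; because value itself is never modified, there is no risk of it being changed during the operation.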

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }
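
The two early exits in this method are visible from calling code: when nothing needs replacing, the very same object comes back. A minimal sketch (CharReplaceDemo and returnsSameInstance are illustrative names):

```java
class CharReplaceDemo {
    // True if replace handed back the receiver itself instead of a new String.
    static boolean returnsSameInstance(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar) == s;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.replace('l', 'L')); // heLLo
        System.out.println(s);                   // hello: the original is untouched

        System.out.println(returnsSameInstance(s, 'x', 'y')); // true: 'x' never occurs
        System.out.println(returnsSameInstance(s, 'l', 'l')); // true: oldChar equals newChar
        System.out.println(returnsSameInstance(s, 'l', 'L')); // false: a new String was built
    }
}
```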


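The corresponding method in StringBuffer and StringBuilder takes different parameters and returns a different type from String's.
After validating the indices, it calls ensureCapacityInternal to grow the array if needed.
Because StringBuilder is not synchronized, another thread may modify value while System.arraycopy is running, leaving the contents and count inconsistent.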

    public AbstractStringBuilder replace(int start, int end, String str) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (start > count)
            throw new StringIndexOutOfBoundsException("start > length()");
        if (start > end)
            throw new StringIndexOutOfBoundsException("start > end");
        if (end > count)
            end = count;
        int len = str.length();
        int newCount = count + len - (end - start);
        ensureCapacityInternal(newCount);
        System.arraycopy(value, end, value, start + len, count - end);
        str.getChars(value, start);
        count = newCount;
        return this;
    }
    /**
     * This method has the same contract as ensureCapacity, but is
     * never synchronized.
     */
    private void ensureCapacityInternal(int minimumCapacity) {
        // overflow-conscious code
        if (minimumCapacity - value.length > 0)
            expandCapacity(minimumCapacity);
    }
    /**
     * This implements the expansion semantics of ensureCapacity with no
     * size check or synchronization.
     */
    void expandCapacity(int minimumCapacity) {
        int newCapacity = value.length * 2 + 2;
        if (newCapacity - minimumCapacity < 0)
            newCapacity = minimumCapacity;
        if (newCapacity < 0) {
            if (minimumCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        value = Arrays.copyOf(value, newCapacity);
    }
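
Two details of this listing are easy to verify: the clamping of end to the current length, and the "double plus two" growth in expandCapacity. The sketch below uses made-up names (BuilderReplaceDemo, capacityAfterAppending) and assumes the default initial capacity of 16 for a no-argument StringBuilder.

```java
class BuilderReplaceDemo {
    // Capacity of a fresh, default-sized StringBuilder after appending s.
    static int capacityAfterAppending(String s) {
        return new StringBuilder().append(s).capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterAppending(""));                  // 16
        System.out.println(capacityAfterAppending("1234567890123456"));  // 16: exactly fits
        System.out.println(capacityAfterAppending("12345678901234567")); // 34 = 16 * 2 + 2

        StringBuilder text = new StringBuilder("hello world");
        text.replace(0, 5, "HELLO");   // replaces the chars at indices 0..4
        System.out.println(text);      // HELLO world
        text.replace(6, 100, "there"); // end is clamped to length()
        System.out.println(text);      // HELLO there
    }
}
```

When the final size is known in advance, passing it to the StringBuilder(int capacity) constructor avoids the intermediate array copies.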


trim

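The neat part of String's trim is that it compares char values directly with <=. I verified that both operands are promoted to int, so what is compared is their numeric code (the ASCII code for these characters). Every character up to and including the space (code 32), which covers the control characters, is treated as whitespace.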

   public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */
        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
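
A few concrete characters show where the <= ' ' rule draws the line. This is a small sketch with made-up names (TrimDemo, trimmable); the full-width space U+3000 is used as an example of a whitespace-looking character whose code is above 32.

```java
class TrimDemo {
    // The test trim applies to each end of the string: is the char code at most 32?
    static boolean trimmable(char c) {
        return c <= ' ';
    }

    public static void main(String[] args) {
        System.out.println((int) ' ');        // 32
        System.out.println(trimmable('\t'));  // true: code 9
        System.out.println(trimmable('a'));   // false: code 97

        // Tabs, newlines, spaces and even the NUL char are stripped from both ends.
        System.out.println("[" + "\t\n hi \u0000".trim() + "]"); // [hi]

        // The full-width space has a code far above 32, so trim leaves it alone.
        System.out.println(trimmable('\u3000'));        // false
        System.out.println("\u3000hi".trim().length()); // 3
    }
}
```

On Java 11 and later, String.strip uses Character.isWhitespace instead and does remove the full-width space.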



Original article: https://www.cnblogs.com/snifferhu/p/5903958.html