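While investigating a potential memory leak recently, I noticed a large number of FastThreadLocalThread instances in the jmap output, so I looked up the Javadoc for the related FastThreadLocal class. It reads: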
A special variant of ThreadLocal that yields higher access performance when accessed from a FastThreadLocalThread.
Internally, a FastThreadLocal uses a constant index in an array, instead of using hash code and hash table, to look for a variable. Although seemingly very subtle, it yields slight performance advantage over using a hash table, and it is useful when accessed frequently.
To take advantage of this thread-local variable, your thread must be a FastThreadLocalThread or its subtype. By default, all threads created by DefaultThreadFactory are FastThreadLocalThread due to this reason.
Note that the fast path is only possible on threads that extend FastThreadLocalThread, because it requires a special field to store the necessary state. An access by any other kind of thread falls back to a regular ThreadLocal.
Put simply, it is a ThreadLocal implementation that is faster when accessed from a FastThreadLocalThread. It looks variables up by a constant array index rather than by a hash value.
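To make the "constant index instead of hash lookup" idea concrete, here is a minimal sketch using only the JDK. It is not Netty's implementation: SlotThread and IndexedLocal are hypothetical names invented for this illustration, standing in for the roles of FastThreadLocalThread and FastThreadLocal. The sketch also assumes an initial value is never null.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical thread subclass that carries its own slot array
// (the "special field" the Javadoc mentions).
class SlotThread extends Thread {
    Object[] slots = new Object[8];

    SlotThread(Runnable task) {
        super(task);
    }
}

// Hypothetical thread-local that claims one fixed array index when it is created.
class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement();
    private final Supplier<T> initial;
    // Slow path for threads that are not SlotThread instances.
    private final ThreadLocal<T> fallback;

    IndexedLocal(Supplier<T> initial) {
        this.initial = initial;
        this.fallback = ThreadLocal.withInitial(initial);
    }

    @SuppressWarnings("unchecked")
    T get() {
        Thread current = Thread.currentThread();
        if (!(current instanceof SlotThread)) {
            return fallback.get(); // ordinary hash-based lookup
        }
        SlotThread thread = (SlotThread) current;
        ensureCapacity(thread);
        Object value = thread.slots[index]; // direct array access, no hashing
        if (value == null) {
            // Assumption of this sketch: null means "not initialized yet".
            value = initial.get();
            thread.slots[index] = value;
        }
        return (T) value;
    }

    void set(T value) {
        Thread current = Thread.currentThread();
        if (current instanceof SlotThread) {
            SlotThread thread = (SlotThread) current;
            ensureCapacity(thread);
            thread.slots[index] = value;
        } else {
            fallback.set(value);
        }
    }

    // Grows the per-thread array when this variable's index is beyond its end.
    private void ensureCapacity(SlotThread thread) {
        if (index >= thread.slots.length) {
            int newLength = Math.max(index + 1, thread.slots.length * 2);
            thread.slots = Arrays.copyOf(thread.slots, newLength);
        }
    }
}

class IndexedLocalDemo {
    public static void main(String[] args) throws InterruptedException {
        IndexedLocal<StringBuilder> buffer = new IndexedLocal<>(StringBuilder::new);

        // Fast path: the worker is a SlotThread, so get() is an array read.
        Thread worker = new SlotThread(() ->
                System.out.println(buffer.get().append("fast path")));
        worker.start();
        worker.join();

        // Fallback path: the main thread is a plain Thread.
        System.out.println(buffer.get().append("fallback path"));
    }
}
```

Each thread sees its own StringBuilder, and only the SlotThread takes the array path. The slot array is touched only by its owning thread, which is why the sketch needs no synchronization.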
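When the default thread pool is used, Netty creates its threads through DefaultThreadFactory, which produces FastThreadLocalThread instances rather than plain Thread objects (see io.netty.util.concurrent.DefaultThreadFactory in the Netty source).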
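From what I remember of earlier tests comparing Java maps with C++ map and unordered_map, the more entries a map holds, the wider the gap between implementations tends to be, because collisions become more likely and the underlying structure (a B* tree, linked lists, linear probing, and so on) starts to matter.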
To get a rough idea of how large the gap is, I searched around and found a post (https://my.oschina.net/andylucc/blog/614359) that ran a benchmark. Its results were as follows:
1,000 ThreadLocal variables on one thread, 1,000,000 timed reads:
ThreadLocal: 3767ms | 3636ms | 3595ms | 3610ms | 3719ms
FastThreadLocal: 15ms | 14ms | 13ms | 14ms | 14ms
1,000 ThreadLocal variables on one thread, 100,000 timed reads:
ThreadLocal: 384ms | 378ms | 366ms | 647ms | 372ms
FastThreadLocal: 14ms | 13ms | 13ms | 17ms | 13ms
1,000 ThreadLocal variables on one thread, 10,000 timed reads:
ThreadLocal: 43ms | 42ms | 42ms | 56ms | 45ms
FastThreadLocal: 15ms | 13ms | 11ms | 15ms | 11ms
100 ThreadLocal variables on one thread, 10,000 timed reads:
ThreadLocal: 16ms | 21ms | 18ms | 16ms | 18ms
FastThreadLocal: 15ms | 15ms | 15ms | 17ms | 18ms
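The data show that as the number of ThreadLocal variables and the access frequency grow, the time taken by the traditional ThreadLocal rises quickly, while Netty's FastThreadLocal stays roughly flat. The simulated scenario is not very specific (it only times reads on a single thread), but to some extent it suggests that FastThreadLocal holds up better than the traditional ThreadLocal under high-concurrency, high-load conditions.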
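For readers who want to try a measurement of this shape themselves, here is a rough JDK-only harness for the plain ThreadLocal side. How the cited post structured its loop is not shown here, so this sketch assumes that one "read" touches every variable once. The class and method names (ThreadLocalReadBench, createLocals, readAll, timeRounds) are made up for the illustration. A wall-clock loop like this is sensitive to JIT warm-up and dead-code elimination, so treat the numbers as indicative only; a harness such as JMH would be more trustworthy.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadLocalReadBench {

    // Creates `count` ThreadLocal variables and sets each one on the current thread.
    static List<ThreadLocal<Integer>> createLocals(int count) {
        List<ThreadLocal<Integer>> locals = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ThreadLocal<Integer> local = new ThreadLocal<>();
            local.set(i);
            locals.add(local);
        }
        return locals;
    }

    // Reads every variable once and returns the sum, so the reads cannot be optimized away.
    static long readAll(List<ThreadLocal<Integer>> locals) {
        long sum = 0;
        for (ThreadLocal<Integer> local : locals) {
            sum += local.get();
        }
        return sum;
    }

    // Times `rounds` passes over `localCount` variables; returns elapsed milliseconds.
    static long timeRounds(int localCount, int rounds) {
        List<ThreadLocal<Integer>> locals = createLocals(localCount);
        long expected = (long) localCount * (localCount - 1) / 2;
        long start = System.nanoTime();
        for (int round = 0; round < rounds; round++) {
            if (readAll(locals) != expected) {
                throw new IllegalStateException("unexpected checksum");
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        // Clean up so the entries do not linger on the calling thread.
        for (ThreadLocal<Integer> local : locals) {
            local.remove();
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 5; run++) {
            System.out.println("ThreadLocal: " + timeRounds(1000, 10_000) + " ms");
        }
    }
}
```

To compare against Netty, the same loop could be repeated with FastThreadLocal variables on a thread created by DefaultThreadFactory; that variant is left out here so the sketch runs without the Netty dependency.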
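In summary, based on my experience, I believe 99% of applications never come close to using thousands of thread-local variables. So except in very unusual applications, and with long-term maintenance cost in mind, the traditional ThreadLocal is sufficient and there is no need to use FastThreadLocal.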
PS: I won't repeat the use cases for ThreadLocal here; see the following posts:
https://my.oschina.net/clopopo/blog/149368
http://blog.csdn.net/lufeng20/article/details/24314381
http://lavasoft.blog.51cto.com/62575/51926/