  • [Translation] How the JVM implements String.compareTo with x86 magic instructions you have never heard of




    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;
        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }


    # {method} 'compare' '(Ljava/lang/String;Ljava/lang/String;)I' in 'Test'
    # parm0:    rsi:rsi   = 'java/lang/String'
    # parm1:    rdx:rdx   = 'java/lang/String'
    #           [sp+0x20]  (sp of caller)
    7fe3ed1159a0: mov    %eax,-0x14000(%rsp)
    7fe3ed1159a7: push   %rbp
    7fe3ed1159a8: sub    $0x10,%rsp        
    7fe3ed1159ac: mov    0x10(%rsi),%rdi  
    7fe3ed1159b0: mov    0x10(%rdx),%r10
    7fe3ed1159b4: mov    %r10,%rsi
    7fe3ed1159b7: add    $0x18,%rsi
    7fe3ed1159bb: mov    0x10(%r10),%edx
    7fe3ed1159bf: mov    0x10(%rdi),%ecx
    7fe3ed1159c2: add    $0x18,%rdi
    7fe3ed1159c6: mov    %ecx,%eax
    7fe3ed1159c8: sub    %edx,%ecx
    7fe3ed1159ca: push   %rcx
    7fe3ed1159cb: cmovle %eax,%edx
    7fe3ed1159ce: test   %edx,%edx
    7fe3ed1159d0: je     0x00007fe3ed115a6f
    7fe3ed1159d6: movzwl (%rdi),%eax
    7fe3ed1159d9: movzwl (%rsi),%ecx
    7fe3ed1159dc: sub    %ecx,%eax
    7fe3ed1159de: jne    0x00007fe3ed115a72
    7fe3ed1159e4: cmp    $0x1,%edx
    7fe3ed1159e7: je     0x00007fe3ed115a6f
    7fe3ed1159ed: cmp    %rsi,%rdi
    7fe3ed1159f0: je     0x00007fe3ed115a6f
    7fe3ed1159f6: mov    %edx,%eax
    7fe3ed1159f8: and    $0xfffffff8,%edx
    7fe3ed1159fb: je     0x00007fe3ed115a4f
    7fe3ed1159fd: lea    (%rdi,%rax,2),%rdi
    7fe3ed115a01: lea    (%rsi,%rax,2),%rsi
    7fe3ed115a05: neg    %rax
    7fe3ed115a08: vmovdqu (%rdi,%rax,2),%xmm0
    7fe3ed115a0d: vpcmpestri $0x19,(%rsi,%rax,2),%xmm0
    7fe3ed115a14: jb     0x00007fe3ed115a40
    7fe3ed115a16: add    $0x8,%rax
    7fe3ed115a1a: sub    $0x8,%rdx
    7fe3ed115a1e: jne    0x00007fe3ed115a08
    7fe3ed115a20: test   %rax,%rax
    7fe3ed115a23: je     0x00007fe3ed115a6f
    7fe3ed115a25: mov    $0x8,%edx
    7fe3ed115a2a: mov    $0x8,%eax
    7fe3ed115a2f: neg    %rax
    7fe3ed115a32: vmovdqu (%rdi,%rax,2),%xmm0
    7fe3ed115a37: vpcmpestri $0x19,(%rsi,%rax,2),%xmm0
    7fe3ed115a3e: jae    0x00007fe3ed115a6f
    7fe3ed115a40: add    %rax,%rcx
    7fe3ed115a43: movzwl (%rdi,%rcx,2),%eax
    7fe3ed115a47: movzwl (%rsi,%rcx,2),%edx
    7fe3ed115a4b: sub    %edx,%eax
    7fe3ed115a4d: jmp    0x00007fe3ed115a72
    7fe3ed115a4f: mov    %eax,%edx
    7fe3ed115a51: lea    (%rdi,%rdx,2),%rdi
    7fe3ed115a55: lea    (%rsi,%rdx,2),%rsi
    7fe3ed115a59: dec    %edx
    7fe3ed115a5b: neg    %rdx
    7fe3ed115a5e: movzwl (%rdi,%rdx,2),%eax
    7fe3ed115a62: movzwl (%rsi,%rdx,2),%ecx
    7fe3ed115a66: sub    %ecx,%eax
    7fe3ed115a68: jne    0x00007fe3ed115a72
    7fe3ed115a6a: inc    %rdx
    7fe3ed115a6d: jne    0x00007fe3ed115a5e
    7fe3ed115a6f: pop    %rax
    7fe3ed115a70: jmp    0x00007fe3ed115a73
    7fe3ed115a72: pop    %rcx
    7fe3ed115a73: add    $0x10,%rsp
    7fe3ed115a77: pop    %rbp
    7fe3ed115a78: test   %eax,0x17ed6582(%rip)
    7fe3ed115a7e: retq



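    pcmpestri is an instruction introduced in SSE4.2 and belongs to the pcmpxstrx family of vectorized string comparison instructions. A control byte selects among its many behaviors. These instructions are complicated enough that the x86 instruction set manual devotes a dedicated subsection to them and even provides a flow chart to make them easier to follow.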



    -------0b 128-bit sources treated as 16 packed bytes.
    -------1b 128-bit sources treated as 8 packed words.
    ------0-b Packed bytes/words are unsigned.
    ------1-b Packed bytes/words are signed.
    ----00--b Mode is equal any.
    ----01--b Mode is ranges.
    ----10--b Mode is equal each.
    ----11--b Mode is equal ordered.
    ---0----b IntRes1 is unmodified.
    ---1----b IntRes1 is negated (1’s complement).
    --0-----b Negation of IntRes1 is for all 16 (8) bits.
    --1-----b Negation of IntRes1 is masked by reg/mem validity.
    -0------b pcmpestri/pcmpistri: index of the least significant set bit is used
              (regardless of corresponding input element validity).
              pcmpestrm/pcmpistrm: IntRes2 is returned in least significant bits of XMM0.
    -1------b pcmpestri/pcmpistri: index of the most significant set bit is used
              (regardless of corresponding input element validity).
              pcmpestrm/pcmpistrm: each bit of IntRes2 is expanded to byte/word.
    0-------b This bit currently has no defined effect, should be 0.
    1-------b This bit currently has no defined effect, should be 0.

    (For more detail, see Section 4.1 of the Intel Instruction Set Reference.)
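The bit layout above can be decoded mechanically. The following is a minimal Java sketch, not anything taken from the JVM or the Intel manual: the class name `ControlByteSketch` and the method `describe` are made up for illustration, and the index field is described the way the index-returning variants (pcmpestri/pcmpistri) interpret it.

```java
final class ControlByteSketch {
    /** Renders the imm8 control byte fields from the table above as a short description. */
    static String describe(int imm8) {
        String[] modes = {"equal any", "ranges", "equal each", "equal ordered"};

        // Bit 0: element width. Bit 1: signedness.
        String width = (imm8 & 0x01) != 0 ? "8 packed words" : "16 packed bytes";
        String sign = (imm8 & 0x02) != 0 ? "signed" : "unsigned";

        // Bits 3:2: aggregation mode.
        String mode = modes[(imm8 >> 2) & 0x3];

        // Bits 5:4: polarity. Bit 5 only matters when bit 4 requests negation.
        String polarity;
        if ((imm8 & 0x10) == 0) {
            polarity = "IntRes1 unmodified";
        } else if ((imm8 & 0x20) == 0) {
            polarity = "IntRes1 negated";
        } else {
            polarity = "IntRes1 negated where reg/mem is valid";
        }

        // Bit 6: which set bit of IntRes2 becomes the index.
        String index = (imm8 & 0x40) != 0
                ? "most significant set bit"
                : "least significant set bit";

        return String.join(", ", width, sign, mode, polarity, index);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x19)); // control byte seen in the compareTo listing
        System.out.println(describe(0x0d)); // control byte the article attributes to indexOf
    }
}
```

Feeding it 0x19 yields words, unsigned, equal each, negated, least significant set bit; 0x0d differs in using equal ordered mode with IntRes1 left unmodified.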
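    compareTo uses 0x19 (translator's note: 0b11001), which treats each operand as 8 packed unsigned words, compares them in equal each mode (element-by-element equality), and negates the result. This monster instruction takes four registers as input: the two strings as operands, plus %rax and %rdx, which specify their lengths (the E in PCMPESTRI stands for explicit length; by contrast, pcmpistri and pcmpistrm use implicit lengths and treat a null element as the terminator). The result, the index of the selected set bit of IntRes2, is placed in %ecx. As if that were not enough, the pcmpxstrx family of instructions also sets several flags: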

    CFlag – Reset if IntRes2 is equal to zero, set otherwise
    ZFlag – Set if absolute-value of EDX is < 16 (8), reset otherwise
    SFlag – Set if absolute-value of EAX is < 16 (8), reset otherwise
    OFlag – IntRes2[0]
    AFlag – Reset
    PFlag – Reset
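To make the 0x19 behavior concrete, here is a small Java model of what a single pcmpestri with that control byte computes. It is a sketch under stated assumptions rather than a reference implementation: `PcmpestriSketch`, `Result`, and `equalEachNegated` are hypothetical names, only ECX and the carry flag are modeled, and the handling of elements past the explicit length follows my reading of the equal each rules (both invalid compares as equal, exactly one invalid compares as unequal).

```java
final class PcmpestriSketch {
    static final int LANES = 8; // 8 packed 16-bit words per 128-bit operand

    /** Hypothetical holder for the two outputs the compareTo listing relies on. */
    static final class Result {
        final int ecx;    // index of the first mismatch, or LANES if there is none
        final boolean cf; // carry flag: set when IntRes2 is non-zero
        Result(int ecx, boolean cf) { this.ecx = ecx; this.cf = cf; }
    }

    /**
     * Models control byte 0x19: unsigned words, equal each, negated result,
     * least significant set bit. x stands in for the register operand (length
     * in EAX) and y for the memory operand (length in EDX).
     */
    static Result equalEachNegated(char[] x, int eax, char[] y, int edx) {
        // Lengths are taken as absolute values and saturated at the lane count.
        int lenX = Math.min(Math.abs(eax), LANES);
        int lenY = Math.min(Math.abs(edx), LANES);

        int intRes2 = 0;
        for (int i = 0; i < LANES; i++) {
            boolean validX = i < lenX;
            boolean validY = i < lenY;
            boolean equal;
            if (validX && validY) {
                equal = x[i] == y[i];
            } else {
                equal = !validX && !validY; // assumption: see lead-in
            }
            if (!equal) {
                intRes2 |= 1 << i; // negation turns "equal" bits into "mismatch" bits
            }
        }
        int ecx = intRes2 == 0 ? LANES : Integer.numberOfTrailingZeros(intRes2);
        return new Result(ecx, intRes2 != 0);
    }

    public static void main(String[] args) {
        Result r = equalEachNegated("abcdefgh".toCharArray(), 8, "abcdeXgh".toCharArray(), 8);
        System.out.println(r.ecx + " " + r.cf);
    }
}
```

The negative length in the usage below mirrors the listing, where %rax holds a negated offset; the flag descriptions above speak of absolute values for the same reason. The carry flag is what the `jb` after each vpcmpestri tests.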


    7fe3ed1159f6: mov    %edx,%eax
    7fe3ed1159f8: and    $0xfffffff8,%edx
    7fe3ed1159fd: lea    (%rdi,%rax,2),%rdi
    7fe3ed115a01: lea    (%rsi,%rax,2),%rsi
    7fe3ed115a05: neg    %rax
    7fe3ed115a08: vmovdqu (%rdi,%rax,2),%xmm0
    7fe3ed115a0d: vpcmpestri $0x19,(%rsi,%rax,2),%xmm0
    7fe3ed115a14: jb     0x00007fe3ed115a40
    7fe3ed115a16: add    $0x8,%rax
    7fe3ed115a1a: sub    $0x8,%rdx
    7fe3ed115a1e: jne    0x00007fe3ed115a08

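    %rax holds the length of the shorter string, and %rdx holds that length & ~0x7 (i.e. 8 times the maximum number of loop iterations). The code then advances the pointers to the two character arrays (%rsi and %rdi) by that length and negates %rax, so the loop indexes from the end of the compared region with a negative offset that counts up toward zero.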
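The overall control flow of the listing can be paraphrased in plain Java. This is a hand-written sketch of my reading of the assembly, not HotSpot code: `VectorizedCompareSketch` and its methods are hypothetical names, and `firstMismatchInBlock` is a scalar stand-in for one vpcmpestri on two full 8-char blocks.

```java
final class VectorizedCompareSketch {
    static final int LANES = 8; // chars handled per vpcmpestri

    /** Scalar stand-in for one vpcmpestri $0x19 on two full blocks: first mismatch index, or LANES. */
    static int firstMismatchInBlock(char[] a, char[] b, int start) {
        for (int i = 0; i < LANES; i++) {
            if (a[start + i] != b[start + i]) return i;
        }
        return LANES;
    }

    static int compare(char[] a, char[] b) {
        int lenDiff = a.length - b.length;       // pushed on the stack in the listing
        int min = Math.min(a.length, b.length);  // the cmovle
        if (min == 0) return lenDiff;
        if (a[0] != b[0]) return a[0] - b[0];    // first char is checked on its own
        if (min == 1 || a == b) return lenDiff;  // single char, or same array

        if (min < LANES) {
            // Short strings: scalar loop over indices 1..min-1 with a negative offset.
            for (int k = 1 - min; k != 0; k++) {
                int i = min + k;
                if (a[i] != b[i]) return a[i] - b[i];
            }
            return lenDiff;
        }

        // Main loop: idx is the negated offset from the end of the common region.
        int idx = -min;
        for (int remaining = min & ~(LANES - 1); remaining != 0; remaining -= LANES) {
            int start = min + idx;
            int m = firstMismatchInBlock(a, b, start);
            if (m < LANES) return a[start + m] - b[start + m];
            idx += LANES;
        }

        // Tail: if min is not a multiple of 8, re-check the last 8 chars,
        // overlapping chars that were already found equal.
        if (idx != 0) {
            int start = min - LANES;
            int m = firstMismatchInBlock(a, b, start);
            if (m < LANES) return a[start + m] - b[start + m];
        }
        return lenDiff;
    }

    public static void main(String[] args) {
        String s = "abcdefghijklmnopqXs", t = "abcdefghijklmnopqrs";
        System.out.println(compare(s.toCharArray(), t.toCharArray()) == s.compareTo(t));
    }
}
```

The overlapping tail is safe because every char before it has already compared equal, so the first mismatch it reports can only fall in the part not yet examined.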



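    If the above was not complicated enough for you, take a look at the even more magical indexOf implementation (there are two versions, depending on the length of the string being matched), which uses control byte 0x0d, i.e. matching in equal ordered mode.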

  • Original article: https://www.cnblogs.com/kelthuzadx/p/12837642.html