  • Analyzing a core dump caused by differing CPU instruction sets

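    Recently our program had to support running on the CGSL system. During testing, two machines with the same operating system behaved differently. The program ran normally on the build machine but dumped core on the test machine. The core information is summarized below. It shows that the test machine does not support some of the instructions in the compiled binary: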

    Program terminated with signal 4, Illegal instruction.
    
       0x00007fad269ac973 <+435>:   add    $0x4,%rdx
       0x00007fad269ac977 <+439>:   lea    -0x1010101(%rcx),%eax
    => 0x00007fad269ac97d <+445>:   andn   %eax,%ecx,%eax
       0x00007fad269ac982 <+450>:   and    $0x80808080,%eax
    
    
       0x00007f26c8b37e87 <+181>:   lea    0x28(%rax),%rsi
       0x00007f26c8b37e8b <+185>:   mov    -0x38(%rbp),%rax
    => 0x00007f26c8b37e8f <+189>:   vmovdqa -0x60(%rbp),%xmm0
       0x00007f26c8b37e94 <+194>:   vmovdqu %xmm0,0x10(%rsp)
       0x00007f26c8b37e9a <+200>:   movl   $0x0,0x8(%rsp)
       
       0x00007f2f5d5c282e <+216>:   js     0x7f2f5d5c2837
    => 0x00007f2f5d5c2830 <+218>:   vcvtsi2ss %rax,%xmm0,%xmm0
       0x00007f2f5d5c2835 <+223>:   jmp    0x7f2f5d5c284c
       0x00007f2f5d5c2837 <+225>:   mov    %rax,%rdx
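
    Signal 4 is SIGILL. When the CPU decodes an opcode it does not implement, it raises an invalid-opcode exception (#UD). The Linux kernel delivers that exception to the process as SIGILL, which is why each dump stops at the instruction marked "=>". The minimal sketch below provokes the same signal on purpose. It assumes x86-64 Linux and GCC, where __builtin_trap() is compiled to ud2, an instruction defined to always raise #UD. The helper name is hypothetical.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run an always-undefined instruction in a child process and return the
 * signal that killed the child (0 if it exited normally, -1 on error).
 * Assumes x86-64 Linux + GCC, where __builtin_trap() emits "ud2". */
int signal_from_undefined_instruction(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Do not leave a core file behind for this deliberate crash. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);
        __builtin_trap();   /* CPU raises #UD, kernel delivers SIGILL */
        _exit(0);           /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

    Under those assumptions the function returns SIGILL (4), the same signal gdb reports for the test machine.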
    

    Both machines run exactly the same operating system. The kernel and gcc versions are:

    [CGSLv5]# uname -a
    Linux CGSLv5-2965 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Mar 30 15:43:35 CST 2018 x86_64 x86_64 x86_64 GNU/Linux
    [CGSLv5]# gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-ZTEOS-linux/4.8.5/lto-wrapper
    Target: x86_64-ZTEOS-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-ZTEOS-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-ZTEOS-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-ZTEOS-linux
    Thread model: posix
    gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
    [CGSLv5]#
    

      

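    Looking up the assembly instructions andn, vmovdqa and vcvtsi2ss in the <<Intel® 64 and IA-32 Architectures Software Developer’s Manual>> shows which instruction sets they belong to. andn belongs to BMI1. vmovdqa and vcvtsi2ss (in their VEX-encoded, v-prefixed forms) belong to AVX.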
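
    This mapping can be kept in a small lookup table, for example to annotate gdb or objdump output automatically. The sketch below is minimal. The table holds only the three mnemonics from the dumps above, and the names isa_entry, isa_table and isa_extension are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: instruction-set extension of each faulting mnemonic,
 * as listed in the Intel SDM. */
struct isa_entry {
    const char *mnemonic;
    const char *extension;
};

static const struct isa_entry isa_table[] = {
    { "andn",      "BMI1" },
    { "vmovdqa",   "AVX"  },  /* VEX-encoded form of movdqa */
    { "vcvtsi2ss", "AVX"  },  /* VEX-encoded form of cvtsi2ss */
};

/* Return the extension for `mnemonic`, or NULL if it is not in the table. */
const char *isa_extension(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof isa_table / sizeof isa_table[0]; i++)
        if (strcmp(isa_table[i].mnemonic, mnemonic) == 0)
            return isa_table[i].extension;
    return NULL;
}
```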

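    The CPU model and flags of the build machine and the test machine are shown below. They confirm that the test machine does not support BMI1/AVX/AVX2. Intel's product pages for the E5-2680 and the E5620 say the same: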

    Build machine:
    [compiler@CGSLV5]# cat /proc/cpuinfo | grep "model name" | uniq -c
         56 model name      : Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
    
    
    Test machine:
    [compiler@CGSLV5]# cat /proc/cpuinfo | grep "model name" | uniq -c
         16 model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
       
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb tpr_shadow vnmi flexpriority ept vpid dtherm arat
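
    Comparing the two flags lines by eye is error-prone. A plain substring search is also wrong, because "avx" matches inside "avx2". The sketch below checks for whole words in the flags text and reports the first required flag that is missing. The function names are hypothetical, and reading /proc/cpuinfo itself is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_flag_separator(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

/* True if `name` occurs as a whole word in `flags`, the text after
 * "flags :" in /proc/cpuinfo. */
bool has_cpu_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;
    for (const char *p = flags; (p = strstr(p, name)) != NULL; p++) {
        bool starts_word = (p == flags) || is_flag_separator(p[-1]);
        bool ends_word = is_flag_separator(p[len]);
        if (starts_word && ends_word)
            return true;
    }
    return false;
}

/* Return the first entry of `required` missing from `flags`,
 * or NULL if the CPU reports all of them. */
const char *first_missing_flag(const char *flags,
                               const char *const required[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (!has_cpu_flag(flags, required[i]))
            return required[i];
    return NULL;
}
```

    Fed the two flags lines above with required = {"bmi1", "avx", "avx2"}, the build machine yields NULL. The test machine yields "bmi1".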
    

    For instruction-set problems like this, GCC provides the -march/-mtune options. See <<Intel 386 and AMD x86-64 Options>> for details. An excerpt:

    -march=cpu-type
      Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.
    
    The choices for cpu-type are:
    ‘native’
      This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set. 
    ‘core2’
      Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support. 
    ‘corei7’
      Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support. 
    ‘corei7-avx’
      Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support. 
    ‘core-avx2’
      Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set support. 
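
    The effect of -march can also be seen from inside the code. GCC predefines macros such as __SSE4_2__, __AVX__, __AVX2__, __BMI__ and __BMI2__ when the corresponding instruction set is enabled. Running gcc -march=native -dM -E - < /dev/null | grep -E "AVX|BMI" lists them from the shell. The sketch below has a hypothetical function name and reports which of these macros were set for the current translation unit. Built with -march=corei7, it should list sse4.2 but no avx or bmi entries.

```c
#include <string.h>

/* Return a space-separated list of the extensions GCC was allowed to use
 * when compiling this file, based on its predefined feature macros. */
const char *enabled_extensions(void)
{
    static char buf[64];   /* longest possible result is 25 chars */
    buf[0] = '\0';
#ifdef __SSE4_2__
    strcat(buf, "sse4.2 ");
#endif
#ifdef __AVX__
    strcat(buf, "avx ");
#endif
#ifdef __AVX2__
    strcat(buf, "avx2 ");
#endif
#ifdef __BMI__
    strcat(buf, "bmi ");
#endif
#ifdef __BMI2__
    strcat(buf, "bmi2 ");
#endif
    return buf;
}
```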

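    In general, the compiler only uses every instruction set of the local machine when -march=native is set at compile time, and that can easily produce a binary that will not run on other machines. Checking our build options, we did find -march=native. After removing it, we disassembled the rebuilt library with objdump and confirmed that it no longer contained BMI/AVX instructions. The program then ran normally on the test machine.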

    Before the change:
    [CGSLv5]# objdump -d libTest.so  | grep vcvtsi2ss
       63830:       c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
       63843:       c4 e1 fa 2a c2          vcvtsi2ss %rdx,%xmm0,%xmm0
       640a6:       c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
       640b9:       c4 e1 fa 2a c1          vcvtsi2ss %rcx,%xmm0,%xmm0
       640d3:       c4 e1 f2 2a ca          vcvtsi2ss %rdx,%xmm1,%xmm1
       640e6:       c4 e1 f2 2a c8          vcvtsi2ss %rax,%xmm1,%xmm1
       64123:       c4 e1 f2 2a ce          vcvtsi2ss %rsi,%xmm1,%xmm1
       64136:       c4 e1 f2 2a cf          vcvtsi2ss %rdi,%xmm1,%xmm1
       64172:       c4 e1 f2 2a ce          vcvtsi2ss %rsi,%xmm1,%xmm1
       64185:       c4 e1 f2 2a cf          vcvtsi2ss %rdi,%xmm1,%xmm1
       641a2:       c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
       641b5:       c4 e1 fa 2a c2          vcvtsi2ss %rdx,%xmm0,%xmm0
    [CGSLv5]#
    
    After the change:
    [CGSLv5]# objdump -d libTest.so  | grep vcvtsi2ss
    [CGSLv5]#
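
    Grepping for vcvtsi2ss only proves that one mnemonic is gone. A broader heuristic relies on the encoding. In 64-bit code, an instruction whose first byte is 0xc4 or 0xc5 uses a VEX prefix. That covers AVX, AVX2, FMA and F16C, and also BMI1/BMI2 instructions such as andn. The sketch below applies that test to one line of objdump -d output; the function name is hypothetical. It has three limitations: it does not recognize the continuation lines objdump prints for long instructions, it ignores EVEX (AVX-512) encodings, and it only looks at the first byte.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Heuristic: true if the first encoded byte on an `objdump -d` line is a
 * VEX prefix (0xc4 or 0xc5), which in 64-bit mode is never LES/LDS. */
bool line_is_vex_encoded(const char *line)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return false;                 /* not an "addr: bytes insn" line */

    const char *p = colon + 1;
    while (*p == ' ' || *p == '\t')
        p++;

    /* Expect a two-digit hex byte followed by a separator. */
    if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1]))
        return false;
    if (p[2] != ' ' && p[2] != '\t' && p[2] != '\0')
        return false;

    char byte[3] = { p[0], p[1], '\0' };
    long value = strtol(byte, NULL, 16);
    return value == 0xc4 || value == 0xc5;
}
```

    Piping objdump -d libTest.so through such a check and counting matches gives one number to compare before and after the change, instead of one grep per mnemonic.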

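    Of course, you can also give -march an explicit value. Querying GCC on my machine gives the option settings below for three cpu-types: core-avx2 (what -march=native resolves to here), then corei7 and corei7-avx. Specifying -march=corei7 excludes the BMI/AVX/AVX2 instruction sets: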

    [localhost]# gcc -c -Q -march=native --help=target  | grep -E "avx|arch"
      -march=                               core-avx2
      -mavx                                 [enabled]
      -mavx2                                [enabled]
      -mavx256-split-unaligned-load         [disabled]
      -mavx256-split-unaligned-store        [disabled]
      -mprefer-avx128                       [disabled]
      -msse2avx                             [disabled]
    
      -march=                               corei7
      -mavx                                 [disabled]
      -mavx2                                [disabled]
      -mavx256-split-unaligned-load         [disabled]
      -mavx256-split-unaligned-store        [disabled]
      -mprefer-avx128                       [disabled]
      -msse2avx                             [disabled]
    
      -march=                               corei7-avx
      -mavx                                 [enabled]
      -mavx2                                [disabled]
      -mavx256-split-unaligned-load         [disabled]
      -mavx256-split-unaligned-store        [disabled]
      -mprefer-avx128                       [disabled]
      -msse2avx                             [disabled]
      -mtune=                               corei7-avx

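    A few open questions:

    1. If -march is not set in the GCC options, what is its default value?
    2. In theory, -march=native lets the compiler use every instruction set the local machine supports, which gives better performance but reduces the program's compatibility with other machines. Omitting -march=native, or choosing another value, may lower performance in theory but greatly improves compatibility. How should one weigh the two?
    3. Docker promises "build once, run anywhere". In a case like this one, can the same Docker image run on both machines?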
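
    For question 2, one commonly used middle ground is runtime dispatch. This is offered as a sketch, not as the author's conclusion. The idea is to build the program for a conservative baseline and compile only hot functions for a newer instruction set, choosing between them at run time. GCC supports this through __attribute__((target("avx2"))) and __builtin_cpu_supports(). The sketch assumes GCC 4.8 or later on x86-64, and the summing functions are hypothetical placeholders for real hot code. The same check is relevant to question 3, because a container executes directly on the host CPU.

```c
#include <stddef.h>

/* Baseline version: compiled for the default -march, runs on any x86-64. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same loop, but GCC may emit AVX2 instructions inside this function only. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Choose at run time: the AVX2 path runs only on CPUs that report AVX2
 * (such as the E5-2680 v4 above), the baseline path everywhere else. */
long sum_ints(const int *v, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
    return sum_generic(v, n);
}
```

    Both paths must compute the same result. The only thing that changes is which instructions the compiler may use inside sum_avx2.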

    References:

    Intel® Xeon® Processor E5620(Intel® SSE4.2):
    https://ark.intel.com/products/47925/Intel-Xeon-Processor-E5620-12M-Cache-2-40-GHz-5-86-GT-s-Intel-QPI-
    
    Intel® Xeon® Processor E5-2680(Intel® AVX):
    https://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2-70-GHz-8-00-GT-s-Intel-QPI-
    
    gcc-online-docs:
    https://gcc.gnu.org/onlinedocs/
    
    i386-x86-64-option of gcc4.8.5:
    https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Option-Summary.html#Option-Summary
    https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
    
    avx(Advanced Vector Extensions):
    https://software.intel.com/zh-cn/articles/introduction-to-intel-advanced-vector-extensions
    https://software.intel.com/en-us/blogs/2015/01/15/vector-programming-sse42-to-avx2-conversion-examples
    https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
    
    Intel® 64 and IA-32 Architectures Software Developer’s Manual:
    https://software.intel.com/en-us/articles/intel-sdm
    
    x86/amd64 fast online instruction reference from Intel Architectures Software Developer’s Manual:
    http://www.felixcloutier.com/x86/

    Excellence, is not an act, but a habit.
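    Author: 子厚
    Source: http://www.cnblogs.com/aios/
    Copyright of this article is shared by the author and cnblogs. Reposting, discussion, likes and comments are welcome. Unless the author agrees otherwise, this notice must be retained and a link to the original article must be given in a prominent place on the page.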

  • Original article: https://www.cnblogs.com/aios/p/9955339.html