DATE: 2018.11.14
1. Problem description
When the Intel compiler builds code that uses MMX instructions through inline assembly, it reports the following warning:
warning #13200: No emms instruction before return from function
2. Analysis and solution
Analysis:
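The warning appears because the function that uses MMX instructions does not execute an emms instruction before it returns. emms resets the x87 floating-point tag word, marking all floating-point registers as empty so that later floating-point code can use them.
Reference: https://stackoverflow.com/questions/43568840/warning-c4799-function-has-no-emms-instruction/43571015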
Since MMX aliases over the floating-point registers, any routine that uses MMX instructions must end with the EMMS instruction. This instruction “clears” the registers, making them available for use by the x87 FPU once again. (Which any C or C++ calling convention for x86 will assume is safe.)
The compiler is warning you that you have written a routine that uses MMX instructions but does not end with the EMMS instruction. That’s a bug waiting to happen, as soon as some FPU instruction tries to execute.
This is a huge disadvantage of MMX, and the reason why you really can’t freely intermix MMX and floating-point instructions. Sure, you could just throw EMMS instructions around, but it is a slow, high-latency instruction, so this kills performance. SSE had the same limitations as MMX in this regard, at least for integer operations. SSE2 was the first instruction set to address this problem, since it used its own discrete register set. Its registers are also twice as wide as MMX’s are, so you can do even more at a time. Since SSE2 does everything that MMX does, but faster, easier, and more efficiently, and is supported by the Pentium 4 and later, it is quite rare that anyone needs to write new code today that uses MMX. If you can use SSE2, you should. It will be faster than MMX. Another reason not to use MMX is that it is not supported in 64-bit mode.
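Cause and risk: Code that uses the MMX registers, whether through MMX itself or through the integer instructions that SSE added for those registers, must execute emms before the function returns, and emms is a slow instruction. SSE2 integer code has no such restriction because it works on the separate XMM registers. The root cause is that the MMX registers alias the low 64 bits of the x87 floating-point registers. After the MMX registers have been used, the floating-point state must be reset; otherwise later code that uses the floating-point registers can compute wrong results.
Reference: http://www.info.univ-angers.fr/pub/richer/ens/l3info/ao/intel_intrinsics.pdf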
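To illustrate the SSE2 point, here is a minimal sketch of an integer addition written with SSE2 intrinsics. The function name add8_i16_sse2 is hypothetical and not taken from the original code. Because the operands live in XMM registers rather than in the x87/MMX registers, no emms is needed before returning.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi16, ... */
#include <stdint.h>

/* Hypothetical helper: adds eight 16-bit lanes with SSE2.
 * Only XMM registers are used, so the x87 state is untouched
 * and no emms / _mm_empty() call is required. */
void add8_i16_sse2(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    /* Unaligned loads, so the caller's arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Lane-wise 16-bit addition (wraps around on overflow). */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}
```

Calling it with a = {1, ..., 8} and b = {10, 20, ..., 80} should leave out[0] == 11 and out[7] == 88.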
The EMMS Instruction: Why You Need It?
Using EMMS is like emptying a container to accommodate new content. The EMMS
instruction clears the MMX™ registers and sets the value of the floating-point tag
word to empty. Because floating-point convention specifies that the floating-point
stack be cleared after use, you should clear the MMX registers before issuing a
floating-point instruction. You should insert the EMMS instruction at the end of all
MMX code segments to avoid a floating-point overflow exception.
Solution:
Use the EMMS instruction (e.g. by calling the _mm_empty() intrinsic) after the last MMX instruction and before the function returns, to restore the floating-point state of the CPU.
References:
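A minimal sketch of this fix, written with intrinsics rather than the inline assembly of the original code (which is not shown here). The function name add4_i16_mmx is hypothetical. With inline assembly, the equivalent is to make an emms instruction the last MMX-related instruction before the function returns.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four 16-bit lanes with MMX and then
 * releases the MMX/x87 registers before returning. */
void add4_i16_mmx(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, vr;

    /* Copy 8 bytes into each MMX value without aliasing problems. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* Lane-wise 16-bit addition (wraps around on overflow). */
    vr = _mm_add_pi16(va, vb);
    memcpy(out, &vr, sizeof vr);

    /* Emits emms: marks the x87 registers as empty again so that
     * floating-point code after this call behaves normally. */
    _mm_empty();
}
```

Calling it with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} should give out = {11, 22, 33, 44}, and floating-point arithmetic in the caller remains safe afterwards.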
https://www.felixcloutier.com/x86/EMMS.html
https://software.intel.com/en-us/articles/cdiag963