Cortex-A9 NEON™ Media Processing Engine
Introduction
The Cortex-A9 NEON MPE extends the Cortex-A9 functionality to provide support for the ARMv7 Advanced SIMD and Vector Floating-Point v3 (VFPv3) instruction sets. The Cortex-A9 NEON MPE supports all addressing modes and data-processing operations described in the ARM Architecture Reference Manual.
The Cortex-A9 NEON MPE features are:
- SIMD and scalar single-precision floating-point computation
- scalar double-precision floating-point computation
- SIMD and scalar half-precision floating-point conversion
- 8, 16, 32, and 64-bit signed and unsigned integer SIMD computation
- 8 or 16-bit polynomial computation for single-bit coefficients
- structured data load capabilities
- dual issue with Cortex-A9 processor ARM or Thumb instructions
- independent pipelines for VFPv3 and Advanced SIMD instructions
- large, shared register file, addressable as:
— thirty-two 32-bit S (single) registers
— thirty-two 64-bit D (double) registers
— sixteen 128-bit Q (quad) registers.
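The three views of the shared register file can be sketched in portable C. This is a scalar model for illustration, not NEON code: the `RegFile` type and the helper names are hypothetical. It assumes the ARMv7 mapping in which S(2n) and S(2n+1) are the low and high halves of D(n) for D0 to D15, and Q(n) is the pair D(2n), D(2n+1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the register file stored as thirty-two 64-bit D registers. */
typedef struct {
    uint64_t d[32];
} RegFile;

/* Read D(n), n in 0..31. */
uint64_t read_d(const RegFile *rf, unsigned n)
{
    assert(n < 32);
    return rf->d[n];
}

/* Read S(n), n in 0..31. S registers only cover D0..D15:
 * S(2k) is the low 32 bits of D(k), S(2k+1) the high 32 bits. */
uint32_t read_s(const RegFile *rf, unsigned n)
{
    assert(n < 32);
    return (uint32_t)(rf->d[n / 2] >> (32u * (n & 1u)));
}

/* Write Q(n), n in 0..15. Q(n) overlays D(2n) (low half) and D(2n+1) (high half). */
void write_q(RegFile *rf, unsigned n, uint64_t lo, uint64_t hi)
{
    assert(n < 16);
    rf->d[2 * n] = lo;
    rf->d[2 * n + 1] = hi;
}
```

In this model, writing Q1 changes D2 and D3, and therefore S4 to S7. Code that keeps live values in two different views has to account for that overlap.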
The Cortex-A9 NEON MPE provides high-performance SIMD vector operations for:
- unsigned and signed integers
- single-bit coefficient polynomials
- single-precision floating-point values.
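Polynomial arithmetic with single-bit coefficients is multiplication over GF(2): partial products are combined with XOR instead of addition, so no carries propagate. The following is a minimal scalar sketch of what an 8-bit polynomial multiply computes per lane. The function names are hypothetical, and the code is a model of the arithmetic, not of the hardware implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Carry-less multiply of two 8-bit polynomials, full 16-bit product.
 * Bit i of an operand is the coefficient of x^i. */
uint16_t poly8_mull(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            r ^= (uint16_t)((uint16_t)a << i);  /* XOR, not add: no carries */
        }
    }
    return r;
}

/* Lane-wise variant that keeps only the low 8 bits of each product. */
void poly8_mul_lanes(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)poly8_mull(a[i], b[i]);
    }
}
```

For example, (x + 1)(x + 1) is x^2 + 1 over GF(2), so `poly8_mull(3, 3)` yields 5 rather than the integer product 9.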
The operations include:
- addition and subtraction
- multiplication with optional accumulation
- lane selection operations driven by maximum or minimum value
- inverse square-root approximation
- comprehensive data-structure load instructions, including register-bank-resident table lookup.
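The inverse square-root operation is described as an approximation because the architecture pairs a coarse estimate with a refinement step that software applies as often as its accuracy target requires. The sketch below models that refinement as Newton-Raphson iteration in scalar C. The step function follows the architectural definition of the reciprocal square-root step, (3 - a*b) / 2; the function names and the choice of starting estimate are assumptions made for illustration.

```c
#include <assert.h>

/* Model of the reciprocal square-root step: (3 - a*b) / 2. */
float rsqrt_step(float a, float b)
{
    return (3.0f - a * b) / 2.0f;
}

/* Refine an estimate x of 1/sqrt(d) with Newton-Raphson:
 * x_next = x * (3 - d*x*x) / 2. The error roughly squares on each pass. */
float rsqrt_refine(float d, float x, unsigned iterations)
{
    assert(d > 0.0f);
    for (unsigned i = 0; i < iterations; i++) {
        x = x * rsqrt_step(d * x, x);
    }
    return x;
}
```

Starting from a deliberately rough estimate of 0.4 for 1/sqrt(4), a few iterations converge on 0.5. In real code the starting value would come from the hardware estimate instruction, which is why one or two steps are usually enough.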
VFPv3
The Cortex-A9 NEON MPE hardware supports single and double-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations as described in the ARM VFPv3 architecture. It provides conversions between 16-bit, 32-bit and 64-bit floating-point formats and ARM integer word formats, with special operations to perform conversions in round-towards-zero mode for high-level language support.
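Round-towards-zero conversion matters for high-level languages because C and C++ define a float-to-integer cast as truncation. The sketch below models such a conversion in portable C. It assumes the ARM behaviour of saturating out-of-range inputs and converting NaN to zero, which a plain C cast does not guarantee. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert float to signed 32-bit integer, rounding towards zero.
 * Out-of-range values saturate and NaN becomes 0 (assumed ARM behaviour);
 * a bare C cast would be undefined for those inputs. */
int32_t f32_to_s32_rtz(float x)
{
    if (x != x) {
        return 0;                 /* NaN compares unequal to itself */
    }
    if (x >= 2147483648.0f) {
        return INT32_MAX;         /* saturate high */
    }
    if (x <= -2147483648.0f) {
        return INT32_MIN;         /* saturate low */
    }
    return (int32_t)x;            /* in range: the C cast truncates */
}

/* Lane-wise variant over n values. */
void f32_to_s32_rtz_lanes(const float *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = f32_to_s32_rtz(in[i]);
    }
}
```

Truncation differs from round-to-nearest for most fractional inputs: 2.7 converts to 2 and -2.7 converts to -2.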
ARMv7 deprecates the use of VFP vector mode. The Cortex-A9 NEON MPE hardware does not support VFP vector operations. In this manual, the term vector refers to Advanced SIMD integer, polynomial and single-precision vector operations. The Cortex-A9 NEON MPE provides high speed VFP operation without support code. However, if an application requires VFP vector operation, then it must use support code. See the ARM Architecture Reference Manual for information on VFP vector operation support.
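The support code mentioned here refers to adapting the boot code (assembly code) to the VFP-specific architecture so that it can handle the VFP-specific instructions and exceptions. For details, see VFP Support Code.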
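Whether software is asking for VFP vector operation is visible in the FPSCR. The sketch below decodes that request in C. It assumes the ARMv7 FPSCR layout, in which the LEN field occupies bits [18:16] and encodes the vector length minus one. The helper names are hypothetical, and the code only inspects a register value passed in as an integer; it does not read the real FPSCR.

```c
#include <assert.h>
#include <stdint.h>

#define FPSCR_LEN_SHIFT 16u
#define FPSCR_LEN_MASK  0x7u

/* Vector length requested by an FPSCR value: LEN field plus one (1..8). */
unsigned fpscr_vector_length(uint32_t fpscr)
{
    return ((fpscr >> FPSCR_LEN_SHIFT) & FPSCR_LEN_MASK) + 1u;
}

/* Nonzero when the FPSCR value requests VFP vector (non-scalar) operation. */
int fpscr_requests_vector_mode(uint32_t fpscr)
{
    return fpscr_vector_length(fpscr) > 1u;
}

/* Return fpscr with the LEN field set for the given length (1..8). */
uint32_t fpscr_set_length(uint32_t fpscr, unsigned length)
{
    assert(length >= 1u && length <= 8u);
    fpscr &= ~(FPSCR_LEN_MASK << FPSCR_LEN_SHIFT);
    return fpscr | ((uint32_t)(length - 1u) << FPSCR_LEN_SHIFT);
}
```

A check like this could let a runtime assert that legacy code has left LEN at zero before it relies on the hardware to execute VFP instructions without support code.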
Supported formats
Table 2-1 shows the formats supported for each of the Advanced SIMD and VFPv3 instruction sets implemented by the Cortex-A9 NEON MPE. All signed integers are two's complement representations.
Writing optimal VFP and Advanced SIMD code
The following guidelines can provide significant performance increases for VFP and Advanced SIMD code:
Where possible avoid:
- unnecessary accesses to the VFP control registers
- transferring values between the Cortex-A9 core registers and the VFP or Advanced SIMD register file (see the ARM Architecture Reference Manual for the definition of core registers)
- register dependencies between neighboring instructions
- mixing Advanced SIMD only instructions with VFP only instructions.
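The advice on register dependencies can be illustrated with a dot product. In the first function every multiply-accumulate depends on the result of the previous one. In the second, four independent accumulators give the pipeline work that does not wait on a neighbouring instruction. This is a scalar sketch with hypothetical function names; whether it helps on a given build depends on the compiler and on the instruction timings.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: each iteration depends on the previous sum. */
float dot_single_acc(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Four accumulators: neighbouring operations are independent.
 * Assumes n is a multiple of 4 to keep the sketch short. */
float dot_four_acc(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Splitting the sum changes the order of floating-point additions, so results can differ in the last bits for general inputs. For small integer-valued data both versions agree exactly.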
Be aware that:
- with the exception of simultaneous loads and stores, the processor can execute VFP and Advanced SIMD instructions in parallel with ARM or Thumb instructions
- using Advanced SIMD value selection operations is more efficient than using the equivalent VFP compare with conditional execution.
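The difference between the two selection styles can be sketched in scalar C. The first function mirrors compare with conditional execution: a comparison followed by a data-dependent choice. The second builds an all-ones or all-zeros mask per lane and selects with bitwise operations, which is the shape of a SIMD compare followed by a bitwise select. The function names are hypothetical, and the code models the idea only, not instruction timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare, then conditionally choose: one decision per element. */
void max_conditional(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i]) {
            out[i] = a[i];
        } else {
            out[i] = b[i];
        }
    }
}

/* Mask-based selection: no data-dependent control flow. */
void max_masked(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* mask is 0xFFFFFFFF when a[i] > b[i], otherwise 0 */
        uint32_t mask = (uint32_t)0 - (uint32_t)(a[i] > b[i]);
        out[i] = (a[i] & mask) | (b[i] & ~mask);
    }
}
```

Both functions produce the same output; the masked form applies one uniform operation sequence to every lane, which is what lets a SIMD unit process all lanes at once.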
Instruction timing tables
The timing tables are extensive and are not reproduced here. For details, see the Cortex™-A9 NEON™ Media Processing Engine Technical Reference Manual.