  • Half-Precision Floating Point

    On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating point via the __fp16 type defined in the ARM C Language Extensions. On 32-bit ARM targets, you must enable this type explicitly with the -mfp16-format command-line option in order to use it.

    ARM targets support two incompatible representations for half-precision floating-point values. You must choose one of the representations and use it consistently in your program.

    Specifying -mfp16-format=ieee selects the IEEE 754-2008 format. This format can represent normalized values in the range of 2^{-14} to 65504. There are 11 bits of significand precision, approximately 3 decimal digits.
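
    To make the range and precision figures above concrete, here is a minimal sketch of decoding a 16-bit IEEE 754-2008 half-precision bit pattern in portable C. The names half_ieee_to_float and scale_by_pow2 are hypothetical helpers introduced for illustration; they are not part of GCC or of any ARM header.

```c
#include <math.h>    /* NAN and INFINITY macros only; no libm calls */
#include <stdint.h>

/* Hypothetical helper: multiply x by 2^e using exact steps of 2 or 0.5. */
static float scale_by_pow2(float x, int e)
{
    while (e > 0) { x *= 2.0f; e--; }
    while (e < 0) { x *= 0.5f; e++; }
    return x;
}

/* Hypothetical decoder for an IEEE 754-2008 binary16 bit pattern:
 * 1 sign bit, 5 exponent bits (bias 15), 10 stored fraction bits. */
float half_ieee_to_float(uint16_t bits)
{
    int sign = (bits >> 15) & 0x1;
    int exponent = (bits >> 10) & 0x1F;
    int fraction = bits & 0x3FF;
    float value;

    if (exponent == 0x1F) {
        /* All-ones exponent encodes infinity or NaN. */
        value = fraction ? NAN : INFINITY;
    } else if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-24, no implicit leading 1. */
        value = scale_by_pow2((float)fraction, -24);
    } else {
        /* Normalized: the implicit leading 1 plus 10 stored bits gives
         * the 11-bit significand, scaled by 2^(exponent - 15 - 10). */
        value = scale_by_pow2((float)(fraction | 0x400), exponent - 25);
    }
    return sign ? -value : value;
}
```

    With this decoder, the bit pattern 0x0400 is the smallest normalized value, 2^{-14}, and 0x7BFF is the largest finite value, 65504.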

    Specifying -mfp16-format=alternative selects the ARM alternative format. This representation is similar to the IEEE format, but does not support infinities or NaNs. Instead, the range of exponents is extended, so that this format can represent normalized values in the range of 2^{-14} to 131008.
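
    The extended range of the alternative format can be illustrated with a similar sketch. It assumes the usual description of the format, in which the all-ones exponent is treated as an ordinary exponent and zero and subnormal encodings match the IEEE layout. The names half_alt_to_float and alt_scale_pow2 are hypothetical helpers, not part of GCC or of any ARM header.

```c
#include <stdint.h>

/* Hypothetical helper: multiply x by 2^e using exact steps of 2 or 0.5. */
static float alt_scale_pow2(float x, int e)
{
    while (e > 0) { x *= 2.0f; e--; }
    while (e < 0) { x *= 0.5f; e++; }
    return x;
}

/* Hypothetical decoder for the ARM alternative half-precision format:
 * same 1/5/10 bit layout as IEEE, but exponent 31 is a normal exponent,
 * so there are no infinities or NaNs. */
float half_alt_to_float(uint16_t bits)
{
    int sign = (bits >> 15) & 0x1;
    int exponent = (bits >> 10) & 0x1F;
    int fraction = bits & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-24. */
        value = alt_scale_pow2((float)fraction, -24);
    } else {
        /* Every nonzero exponent, including 31, is normalized. */
        value = alt_scale_pow2((float)(fraction | 0x400), exponent - 25);
    }
    return sign ? -value : value;
}
```

    Under these assumptions, 0x7C00 (infinity in the IEEE format) decodes to 65536, and the largest pattern, 0x7FFF, decodes to 131008.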

    The GCC port for AArch64 only supports the IEEE 754-2008 format, and does not require use of the -mfp16-format command-line option.

    The __fp16 type may only be used as an argument to intrinsics defined in <arm_fp16.h>, or as a storage format. For purposes of arithmetic and other operations, __fp16 values in C or C++ expressions are automatically promoted to float.
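
    The storage-only usage described above might look like the following sketch. The typedef half_storage and the functions sum_halves and scale_halves are hypothetical names. The sketch assumes the ACLE macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE indicate that __fp16 is available, and it falls back to float elsewhere so the code still compiles on other hosts.

```c
#include <stddef.h>

/* Assumption: one of these ACLE macros is defined when __fp16 is usable. */
#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
typedef __fp16 half_storage;   /* 16-bit storage format */
#else
typedef float half_storage;    /* fallback so the sketch builds elsewhere */
#endif

/* Sum an array stored in half precision. Each element is promoted to
 * float before the addition, so the arithmetic is done in float. */
float sum_halves(const half_storage *values, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Scale an array in place. The product is computed in float and then
 * converted back to the storage type on assignment. */
void scale_halves(half_storage *values, size_t count, float factor)
{
    for (size_t i = 0; i < count; i++) {
        values[i] = (half_storage)(values[i] * factor);
    }
}
```

    Only the loads and stores touch the 16-bit type here; every intermediate result is a float, which matches the promotion rule stated above.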

    The ARM target provides hardware support for conversions between __fp16 and float values as an extension to VFP and NEON (Advanced SIMD), and from ARMv8-A provides hardware support for conversions between __fp16 and double values. GCC generates code using these hardware instructions if you compile with options to select an FPU that provides them; for example, -mfpu=neon-fp16 -mfloat-abi=softfp, in addition to the -mfp16-format option to select a half-precision format.

    Language-level support for the __fp16 data type is independent of whether GCC generates code using hardware floating-point instructions. In cases where hardware support is not specified, GCC implements conversions between __fp16 and other types as library calls.

    It is recommended that portable code use the _Float16 type defined by ISO/IEC TS 18661-3:2015. See the "Floating Types" section of the GCC manual.

    https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html

  • Original article: https://www.cnblogs.com/cloudrivers/p/14621984.html