  • How much faster is assembly language?

    http://www.fourtheye.org/armstrong.shtml


    On reading about the philosophy behind the Raspberry Pi and its emphasis on teaching programming, I dug out a book I own, Problems For Computer Solution, which I have used on occasion as a learning aid. When I talked about the ARM processor in my SheevaPlug at my local Linux User Group, I was also asked how much faster code is when written in assembly language.

    As an experiment, I chose to code the seventh problem, to locate all of the Armstrong numbers of 2, 3 or 4 digits.
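An Armstrong number equals the sum of its digits, each raised to the power of the number of digits; 153 = 1^3 + 5^3 + 3^3 is one. As a minimal sketch of that test, using integer arithmetic only (the function names int_pow and is_armstrong are illustrative and do not appear in the programs in the appendix):

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer power by repeated multiplication (hypothetical helper). */
static uint32_t int_pow(uint32_t base, unsigned exp)
{
    uint32_t result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* True when n equals the sum of its digits, each raised to the digit count.
 * Single-digit values qualify trivially. Intended for the small ranges used
 * in this article; very large n could overflow the 32-bit sum. */
bool is_armstrong(uint32_t n)
{
    unsigned width = 0;
    for (uint32_t t = n; t != 0; t /= 10)
        width++;                      /* count the decimal digits */

    uint32_t sum = 0;
    for (uint32_t t = n; t != 0; t /= 10)
        sum += int_pow(t % 10, width);

    return sum == n;
}
```

Calling is_armstrong for every value from 10 to 9999 should report the seven qualifying numbers in that range: 153, 370, 371, 407, 1634, 8208 and 9474.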

    I coded the different versions in the following order:

    1. Perl - short and clear (armstrong4.pl).
    2. C - using snprintf into a string to separate the digits. A little more involved (armstrong4string.c).
    3. Assembly language - I sketched a flow chart and then coded it (armstrong.s).
    4. Assembly language with a macro - I realised that I was repeating code in the previous version so abstracted it to a macro (armstrong4macro.s).
    5. A version in C which uses division to separate the digits and follows a similar algorithm to the assembly language version (armstrong4divide.c).

    The code is listed in the appendix.

    Timing

    Here is the cpuinfo for the machine.

    bob@poland:~/src/problems_for_computer_solution/07_armstrong_numbers$ cat /proc/cpuinfo 
    Processor	: Feroceon 88FR131 rev 1 (v5l)
    BogoMIPS	: 1192.75
    Features	: swp half thumb fastmult edsp 
    CPU implementer	: 0x56
    CPU architecture: 5TE
    CPU variant	: 0x2
    CPU part	: 0x131
    CPU revision	: 1
    
    Hardware	: Marvell SheevaPlug Reference Board
    Revision	: 0000
    Serial		: 0000000000000000
    bob@poland:~/src/problems_for_computer_solution/07_armstrong_numbers$
    

    I extended the search space to 5 and 6 digits to allow for longer runtimes.

    Max. digits  Version        Command               real        user        sys
    4            Perl           perl armstrong4.pl    0m0.583s    0m0.580s    0m0.000s
    4            C - string     ./armstrong4string    0m0.267s    0m0.270s    0m0.000s
    4            C - divide     ./armstrong4divide    0m0.256s    0m0.260s    0m0.000s
    4            Assembly code  ./armstrong4macro     0m0.007s    0m0.020s    0m0.000s
    5            Perl           perl armstrong5.pl    0m6.202s    0m6.180s    0m0.020s
    5            C - string     ./armstrong5string    0m3.302s    0m3.300s    0m0.000s
    5            C - divide     ./armstrong5divide    0m3.198s    0m3.200s    0m0.000s
    5            Assembly code  ./armstrong5macro     0m0.044s    0m0.060s    0m0.000s
    6            Perl           perl armstrong6.pl    1m10.881s   1m10.650s   0m0.010s
    6            C - string     ./armstrong6string    0m39.312s   0m39.200s   0m0.000s
    6            C - divide     ./armstrong6divide    0m40.903s   0m38.230s   0m0.000s
    6            Assembly code  ./armstrong6macro     0m0.512s    0m0.510s    0m0.000s
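
To put a number on the title question, the ratios can be read off the 6-digit timings above, where the runs are long enough to be meaningful (the 4-digit assembly run finishes in a few milliseconds, which is too short to time reliably). A small sketch of that arithmetic, with to_seconds and speedup as illustrative helper names:

```c
/* Convert a time(1) value such as 1m10.881s into seconds. */
double to_seconds(unsigned minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times longer the slow run took than the fast run. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

With the real times for 6 digits, speedup(to_seconds(1, 10.881), to_seconds(0, 0.512)) comes to roughly 138 for Perl against assembly, and speedup(to_seconds(0, 39.312), to_seconds(0, 0.512)) to roughly 77 for the string-based C version. Note that the Makefile in the appendix builds the C versions with -g and no optimisation flag, so these ratios describe unoptimised C.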

    The assembly language is a first draft, apart from the abstraction of the macro. It could probably be optimised further to shave a few cycles if performance were important. The ARM is a RISC processor, and the version in my SheevaPlug (ARMv5TE) has no divide instruction (though I believe some ARMv7 cores do). Division can instead be achieved by repeated subtraction and counting, which is the approach followed here.
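
Division by repeated subtraction and counting can be sketched in C as follows. This is an illustration of the idea rather than a transcription of the assembly; the type divmod_result and the function divmod_by_subtraction are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} divmod_result;

/* Divide by subtracting the divisor until it no longer fits, counting the
 * subtractions. The divisor must be non-zero, otherwise the loop never ends. */
divmod_result divmod_by_subtraction(uint32_t dividend, uint32_t divisor)
{
    divmod_result r = { 0, dividend };
    while (r.remainder >= divisor) {
        r.remainder -= divisor;   /* take one more divisor away */
        r.quotient++;             /* and count it */
    }
    return r;
}
```

The loop runs once per unit of the quotient, so it is only attractive when the quotient is small. That holds here: dividing by 1000, 100 and 10 in turn yields a single decimal digit each time, so each place needs at most nine subtractions.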

    Engineering is often a tradeoff between different constraints - here coding time and run time. If the code is to be run once - or once a day, then it makes sense to write it in Perl (or some other high-level language); if, however, it is to be run a million times per day then it makes sense to invest the time to make it run efficiently.

    I documented some preliminary investigations into assembly language programming on the ARM here.

    I have just been reading about Thumb mode, which allows the 32-bit processor to run 16-bit instructions. There are, however, restrictions on what is permissible in this mode, and I am not convinced of the benefits of having smaller instructions (quicker to load and execute?). However, I was curious to see whether the switch (.thumb) would work, and whether the code ran faster. It may, but that requires an investigation which I may yet do.

    I have no experience of teaching, so if anyone has any ideas as to how I could improve this page, or the code, please email me.

    Arnaud tested the code on his Nokia 900 phone, which is an ARMv7 with approx 250 BogoMIPS (cf. the SheevaPlug with approx 1000 BogoMIPS). The relative performances of Perl, C and assembly language were similar to those seen on the SheevaPlug.

    Appendix - The code

    Perl version
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    foreach my $number (10 .. 9999) {
      my $size = length $number;
      my @digits = split(//, $number);
      my $total = 0;
      for (my $index = 0; $index < $size; $index++) {
        $total += $digits[$index] ** $size;
      }
      print "ARMSTRONG NUMBER is $number\n" if ($total == $number);
    }
    
    
    C versions

    N.B. These are functionally equivalent.

    • First version using a string
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* strnlen */
    
    /* we allocate sufficient space to store the widest integer */
    #define MAXWIDTH 4
    
    /* numeric string characters are offset from their value */
    #define NUMOFFSET 48
    
    int main()
    {
      int number;
      for (number=10; number < 10000; number++)
      {
        char string[MAXWIDTH+1] = {0};
        snprintf(string, MAXWIDTH+1, "%d", number);
        int numlen = strnlen(string, MAXWIDTH);
         
        int total = 0;
        int j;
        for (j=0; j < numlen; j++)
        {
          int digit = string[j] - NUMOFFSET;
          total += pow(digit, numlen);
        }
        if (total == number)
          printf("ARMSTRONG NUMBER is %d\n", total);
      }
      exit(0);
    }
    
    
    Second version using division
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    /* work on base 10 */
    #define BASE 10
    
    int main()
    {
      uint8_t numlen = 2;
      uint16_t number;
      for (number=10; number < 10000; number++)
      {
        if (number >= 1000)
          numlen = 4;
        else if (number >= 100)
          numlen = 3;
    
        uint32_t counter = number;
        uint8_t digit = counter % BASE;
        uint32_t armstrong = pow(digit, numlen);
        /* integer division already truncates, so no floor() is needed */
        while ((counter = counter / BASE) != 0)
        {
          digit = counter % BASE;
          armstrong += pow(digit, numlen);
        }
    
        if (armstrong == number)
          printf("ARMSTRONG NUMBER is %u\n", (unsigned) armstrong);
      }
      exit(0);
    }
    
    
    Assembly language
    • Power function
    # this subroutine returns the passed digit to the passed power
    #
    # inputs
    #   r0 - digit
    #   r1 - power 
    #
    # outputs
    #   r0 - digit ** power
    #
    # locals
    #   r4
    .globl _power
    .align 2
            .text
    _power:
    	nop
            stmfd	sp!, {r4, lr}		@ save variables to stack
    
    	subs	r1, r1, #1		@ leave unless power > 1
    	ble	_power_end
    
    	mov	r4, r0			@ copy digit
    _power_loop_start:
    	mul	r0, r4, r0		@ raise to next power
    	subs	r1, r1, #1		
    	beq	_power_end		@ leave when done
    	b	_power_loop_start	@ next iteration
    _power_end:
            ldmfd   sp!, {r4, pc}		@ restore state from stack and leave subroutine
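
For readers more comfortable with C, the _power routine above behaves roughly like the following function (int_power is an illustrative name, not part of the original code). Like the assembly, it returns the digit unchanged for any power of 1 or less, so a zero digit with an unset width contributes 0. It uses integer multiplication only, whereas the C versions call pow() from math.h, which works on doubles. The Features line of the cpuinfo output lists no vfp, so that floating-point work is presumably done in software, which may account for part of the gap in the timings; this has not been measured here.

```c
#include <stdint.h>

/* Rough C counterpart of _power: digit ** power for power >= 1,
 * and digit itself when power is 0 or negative. */
uint32_t int_power(uint32_t digit, int power)
{
    uint32_t result = digit;
    for (int remaining = power - 1; remaining > 0; remaining--)
        result *= digit;          /* raise to the next power */
    return result;
}
```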
    
    
    Armstrong function
    # inputs
    #   r0 - number
    #
    # outputs
    #   r0 - armstrong number
    #
    # local r4, r5, r6, r7, r8
    
    .equ ten,10
    .equ hundred,100
    .equ thousand,1000
    .equ ten_thousand,10000
    
    number .req r4
    width .req r5
    digit .req r6
    current .req r7
    armstrong .req r8
    
    .globl _armstrong
    .align 2
            .text
    _armstrong:
            nop
            stmfd   sp!, {r4, r5, r6, r7, r8, lr}   @ save variables to stack
    
            mov     number, r0			@ copy passed parameter to working number
    	cmp	number, #ten			@ exit unless number >= 10
    	blt	_end
    
            ldr     current, =ten_thousand		@ exit unless number < 10000
    	cmp	number, current
    	bge	_end
    
    	mov	width, #0			@ initialise
    	mov	digit, #0
    	mov	armstrong, #0
    	ldr	current, =thousand		@ handle 1000 digit
    _thousand_start:
    	cmp	number, current
    	blt	_thousand_end			@ exit thousand code if none left
    	
    	mov	width, #4			@ width must be 4
    	add	current, current, #thousand	@ bump thousand counter
    	add	digit, digit, #1		@ and corresponding digit count
    	b	_thousand_start			@ and loop
    _thousand_end:
    	add	number, number, #thousand	@ need number modulo thousand
    	sub	number, number, current
    	mov	r0, digit			@ push digit
    	mov	r1, width			@ and width
    	bl	_power				@ to compute digit **width
    	add	armstrong, r0, armstrong	@ and update armstrong number with this value
    
    	ldr	current, =hundred		@ then we do the hundreds as we did the thousands
    	mov	digit, #0
    _hundred_start:
    	cmp	number, current
    	blt	_hundred_end
    	
    	teq	width, #0			@ and only set width if it is currently unset
    	moveq	width, #3
    	add	current, current, #hundred	@ yada yada as thousands above
    	add	digit, digit, #1
    	b	_hundred_start
    _hundred_end:
    	add	number, number, #hundred
    	sub	number, number, current
    	mov	r0, digit
    	mov	r1, width
    	bl	_power
    	add	armstrong, r0, armstrong
    
    	ldr	current, =ten			@ then the tens as the hundred and thousands above
    	mov	digit, #0
    _ten_start:
    	cmp	number, current
    	blt	_ten_end
    	
    	teq	width, #0
    	moveq	width, #2
    	add	current, current, #ten
    	add	digit, digit, #1
    	b	_ten_start
    _ten_end:
    	add	number, number, #ten
    	sub	number, number, current
    	mov	r0, digit
    	mov	r1, width
    	bl	_power
    	add	armstrong, r0, armstrong
    
    	mov	r0, number			@ then add in the trailing digits
    	mov	r1, width
    	bl	_power
    	add	armstrong, r0, armstrong
    
    	mov	r0, armstrong			@ and copy the armstrong number back to r0 for return
    _end:
            ldmfd   sp!, {r4, r5, r6, r7, r8, pc}   @ restore state from stack and leave subroutine
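
The flow of _armstrong can be paraphrased in C as below. This is a sketch under assumptions rather than a translation: the names digit_power and armstrong_sum are hypothetical, the three place values are walked with a loop instead of being written out, and the range check at the top of the assembly routine is omitted, so the caller is expected to pass 10 <= number < 10000.

```c
#include <stdint.h>

/* Same contract as _power: digit when power <= 1, else digit ** power. */
static uint32_t digit_power(uint32_t digit, uint32_t power)
{
    uint32_t result = digit;
    while (power > 1) {
        result *= digit;
        power--;
    }
    return result;
}

/* Peel off thousands, hundreds and tens by repeated subtraction, fix the
 * width at the first non-zero leading digit, and add up digit ** width. */
uint32_t armstrong_sum(uint32_t number)
{
    static const uint32_t places[] = { 1000, 100, 10 };
    static const uint32_t widths[] = { 4, 3, 2 };
    uint32_t width = 0;               /* 0 means "not yet known" */
    uint32_t sum = 0;

    for (int i = 0; i < 3; i++) {
        uint32_t digit = 0;
        while (number >= places[i]) {
            if (width == 0)
                width = widths[i];    /* only set width while it is unset */
            number -= places[i];
            digit++;
        }
        sum += digit_power(digit, width);
    }
    return sum + digit_power(number, width);   /* trailing units digit */
}
```

A caller then compares the result with the input, as the main loop in the appendix does: armstrong_sum(153) gives 153, so 153 is reported, while armstrong_sum(154) gives 190 and is skipped.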
    
    
    Armstrong function with a macro to abstract repeated code

    N.B. This is functionally equivalent to the previous function, but much shorter. The variable \@ here is a magic variable, incremented each time a macro is instantiated. This enables the use of distinct labels, which we need here.

    # inputs
    #   r0 - number
    #
    # outputs
    #   r0 - armstrong number
    #
    # local r4, r5, r6, r7, r8
    
    .equ ten,10
    .equ hundred,100
    .equ thousand,1000
    .equ ten_thousand,10000
    
    number .req r4
    width .req r5
    digit .req r6
    current .req r7
    armstrong .req r8
    
    .macro armstrong_digit a, b
    	ldr	current, =\a
    	mov	digit, #0
    _start\@:
    	cmp	number, current
    	blt	_end\@
    	
    	teq	width, #0			@ and only set width if it is currently unset
    	moveq	width, #\b
    	add	current, current, #\a
    	add	digit, digit, #1
    	b	_start\@
    _end\@:
    	add	number, number, #\a
    	sub	number, number, current
    	mov	r0, digit
    	mov	r1, width
    	bl	_power
    	add	armstrong, r0, armstrong
    .endm
    
    .globl _armstrong
    .align 2
            .text
    _armstrong:
            nop
            stmfd   sp!, {r4, r5, r6, r7, r8, lr}   @ save variables to stack
    
            mov     number, r0			@ copy passed parameter to working number
    	cmp	number, #ten			@ exit unless number >= 10
    	blt	_end
    
            ldr     current, =ten_thousand		@ exit unless number < 10000
    	cmp	number, current
    	bge	_end
    
    	mov	width, #0			@ initialise
    	mov	armstrong, #0
    
    	armstrong_digit thousand 4
    	armstrong_digit hundred 3
    	armstrong_digit ten 2
    
    	mov	r0, number			@ then add in the trailing digits
    	mov	r1, width
    	bl	_power
    	add	armstrong, r0, armstrong
    
    	mov	r0, armstrong			@ and copy the armstrong number back to r0 for return
    _end:
            ldmfd   sp!, {r4, r5, r6, r7, r8, pc}   @ restore state from stack and leave subroutine
    
    
    Armstrong_main function
    .equ ten,10
    .equ ten_thousand,10000
    
    .section	.rodata
    	.align	2
    string:
    	.asciz "armstrong number of %d is %d\n"
    .text
    	.align	2
    	.global	main
    	.type	main, %function
    main:
    	ldr	r5, =ten
    	ldr	r6, =ten_thousand
    
    	mov	r4, r5		@ start with n = 10
    _main_loop:
    	cmp	r4, r6		@ leave if n = 10_000
    	beq	_main_end
    
    	mov	r0, r4		@ call the _armstrong function
    	bl	_armstrong
    
    	teq	r0, r4		@ if the armstrong value = n print it
    	bne	_main_next		@ else skip
    
    	mov	r2, r0
    	mov	r1, r4
    	ldr	r0, =string	@ store address of start of string to r0
    	bl	printf		@ call the c function to display information
    _main_next:
    	add	r4, r4, #1
    	b	_main_loop
    _main_end:
    	mov	r7, #1		@ set r7 to 1 - the syscall for exit
    	swi	0		@ then invoke the syscall from linux (this bypasses libc, so any still-buffered printf output is not flushed)
    
    
    A Makefile for the armstrong code
    AS      := /usr/bin/as
    CC      := /usr/bin/gcc
    LD      := /usr/bin/ld
    
    ASOPTS  := -gstabs
    CCOPTS  := -g
    CLIBS   := -lm
    
    all: armstrong4 armstrong5 armstrong6
    
    #harness: harness.s armstrong4macro.s power.s
    #armstrong: armstrong4main.s armstrong.s power.s
    
    armstrong4: armstrong4macro armstrong4string armstrong4divide 
    armstrong4macro: armstrong4main.s armstrong4macro.s power.s
    armstrong4string: armstrong4string.c
    armstrong4divide: armstrong4divide.c
    
    armstrong5: armstrong5macro armstrong5string armstrong5divide
    armstrong5macro: armstrong5main.s armstrong5macro.s power.s
    armstrong5string: armstrong5string.c
    armstrong5divide: armstrong5divide.c
    
    armstrong6: armstrong6macro armstrong6string armstrong6divide
    armstrong6macro: armstrong6main.s armstrong6macro.s power.s
    armstrong6string: armstrong6string.c
    armstrong6divide: armstrong6divide.c
    
    
    %: %.c
    	$(CC) $(CCOPTS) -o $@ $^ $(CLIBS)
    
    clean:
    	rm -f armstrong harness armstrong4macro armstrong4string armstrong4divide armstrong5macro armstrong5string armstrong5divide armstrong6macro armstrong6string armstrong6divide
    
    
  • Original article: https://www.cnblogs.com/ztguang/p/12648421.html