Checking the Performance of FindArray

zoukankan html css js c++ java

Checking the Performance of FindArray
FindArray Example

Let’s create a program that shows how a sample C++ compiler generates code for a function

named FindArray. Later, we will write an assembly language version of the function, attempting

to write more efficient code than the C++ compiler. The following FindArray function (in C++)

searches for a single value in an array of long integers:
bool FindArray( long searchVal, long array[], long count ) { for(int i = 0; i < count; i++) { if( array[i] == searchVal ) return true; } return false; }
Linking MASM to Visual C++

Let’s create a hand-optimized assembly language version of FindArray, named AsmFindArray.

A few basic principles are applied to the code optimization:

• Move as much processing out of the loop as possible.

• Move stack parameters and local variables to registers.

• Take advantage of specialized string/array processing instructions (in this case, SCASD).

We will use Microsoft Visual C++ (Visual Studio) to compile the calling C++ program and

Microsoft MASM to assemble the called procedure. Visual C++ generates 32-bit applications that

run only in protected mode. We choose Win32 Console as the target application type for the examples

shown here, although there is no reason why the same procedures would not work in ordinary

MS-Windows applications. In Visual C++, functions return 8-bit values in AL, 16-bit values in AX,

32-bit values in EAX, and 64-bit values in EDX:EAX. Larger data structures (structure values,

arrays, etc.) are stored in a static data location, and a pointer to the data is returned in EAX.

Our assembly language code is slightly more readable than the code generated by the C++

compiler because we can use meaningful label names and define constants that simplify the use

of stack parameters. Here is the complete module listing:
TITLE AsmFindArray Procedure (AsmFindArray.asm) .586 .model flat,C AsmFindArray PROTO, srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD .code ;----------------------------------------------- AsmFindArray PROC USES edi, srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD ; ; Performs a linear search for a 32-bit integer ; in an array of integers. Returns a boolean ; value in AL indicating if the integer was found. ;----------------------------------------------- true = 1 false = 0 mov eax,srchVal ; search value mov ecx,count ; number of items mov edi,arrayPtr ; pointer to array repne scasd ; do the search jz returnTrue ; ZF = 1 if found returnFalse: mov al,false jmp short exit returnTrue: mov al, true exit: ret AsmFindArray ENDP END
Checking the Performance of FindArray

Test Program It is interesting to check the performance of any assembly language code

you write against similar code written in C++. To that end, the following C++ test program

inputs a search value and gets the system time before and after executing a loop that calls

FindArray one million times. The same test is performed on AsmFindArray. Here is a listing

of the findarr.h header file, with function prototypes for the assembly language procedure and

the C++ function:
// findarr.h extern "C" { bool AsmFindArray( long n, long array[], long count ); // Assembly language version bool FindArray( long n, long array[], long count ); // C++ version }
Main C++ Module Here is a listing of main.cpp, the startup program that calls FindArray and

AsmFindArray:
// main.cpp - Testing FindArray and AsmFindArray. #include <iostream> #include <time.h> #include "findarr.h" using namespace std; int main() { // Fill an array with pseudorandom integers. const unsigned ARRAY_SIZE = 10000; const unsigned LOOP_SIZE = 1000000; long array[ARRAY_SIZE]; for(unsigned i = 0; i < ARRAY_SIZE; i++) array[i] = rand(); long searchVal; time_t startTime, endTime; cout << "Enter value to find: "; cin >> searchVal; cout << "Please wait. This will take between 10 and 30 seconds... "; // Test the C++ function: time( &startTime ); bool found = false; for( int n = 0; n < LOOP_SIZE; n++) found = FindArray( searchVal, array, ARRAY_SIZE ); time( &endTime ); cout << "Elapsed CPP time: " << long(endTime - startTime) << " seconds. Found = " << found << endl; // Test the Assembly language procedure: time( &startTime ); found = false; for( int n = 0; n < LOOP_SIZE; n++) found = AsmFindArray( searchVal, array, ARRAY_SIZE ); time( &endTime ); cout << "Elapsed ASM time: " << long(endTime - startTime) << " seconds. Found = " << found << endl; return 0; }
Assembly Code versus Nonoptimized C++ Code We compiled the C++ program to a

Release (non-debug) target with code optimization turned off. Here is the output, showing the

worst case (value not found):

Assembly Code versus Compiler Optimization Next, we set the compiler to optimize the

executable program for speed and ran the test program again. Here are the results, showing the

assembly code is noticeably faster than the compiler-optimized C++ code:

Pointers versus Subscripts

Programmers using older C compilers observed that processing arrays with pointers was more efficient

than using subscripts. For example, the following version of FindArray uses this approach:
bool FindArray( long searchVal, long array[], long count ) { long * p = array; for(int i = 0; i < count; i++, p++) if( searchVal == *p ) return true; return false; }
Running this version of FindArray through the Visual C++ compiler produced virtually the

same assembly language code as the earlier version using subscripts. Because modern compilers

are good at code optimization, using a pointer variable is no more efficient than using a subscript.

Here is the loop from the FindArray target code that was produced by the C++ compiler:
$L176: cmp esi, DWORD PTR [ecx] je SHORT $L184 inc eax add ecx, 4 cmp eax, edx jl SHORT $L176
Your time would be well spent studying the output produced by a C++ compiler to learn about

optimization techniques, parameter passing, and object code implementation. In fact, many computer

science students take a compiler-writing course that includes such topics. It is also important to

realize that compilers take the general case because they usually have no specific knowledge about

individual applications or installed hardware. Some compilers provide specialized optimization for a

particular processor such as the Pentium, which can significantly improve the speed of compiled

programs. Hand-coded assembly language can take advantage of string primitive instructions, as

well as specialized hardware features of video cards, sound cards, and data acquisition boards.
查看全文

相关阅读:
【Matlab】去除图片周围空白区域(plot subplot)
使用nbdev进行jupyter项目的开发
 如何绘制符合打印标准的图形
 如何使用Python完成视频的快速剪辑
 如何查看和修改论文图片的打印尺寸
 使用TMUX替代screen工具
 Emacs设置包管理器以及镜像
 Emacs的配置文件
 Emacs Windows的设置
 数据科学新的工具Julia

原文地址：https://www.cnblogs.com/dreamafar/p/5995125.html