  • Machine-Level Representation of Programs (Part 1)

    x86(wiki):

    x86 is a family of backward-compatible instruction set architectures based on the Intel 8086 CPU and its Intel 8088 variant.

    IA-32(wiki)

    IA-32 (short for "Intel Architecture, 32-bit", sometimes also called i386) is the 32-bit version of the x86 instruction set architecture.

    Program Encodings:

    unix> gcc -O1 -o p p1.c p2.c 

    The command-line option -O1 instructs the compiler to apply level-one optimizations. In general, increasing the level of optimization makes the final program run faster, but at a risk of increased compilation time and difficulties running debugging tools on the code.  

    In practice, level-two optimization (specified with the option -O2) is considered a better choice in terms of the resulting program performance. 

    The compilation process:

    The gcc command actually invokes a sequence of programs to turn the source code into executable code.

    First, the C preprocessor expands the source code to include any files specified with #include commands and to expand any macros, specified with #define declarations.

    Second, the compiler generates assembly-code versions of the two source files, named p1.s and p2.s.

    Next, the assembler converts the assembly code into binary object-code files p1.o and p2.o. Object code is one form of machine code—it contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.

    Finally, the linker merges these two object-code files along with code implementing library functions (e.g., printf) and generates the final executable code file p. Executable code is the second form of machine code we will consider—it is the exact form of code that is executed by the processor. The relation between these different forms of machine code and the linking process is described in more detail in Chapter 7. 
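    As a rough illustration of the first stage, the sketch below uses a hypothetical macro SQUARE and a hypothetical function sum_of_squares (neither comes from the original text). Running gcc with the -E option on such a file should show the #include line replaced by the header's contents and the SQUARE(...) call replaced by its expansion, before the compiler proper generates any assembly code.

```c
#include <stddef.h>  /* the preprocessor pastes this header's text in here */

/* Hypothetical macro: the preprocessor rewrites SQUARE(v) as ((v) * (v)). */
#define SQUARE(x) ((x) * (x))

/* Hypothetical helper: by the time the compiler generates assembly code,
   the loop body reads total += ((values[i]) * (values[i])). */
int sum_of_squares(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += SQUARE(values[i]);
    }
    return total;
}
```

    The compiler, assembler, and linker never see the name SQUARE; only the expanded expression reaches them.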

    ISA :

    The format and behavior of a machine-level program are defined by the instruction set architecture, or “ISA,” which defines the processor state, the format of the instructions, and the effect each of these instructions will have on the state.

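    Some parts of the processor state that are normally hidden from the C programmer but are important at the machine level: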

    • The program counter (commonly referred to as the “PC,” and called %eip in IA32) indicates the address in memory of the next instruction to be executed.

    •  The integer register file contains eight named locations storing 32-bit values. These registers can hold addresses (corresponding to C pointers) or integer data. Some registers are used to keep track of critical parts of the program state, while others are used to hold temporary data, such as the local variables of a procedure, and the value to be returned by a function.

    • The condition code registers hold status information about the most recently executed arithmetic or logical instruction. These are used to implement conditional changes in the control or data flow, such as is required to implement if and while statements.

    •  A set of floating-point registers stores floating-point data.
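    One way to picture how condition codes support an if statement is to rewrite the C code in "goto" form, which is closer to the compare-then-jump pattern of machine code. The function names absdiff and absdiff_goto are illustrative assumptions, and a real compiler may choose different instructions (for example, a conditional move instead of a jump).

```c
/* Ordinary C version. */
int absdiff(int x, int y)
{
    if (x < y) {
        return y - x;
    }
    return x - y;
}

/* The same logic in goto form. At the machine level, a compare
   instruction sets the condition codes and a conditional jump
   tests them to decide whether to branch. */
int absdiff_goto(int x, int y)
{
    int result;
    if (x >= y)
        goto x_ge_y;     /* conditional jump based on the comparison */
    result = y - x;
    goto done;           /* unconditional jump */
x_ge_y:
    result = x - y;
done:
    return result;
}
```

    Both functions return the same value for every pair of inputs; only the control-flow style differs.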

    The following passage describes, very clearly, the parts of memory that a program needs in order to execute:

    The program memory contains the executable machine code for the program, some information required by the operating system, a run-time stack for managing procedure calls and returns, and blocks of memory allocated by the user (for example, by using the malloc library function). As mentioned earlier, the program memory is addressed using virtual addresses. At any given time, only limited subranges of virtual addresses are considered valid. For example, although the 32-bit addresses of IA32 potentially span a 4-gigabyte range of address values, a typical program will only have access to a few megabytes. The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.
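    The regions named above can be sampled from inside a program. The following is a minimal sketch, assuming a hosted C implementation where pointers convert to uintptr_t; the struct region_sample and the function sample_regions are hypothetical names introduced only for this illustration. All values are virtual addresses and will differ between systems and between runs.

```c
#include <stdint.h>
#include <stdlib.h>

static int global_counter = 0;   /* lives in the program's data area */

/* Hypothetical record of one address from each region. */
struct region_sample {
    uintptr_t code;    /* address of a function (machine code) */
    uintptr_t data;    /* address of a global variable */
    uintptr_t heap;    /* address of a block returned by malloc */
    uintptr_t stack;   /* address of a local variable */
};

/* Fills *out with one sample address per region.
   Returns 0 on success, -1 if malloc fails. */
int sample_regions(struct region_sample *out)
{
    int local = 0;                        /* on the run-time stack */
    int *block = malloc(sizeof *block);   /* user-allocated memory */
    if (block == NULL) {
        return -1;
    }
    /* Pointer-to-integer conversion is implementation-defined. */
    out->code  = (uintptr_t)&sample_regions;
    out->data  = (uintptr_t)&global_counter;
    out->heap  = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
    return 0;
}
```

    Printing the four fields in hexadecimal on a given system gives a rough picture of how far apart the code, data, heap, and stack sit in the virtual address space.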

    A single machine instruction performs only a very elementary operation. For example, it might add two numbers stored in registers, transfer data between memory and a register, or conditionally branch to a new instruction address. The compiler must generate sequences of such instructions to implement program constructs such as arithmetic expression evaluation, loops, or procedure calls and returns. 
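    To make this concrete, the sketch below writes one arithmetic expression twice: once as a C programmer would, and once as a sequence of elementary steps in which each temporary stands in for a register. The function names are hypothetical, and the exact instructions a compiler emits may differ.

```c
/* The expression as it appears in source code. */
int scaled_difference(int x, int y)
{
    return (x + y) * (x - y);
}

/* The same computation, one elementary operation per statement. */
int scaled_difference_steps(int x, int y)
{
    int t1 = x + y;     /* one addition */
    int t2 = x - y;     /* one subtraction */
    int t3 = t1 * t2;   /* one multiplication */
    return t3;          /* hand the result back to the caller */
}
```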

    Code Examples 

    code.c file:

    int accum = 0;

    int sum(int x, int y)
    {
        int t = x + y;
        accum += t;
        return t;
    }

    To see the assembly code generated by the C compiler, we can use the “-S” option on the command line:

    unix> gcc -O1 -S code.c 

    If we use the ‘-c’ command-line option, gcc will both compile and assemble the code:

    unix> gcc -O1 -c code.c 

    This will generate an object-code file code.o that is in binary format and hence cannot be viewed directly.

    A disassembler generates a format similar to assembly code from the machine code.

    The disassembly command on Linux:

    unix> objdump -d code.o

    Typical output, with annotations:
    Note that the column labeled Bytes shows the contents of the object-code file produced by the assembler.
    Generating the actual executable code requires running a linker on the set of object-code files, one of which must contain a function main. Suppose in file main.c we had the following function:
    /* Declaration of sum, which is defined in code.c */
    int sum(int x, int y);

    int main()
    {
        return sum(1, 3);
    }

    Then, we could generate an executable program prog as follows:

    unix> gcc -O1 -o prog code.o main.c 

    The file prog has grown to 9,123 bytes, since it contains not just the code for our two procedures but also information used to start and terminate the program as well as to interact with the operating system. We can also disassemble the file prog:

    unix> objdump -d prog
    The disassembler will extract various code sequences, including the following: 

    This code is almost identical to that generated by the disassembly of code.o. One important difference is that the addresses listed along the left are different: the linker has shifted the location of this code to a different range of addresses. A second difference is that the linker has determined the location for storing the global variable accum. On line 6 of the disassembly for code.o, the address of accum was listed as 0. In the disassembly of prog, the address has been set to 0x804a018. This is shown in the assembly-code rendition of the instruction. It can also be seen in the last 4 bytes of the instruction, listed from least significant to most significant as 18 a0 04 08.
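    The byte order mentioned above can be checked with a few shifts. This is a small sketch with a hypothetical helper, little_endian_bytes, that splits a 32-bit value into its bytes starting from the least significant one; it does not depend on the byte order of the machine it runs on.

```c
#include <stdint.h>

/* Hypothetical helper: writes the four bytes of value into out,
   least-significant byte first (the order IA32 uses in memory). */
void little_endian_bytes(uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFF);
    }
}
```

    Called with 0x804a018, the helper fills out with 18 a0 04 08 (in hexadecimal), matching the last 4 bytes of the instruction.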

  • Original post: https://www.cnblogs.com/geeklove01/p/9125519.html