  • C++ | How Programs Are Compiled and Linked

    Source file: a.c


    Preprocessing command: gcc -E a.c -o a.i, which generates the file a.i.

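    1. Remove every "#define" directive and expand all macros;
    2. Process all conditional-compilation directives, such as "#if", "#ifdef", "#elif", "#else", "#endif";
    3. Process "#include" directives by inserting the included file in place; this is recursive, since an included file may include others;
    4. Delete all comments, both "//" and "/* */";
    5. Add line-number and file-name markers, so that later diagnostics and debug information can point back to the original source;
    6. Keep all #pragma directives, because the compiler itself needs them.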
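
The steps above can be observed on a tiny translation unit. The following is only an illustrative sketch: the macros SQUARE and USE_FAST_PATH and both functions are hypothetical names, not part of a.c. After running it through gcc -E (or g++ -E), the macro call should appear as its expansion, only the selected #if branch should remain, and the comments should be gone.

```cpp
// Hypothetical preprocessor input; every name here is made up for illustration.
#define SQUARE(x) ((x) * (x))   // function-like macro, expanded textually (step 1)
#define USE_FAST_PATH 1         // object-like macro tested by the #if below

#if USE_FAST_PATH
// Only this branch survives preprocessing, because USE_FAST_PATH is 1 (step 2).
constexpr int selected_branch() { return 1; }
#else
constexpr int selected_branch() { return 2; }
#endif

// After preprocessing the body reads: return ((n + 1) * (n + 1));
constexpr int square_of_next(int n) { return SQUARE(n + 1); }

static_assert(selected_branch() == 1, "the #else branch was discarded");
static_assert(square_of_next(2) == 9, "SQUARE(2 + 1) expands to ((2 + 1) * (2 + 1))");
```

The parentheses inside SQUARE matter precisely because expansion is textual: without them, SQUARE(n + 1) would become n + 1 * n + 1.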

    Compilation command: gcc -S a.i -o a.s, which generates the assembly file a.s.


    Assembly command: gcc -c a.s -o a.o, which generates the object file a.o (gcc -c a.c -o a.o produces it directly from the source).


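    Linking command: gcc a.o -o a, where -l names a library to link against and -L adds a directory to the library search path (header search paths are given with -I).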

    1. Merge sections and symbol tables
    2. Symbol resolution
    3. Address and space allocation
    4. Symbol relocation

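    The a.o file is also called a relocatable file. It contains machine code, but it is not a complete program: it lacks the startup code and the library code, so it cannot be run yet. To get a runnable program, one more step is needed. The linker combines the startup code, the library code, and the object code and places them in a single file, the executable.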



    /* main.cpp */
    extern int gdata;       // defined in sum.cpp
    int sum(int, int);      // defined in sum.cpp
    int data = 20;
    int main()
    {
            int a = gdata;
            int b = data;
            int ret = sum(a,b);
            return 0;
    }
    /* sum.cpp */
    int gdata = 10;
    int sum(int a, int b)
    {
            return a+b;
    }

    On Linux, the command g++ -c main.cpp sum.cpp compiles the two files into the object files main.o and sum.o.

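    A .o object file is made up of several sections. The ones we mainly need to understand are the ELF file header, the .text code section, the .data/.bss data sections, the .symtab symbol table section, and so on.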


    Use readelf -h main.o to view the ELF file header.

    [stu@tr blog]$ readelf -h main.o
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              REL (Relocatable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          0 (bytes into file)
      Start of section headers:          736 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           0 (bytes)
      Number of program headers:         0
      Size of section headers:           64 (bytes)
      Number of section headers:         12
      Section header string table index: 11
    [stu@tr blog]$ readelf -h sum.o
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              REL (Relocatable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          0 (bytes into file)
      Start of section headers:          568 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           0 (bytes)
      Number of program headers:         0
      Size of section headers:           64 (bytes)
      Number of section headers:         11
      Section header string table index: 10

    Type: REL (Relocatable file)
    Entry point address: 0x0
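
To make the header fields less abstract, here is a small sketch that decodes the identification bytes from the "Magic:" line above. The helper names are hypothetical; the numeric values (class 2 for 64-bit, data encoding 1 for little endian, type 1 for REL and 2 for EXEC) are the ones defined by the ELF format.

```cpp
#include <cstdint>

// First eight bytes of e_ident, copied from the "Magic:" line of readelf -h main.o.
constexpr std::uint8_t kIdent[8] = {0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00};

// Bytes 0..3 must be 0x7f 'E' 'L' 'F'.
constexpr bool has_elf_magic(const std::uint8_t* id) {
    return id[0] == 0x7f && id[1] == 'E' && id[2] == 'L' && id[3] == 'F';
}

// Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
constexpr bool is_elf64(const std::uint8_t* id) { return id[4] == 2; }

// Byte 5 (EI_DATA): 1 = little endian, 2 = big endian.
constexpr bool is_little_endian(const std::uint8_t* id) { return id[5] == 1; }

// Values of the e_type field that readelf prints as "Type:".
enum class ElfType : std::uint16_t { Rel = 1, Exec = 2, Dyn = 3 };

// A relocatable object has no program headers and an entry point of 0x0.
constexpr bool is_relocatable_object(ElfType type) { return type == ElfType::Rel; }

static_assert(has_elf_magic(kIdent), "7f 45 4c 46 is the ELF magic number");
static_assert(is_elf64(kIdent) && is_little_endian(kIdent), "matches Class and Data above");
```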


    The objdump -t main.o command prints the symbol table.

    [stu@tr blog]$ objdump -t main.o
    main.o:     file format elf64-x86-64
    0000000000000000 l    df *ABS*  0000000000000000 main.cpp
    0000000000000000 l    d  .text  0000000000000000 .text
    0000000000000000 l    d  .data  0000000000000000 .data
    0000000000000000 l    d  .bss   0000000000000000 .bss
    0000000000000000 l    d  .note.GNU-stack        0000000000000000 .note.GNU-stack
    0000000000000000 l    d  .eh_frame      0000000000000000 .eh_frame
    0000000000000000 l    d  .comment       0000000000000000 .comment
    0000000000000000 g     O .data  0000000000000004 data
    0000000000000000 g     F .text  0000000000000033 main
    0000000000000000         *UND*  0000000000000000 gdata
    0000000000000000         *UND*  0000000000000000 _Z3sumii
    [stu@tr blog]$ objdump -t sum.o
    sum.o:     file format elf64-x86-64
    0000000000000000 l    df *ABS*  0000000000000000 sum.cpp
    0000000000000000 l    d  .text  0000000000000000 .text
    0000000000000000 l    d  .data  0000000000000000 .data
    0000000000000000 l    d  .bss   0000000000000000 .bss
    0000000000000000 l    d  .note.GNU-stack        0000000000000000 .note.GNU-stack
    0000000000000000 l    d  .eh_frame      0000000000000000 .eh_frame
    0000000000000000 l    d  .comment       0000000000000000 .comment
    0000000000000000 g     O .data  0000000000000004 gdata
    0000000000000000 g     F .text  0000000000000014 _Z3sumii

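    In main.o's symbol table, gdata and _Z3sumii are marked *UND* (undefined), because main.cpp only declares them. sum.o's symbol table defines both: gdata in .data and _Z3sumii in .text. In the second column, l (local) means the symbol is visible only inside the current file, and g (global) means it is visible to other files. When resolving references between files, the linker only sees symbols marked g.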
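
Symbol resolution can be pictured with a toy model. The sketch below is not how a real linker is implemented; ObjectSymbols, unresolved_symbols and the other names are hypothetical, and only the global symbols from the two tables above are modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified view of one object file's global symbols.
struct ObjectSymbols {
    std::string file;
    std::vector<std::string> defined;    // g symbols that live in a section (.text, .data)
    std::vector<std::string> undefined;  // *UND* entries
};

// Returns the references that no object file defines.
std::vector<std::string> unresolved_symbols(const std::vector<ObjectSymbols>& objects) {
    std::set<std::string> definitions;
    for (const auto& obj : objects)
        definitions.insert(obj.defined.begin(), obj.defined.end());

    std::vector<std::string> missing;
    for (const auto& obj : objects)
        for (const auto& name : obj.undefined)
            if (definitions.count(name) == 0) missing.push_back(name);
    return missing;
}

// The two symbol tables shown above, reduced to their global entries.
std::vector<ObjectSymbols> example_objects() {
    return {
        {"main.o", {"data", "main"}, {"gdata", "_Z3sumii"}},
        {"sum.o", {"gdata", "_Z3sumii"}, {}},
    };
}

// Usage: both files together resolve everything; main.o alone leaves two references open.
void check_example() {
    const std::vector<ObjectSymbols> both = example_objects();
    assert(unresolved_symbols(both).empty());

    const std::vector<ObjectSymbols> main_only = {both[0]};
    assert(unresolved_symbols(main_only).size() == 2);
}
```

Linking main.o without sum.o corresponds to the second case, which is when a real linker reports undefined references.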


    Viewing the sections of a .o file

    Use the objdump -s main.o command to dump the contents of the commonly used sections of an object file.

    [stu@tr blog]$ objdump -s main.o
    main.o:     file format elf64-x86-64
    Contents of section .text:
     0000 554889e5 4883ec10 8b050000 00008945  UH..H..........E
     0010 fc8b0500 00000089 45f88b55 f88b45fc  ........E..U..E.
     0020 89d689c7 e8000000 008945f4 b8000000  ..........E.....
     0030 00c9c3                               ...
    Contents of section .data:
     0000 14000000                             ....
    Contents of section .comment:
     0000 00474343 3a202847 4e552920 342e382e  .GCC: (GNU) 4.8.
     0010 35203230 31353036 32332028 52656420  5 20150623 (Red
     0020 48617420 342e382e 352d3339 2900      Hat 4.8.5-39).
    Contents of section .eh_frame:
     0000 14000000 00000000 017a5200 01781001  .........zR..x..
     0010 1b0c0708 90010000 1c000000 1c000000  ................
     0020 00000000 33000000 00410e10 8602430d  ....3....A....C.
     0030 066e0c07 08000000                    .n......
    [stu@tr blog]$ objdump -s sum.o
    sum.o:     file format elf64-x86-64
    Contents of section .text:
     0000 554889e5 897dfc89 75f88b45 f88b55fc  UH...}..u..E..U.
     0010 01d05dc3                             ..].
    Contents of section .data:
     0000 0a000000                             ....
    Contents of section .comment:
     0000 00474343 3a202847 4e552920 342e382e  .GCC: (GNU) 4.8.
     0010 35203230 31353036 32332028 52656420  5 20150623 (Red
     0020 48617420 342e382e 352d3339 2900      Hat 4.8.5-39).
    Contents of section .eh_frame:
     0000 14000000 00000000 017a5200 01781001  .........zR..x..
     0010 1b0c0708 90010000 1c000000 1c000000  ................
     0020 00000000 14000000 00410e10 8602430d  .........A....C.
     0030 064f0c07 08000000                    .O......
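
The .data dumps above are easier to read once the byte order is taken into account: the file is little endian, so the least significant byte comes first. A minimal sketch (read_le32 is a hypothetical helper name) that turns the dumped bytes back into the initial values from the source:

```cpp
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant byte first.
constexpr std::uint32_t read_le32(std::uint8_t b0, std::uint8_t b1,
                                  std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0) |
           (static_cast<std::uint32_t>(b1) << 8) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b3) << 24);
}

// .data of main.o is dumped as "14000000", .data of sum.o as "0a000000".
static_assert(read_le32(0x14, 0x00, 0x00, 0x00) == 20, "int data = 20 in main.cpp");
static_assert(read_le32(0x0a, 0x00, 0x00, 0x00) == 10, "int gdata = 10 in sum.cpp");
```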

    g++ -c main.cpp -g ; objdump -S main.o

    main.o:     file format elf64-x86-64
    Disassembly of section .text:
    0000000000000000 <main>:
    int sum(int, int);
    int data = 20;
    int main()
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 10             sub    $0x10,%rsp
            int a = gdata;
       8:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # e <main+0xe>
       e:   89 45 fc                mov    %eax,-0x4(%rbp)
            int b = data;
      11:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 17 <main+0x17>
      17:   89 45 f8                mov    %eax,-0x8(%rbp)
            int ret = sum(a,b);
      1a:   8b 55 f8                mov    -0x8(%rbp),%edx
      1d:   8b 45 fc                mov    -0x4(%rbp),%eax
      20:   89 d6                   mov    %edx,%esi
      22:   89 c7                   mov    %eax,%edi
      24:   e8 00 00 00 00          callq  29 <main+0x29>
      29:   89 45 f4                mov    %eax,-0xc(%rbp)
            return 0;
      2c:   b8 00 00 00 00          mov    $0x0,%eax
      31:   c9                      leaveq
      32:   c3                      retq

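    Look at the two instructions listed below: each accesses memory through a displacement of 0x0. Executed as they are, they could not load the intended variables, because the addresses of the symbols they refer to are not known yet and 0x0 is only a placeholder (the same is true of the operand of the callq instruction, e8 00 00 00 00). During linking, the symbol relocation step rewrites these placeholders with the real addresses. This is one of the reasons an object file cannot be run directly.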

    mov    0x0(%rip),%eax			// int a = gdata;
    mov    0x0(%rip),%eax			// int b = data;
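
What "rewriting the placeholder" means at the byte level can be sketched as follows. This is only a model, with hypothetical names Code and patch_le32. The input bytes are the first 14 bytes of main.o's .text from the dump earlier, and the value 0x200b15 is the displacement that appears in the linked executable's disassembly later in this post.

```cpp
#include <cstddef>
#include <cstdint>

// First 14 bytes of main.o's .text: the prologue followed by "mov 0x0(%rip),%eax".
struct Code { std::uint8_t bytes[14]; };

// Overwrite four bytes at `offset` with `value`, least significant byte first.
constexpr Code patch_le32(Code code, std::size_t offset, std::uint32_t value) {
    for (std::size_t i = 0; i < 4; ++i)
        code.bytes[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xff);
    return code;
}

constexpr Code kBefore = {{0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10,
                           0x8b, 0x05, 0x00, 0x00, 0x00, 0x00}};

// The mov starts at offset 0x8; its opcode takes two bytes, so the
// four-byte placeholder sits at offset 0xa.
constexpr Code kAfter = patch_le32(kBefore, 0x0a, 0x200b15);

static_assert(kAfter.bytes[0x08] == 0x8b && kAfter.bytes[0x09] == 0x05, "opcode untouched");
static_assert(kAfter.bytes[0x0a] == 0x15 && kAfter.bytes[0x0b] == 0x0b &&
              kAfter.bytes[0x0c] == 0x20 && kAfter.bytes[0x0d] == 0x00,
              "matches 8b 05 15 0b 20 00 in the linked file");
```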


    Linking: g++ -o main main.o sum.o


    1. Merge sections and symbol tables
    2. Symbol resolution // symbol tables
    3. Address and space allocation
    4. Symbol relocation // code section

    Use the readelf -h main command to view the file header.

    [stu@tr blog]$ readelf -h main
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x400420
      Start of program headers:          64 (bytes into file)
      Start of section headers:          7208 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         9
      Size of section headers:           64 (bytes)
      Number of section headers:         35
      Section header string table index: 34

    Type: EXEC (Executable file)
    Entry point address: 0x400420


    objdump -t main

    main:     file format elf64-x86-64
    0000000000400238 l    d  .interp        0000000000000000              .interp
    0000000000400254 l    d  .note.ABI-tag  0000000000000000              .note.ABI-tag
    0000000000400274 l    d  .note.gnu.build-id     0000000000000000              .note.gnu.build-id
    0000000000400298 l    d  .gnu.hash      0000000000000000              .gnu.hash
    00000000004002b8 l    d  .dynsym        0000000000000000              .dynsym
    0000000000400300 l    d  .dynstr        0000000000000000              .dynstr
    0000000000400360 l    d  .gnu.version   0000000000000000              .gnu.version
    0000000000400368 l    d  .gnu.version_r 0000000000000000              .gnu.version_r
    0000000000400388 l    d  .rela.dyn      0000000000000000              .rela.dyn
    00000000004003a0 l    d  .rela.plt      0000000000000000              .rela.plt
    00000000004003d0 l    d  .init  0000000000000000              .init
    00000000004003f0 l    d  .plt   0000000000000000              .plt
    0000000000400420 l    d  .text  0000000000000000              .text
    00000000004005d4 l    d  .fini  0000000000000000              .fini
    00000000004005e0 l    d  .rodata        0000000000000000              .rodata
    00000000004005f0 l    d  .eh_frame_hdr  0000000000000000              .eh_frame_hdr
    0000000000400630 l    d  .eh_frame      0000000000000000              .eh_frame
    0000000000600de0 l    d  .init_array    0000000000000000              .init_array
    0000000000600de8 l    d  .fini_array    0000000000000000              .fini_array
    0000000000600df0 l    d  .jcr   0000000000000000              .jcr
    0000000000600df8 l    d  .dynamic       0000000000000000              .dynamic
    0000000000600ff8 l    d  .got   0000000000000000              .got
    0000000000601000 l    d  .got.plt       0000000000000000              .got.plt
    0000000000601028 l    d  .data  0000000000000000              .data
    0000000000601034 l    d  .bss   0000000000000000              .bss
    0000000000000000 l    d  .comment       0000000000000000              .comment
    0000000000000000 l    d  .debug_aranges 0000000000000000              .debug_aranges
    0000000000000000 l    d  .debug_info    0000000000000000              .debug_info
    0000000000000000 l    d  .debug_abbrev  0000000000000000              .debug_abbrev
    0000000000000000 l    d  .debug_line    0000000000000000              .debug_line
    0000000000000000 l    d  .debug_str     0000000000000000              .debug_str
    0000000000000000 l    df *ABS*  0000000000000000              crtstuff.c
    0000000000600df0 l     O .jcr   0000000000000000              __JCR_LIST__
    0000000000400450 l     F .text  0000000000000000              deregister_tm_clones
    0000000000400480 l     F .text  0000000000000000              register_tm_clones
    00000000004004c0 l     F .text  0000000000000000              __do_global_dtors_aux
    0000000000601034 l     O .bss   0000000000000001              completed.6355
    0000000000600de8 l     O .fini_array    0000000000000000              __do_global_dtors_aux_fini_array_entry
    00000000004004e0 l     F .text  0000000000000000              frame_dummy
    0000000000600de0 l     O .init_array    0000000000000000              __frame_dummy_init_array_entry
    0000000000000000 l    df *ABS*  0000000000000000              main.cpp
    0000000000000000 l    df *ABS*  0000000000000000              sum.cpp
    0000000000000000 l    df *ABS*  0000000000000000              crtstuff.c
    0000000000400740 l     O .eh_frame      0000000000000000              __FRAME_END__
    0000000000600df0 l     O .jcr   0000000000000000              __JCR_END__
    0000000000000000 l    df *ABS*  0000000000000000
    00000000004005f0 l       .eh_frame_hdr  0000000000000000              __GNU_EH_FRAME_HDR
    0000000000601000 l     O .got.plt       0000000000000000              _GLOBAL_OFFSET_TABLE_
    0000000000600de8 l       .init_array    0000000000000000              __init_array_end
    0000000000600de0 l       .init_array    0000000000000000              __init_array_start
    0000000000600df8 l     O .dynamic       0000000000000000              _DYNAMIC
    0000000000601028  w      .data  0000000000000000              data_start
    00000000004005d0 g     F .text  0000000000000002              __libc_csu_fini
    0000000000400420 g     F .text  0000000000000000              _start
    0000000000000000  w      *UND*  0000000000000000              __gmon_start__
    00000000004005d4 g     F .fini  0000000000000000              _fini
    0000000000000000       F *UND*  0000000000000000              __libc_start_main@@GLIBC_2.2.5
    0000000000601030 g     O .data  0000000000000004              gdata
    00000000004005e0 g     O .rodata        0000000000000004              _IO_stdin_used
    000000000060102c g     O .data  0000000000000004              data
    0000000000601028 g       .data  0000000000000000              __data_start
    0000000000601038 g     O .data  0000000000000000              .hidden __TMC_END__
    00000000004005e8 g     O .rodata        0000000000000000              .hidden __dso_handle
    0000000000400560 g     F .text  0000000000000065              __libc_csu_init
    0000000000601034 g       .bss   0000000000000000              __bss_start
    0000000000400540 g     F .text  0000000000000014              _Z3sumii
    0000000000601038 g       .bss   0000000000000000              _end
    0000000000601034 g       .data  0000000000000000              _edata
    000000000040050d g     F .text  0000000000000033              main
    00000000004003d0 g     F .init  0000000000000000              _init
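
The symbol table above shows the result of merging sections and allocating addresses: the pieces that came from sum.o were placed directly after the pieces from main.o. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the alignment values (1 for .text, 4 for .data) are assumptions, since the readelf -S output is not shown here.

```cpp
#include <cstdint>

// Round `addr` up to the next multiple of `align` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint64_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// Address at which the next input section starts, given where the previous
// one starts, how large it is, and the alignment the next one requires.
constexpr std::uint64_t next_address(std::uint64_t start, std::uint64_t size,
                                     std::uint64_t next_align) {
    return align_up(start + size, next_align);
}

// main occupies 0x33 bytes at 0x40050d, so sum.o's code starts at 0x400540.
static_assert(next_address(0x40050d, 0x33, 1) == 0x400540, "address of _Z3sumii");

// data occupies 4 bytes at 0x60102c, so sum.o's gdata starts at 0x601030.
static_assert(next_address(0x60102c, 0x4, 4) == 0x601030, "address of gdata");
```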



    objdump -S main

    main:     file format elf64-x86-64
    Disassembly of section .init:
    00000000004003d0 <_init>:
      4003d0:       48 83 ec 08             sub    $0x8,%rsp
      4003d4:       48 8b 05 1d 0c 20 00    mov    0x200c1d(%rip),%rax        # 600ff8 <__gmon_start__>
      4003db:       48 85 c0                test   %rax,%rax
      4003de:       74 05                   je     4003e5 <_init+0x15>
      4003e0:       e8 1b 00 00 00          callq  400400 <__gmon_start__@plt>
      4003e5:       48 83 c4 08             add    $0x8,%rsp
      4003e9:       c3                      retq
    .... part of the output omitted .....
    000000000040050d <main>:
    int sum(int, int);
    int data = 20;
    int main()
      40050d:       55                      push   %rbp
      40050e:       48 89 e5                mov    %rsp,%rbp
      400511:       48 83 ec 10             sub    $0x10,%rsp
            int a = gdata;
      400515:       8b 05 15 0b 20 00       mov    0x200b15(%rip),%eax        # 601030 <gdata>
      40051b:       89 45 fc                mov    %eax,-0x4(%rbp)
            int b = data;
      40051e:       8b 05 08 0b 20 00       mov    0x200b08(%rip),%eax        # 60102c <data>
      400524:       89 45 f8                mov    %eax,-0x8(%rbp)
            int ret = sum(a,b);
      400527:       8b 55 f8                mov    -0x8(%rbp),%edx
      40052a:       8b 45 fc                mov    -0x4(%rbp),%eax
      40052d:       89 d6                   mov    %edx,%esi
      40052f:       89 c7                   mov    %eax,%edi
      400531:       e8 0a 00 00 00          callq  400540 <_Z3sumii>
      400536:       89 45 f4                mov    %eax,-0xc(%rbp)
            return 0;
      400539:       b8 00 00 00 00          mov    $0x0,%eax
      40053e:       c9                      leaveq
      40053f:       c3                      retq
    0000000000400540 <_Z3sumii>:
      400540:       55                      push   %rbp
      400541:       48 89 e5                mov    %rsp,%rbp
      400544:       89 7d fc                mov    %edi,-0x4(%rbp)
      400547:       89 75 f8                mov    %esi,-0x8(%rbp)
      40054a:       8b 45 f8                mov    -0x8(%rbp),%eax
      40054d:       8b 55 fc                mov    -0x4(%rbp),%edx
      400550:       01 d0                   add    %edx,%eax
      400552:       5d                      pop    %rbp
      400553:       c3                      retq
      400554:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
      40055b:       00 00 00
      40055e:       66 90                   xchg   %ax,%ax

    As we can see, the addresses that were left undetermined in the .o file have been relocated to the correct addresses.

    mov    0x200b15(%rip),%eax			// int a = gdata;
    mov    0x200b08(%rip),%eax			// int b = data;
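
The displacements can be checked by hand. A RIP-relative operand is measured from the address of the next instruction, so 0x40051b + 0x200b15 = 0x601030, the address of gdata. The sketch below repeats that check and also derives the displacement the way a PC-relative 32-bit relocation is usually computed, S + A - P with addend A = -4. That the object file uses exactly this relocation form is an assumption, because the relocation entries (readelf -r) are not shown in this post; the function names are hypothetical.

```cpp
#include <cstdint>

// S + A - P: S = symbol address, A = addend, P = address of the patched field.
constexpr std::int32_t pc32_displacement(std::uint64_t symbol_addr, std::int64_t addend,
                                         std::uint64_t field_addr) {
    return static_cast<std::int32_t>(static_cast<std::int64_t>(symbol_addr) + addend -
                                     static_cast<std::int64_t>(field_addr));
}

// Address a RIP-relative operand refers to: next instruction address + displacement.
constexpr std::uint64_t rip_target(std::uint64_t next_insn_addr, std::int32_t disp) {
    return next_insn_addr + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// int a = gdata;  mov at 0x400515, displacement field at 0x400517, next insn at 0x40051b
static_assert(pc32_displacement(0x601030, -4, 0x400517) == 0x200b15, "gdata");
static_assert(rip_target(0x40051b, 0x200b15) == 0x601030, "gdata");

// int b = data;   mov at 0x40051e, displacement field at 0x400520, next insn at 0x400524
static_assert(pc32_displacement(0x60102c, -4, 0x400520) == 0x200b08, "data");
static_assert(rip_target(0x400524, 0x200b08) == 0x60102c, "data");

// sum(a, b);      callq at 0x400531, operand field at 0x400532, next insn at 0x400536
static_assert(pc32_displacement(0x400540, -4, 0x400532) == 0xa, "_Z3sumii");
static_assert(rip_target(0x400536, 0xa) == 0x400540, "_Z3sumii");
```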

    readelf -l main displays the program header table: how many segments there are, the attributes of each segment, and which sections each segment contains.

    Elf file type is EXEC (Executable file)
    Entry point 0x400420
    There are 9 program headers, starting at offset 64
    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                     0x00000000000001f8 0x00000000000001f8  R E    8
      INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                     0x000000000000001c 0x000000000000001c  R      1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x0000000000000744 0x0000000000000744  R E    200000
      LOAD           0x0000000000000de0 0x0000000000600de0 0x0000000000600de0
                     0x0000000000000254 0x0000000000000258  RW     200000
      DYNAMIC        0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
                     0x0000000000000200 0x0000000000000200  RW     8
      NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                     0x0000000000000044 0x0000000000000044  R      4
      GNU_EH_FRAME   0x00000000000005f0 0x00000000004005f0 0x00000000004005f0
                     0x000000000000003c 0x000000000000003c  R      4
      GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x0000000000000000 0x0000000000000000  RW     10
      GNU_RELRO      0x0000000000000de0 0x0000000000600de0 0x0000000000600de0
                     0x0000000000000220 0x0000000000000220  R      1
     Section to Segment mapping:
      Segment Sections...
       01     .interp
       02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
       03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
       04     .dynamic
       05     .note.ABI-tag .note.gnu.build-id
       06     .eh_frame_hdr
       08     .init_array .fini_array .jcr .dynamic .got

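    The program header table contains two LOAD entries. These are loadable segments, not loaders: when the program is executed, the first one (flags R E) is mapped into memory to hold the code and the second one (flags RW) to hold the data.
    Entry point 0x400420 is the program's entry point, that is, the address at which execution begins. In the symbol table above this is the address of _start, not of main.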
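
As a quick cross-check of the two LOAD segments against the symbol table, the sketch below tests which segment a given virtual address falls into. LoadSegment and segment_contains are hypothetical names; the addresses and sizes are copied from the readelf -l and objdump -t outputs above.

```cpp
#include <cstdint>

// One LOAD entry from the program header table (VirtAddr, MemSiz, flags).
struct LoadSegment {
    std::uint64_t vaddr;
    std::uint64_t memsz;
    bool executable;
    bool writable;
};

constexpr LoadSegment kCodeSegment = {0x400000, 0x744, true, false};  // flags R E
constexpr LoadSegment kDataSegment = {0x600de0, 0x258, false, true};  // flags RW

// True if `addr` lies inside [vaddr, vaddr + memsz).
constexpr bool segment_contains(const LoadSegment& seg, std::uint64_t addr) {
    return addr >= seg.vaddr && addr < seg.vaddr + seg.memsz;
}

static_assert(segment_contains(kCodeSegment, 0x400420), "_start, the entry point");
static_assert(segment_contains(kCodeSegment, 0x40050d), "main");
static_assert(segment_contains(kDataSegment, 0x60102c), "data");
static_assert(segment_contains(kDataSegment, 0x601030), "gdata");
static_assert(!segment_contains(kCodeSegment, 0x601030), "gdata is not in the code segment");
```

The data segment ends at 0x600de0 + 0x258 = 0x601038, which is the address of the _end symbol in the symbol table.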





