  • Linux ELF Format Analysis

    http://www.cnblogs.com/hzl6255/p/3312262.html

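    ELF (Executable and Linking Format) is a standard file format for executables, object files, shared libraries, and core dumps. It was developed and published by UNIX System Laboratories as part of the ABI (Application Binary Interface).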

    A brief history of the related formats:
    - UNIX:     originally used the a.out format, which was replaced by COFF in System V and finally by ELF in SVR4.
    - Windows:  uses PE, a variant of COFF.
    - Mac OS X: uses the Mach-O format.

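    There are four types of ELF file:
    1. Relocatable file (Relocatable): the .o file produced by the compiler and assembler, which needs further processing by the linker
    2. Executable file (Executable): all relocation has been done and all symbols are resolved, except perhaps shared library symbols that must be resolved at run time
    3. Shared object file (Shared Object): a dynamic library file (.so)
    4. Core file (Core File): a core dump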

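    1. ELF File Structure

    The structure of an ELF file can be described from two points of view:
    - Compilers, assemblers, linkers: the file is a set of sections described by the section header table
    - System loader: the file is a set of segments described by the program header table

    [Figure: ELF file structure]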

    TIP:
    - A single segment usually consists of several sections.
    - Relocatable files must have a section header table. Executable files must have a program header table (a section header table is optional, though usually present). Shared object files have both.
    - Sections are intended for further processing by a linker, while segments are intended to be mapped into memory.
    - Only the ELF header has a fixed position, at the start of the file; the locations of the program header table and the section header table are given by the ELF header.

    ELF data representation: six data types (32-bit)

    Name           Size  Alignment  Purpose
    Elf32_Addr     4     4          Unsigned program address
    Elf32_Off      4     4          Unsigned file offset
    Elf32_Half     2     2          Unsigned medium integer
    Elf32_Word     4     4          Unsigned integer
    Elf32_Sword    4     4          Signed integer
    unsigned char  1     1          Unsigned small integer

    @1: 

    ELF header: located at the start of the file, it describes the organization of the whole file and occupies 52 bytes (for 32-bit ELF)

    #define EI_NIDENT (16)
    typedef struct
    {
      unsigned char e_ident[EI_NIDENT];   /* Magic number and other info */
      Elf32_Half    e_type;               /* Object file type */
      Elf32_Half    e_machine;            /* Architecture */
      Elf32_Word    e_version;            /* Object file version */
      Elf32_Addr    e_entry;              /* Entry point virtual address */
      Elf32_Off     e_phoff;              /* Program header table file offset */
      Elf32_Off     e_shoff;              /* Section header table file offset */
      Elf32_Word    e_flags;              /* Processor-specific flags */
      Elf32_Half    e_ehsize;             /* ELF header size in bytes */
      Elf32_Half    e_phentsize;          /* Program header table entry size */
      Elf32_Half    e_phnum;              /* Program header table entry count */
      Elf32_Half    e_shentsize;          /* Section header table entry size */
      Elf32_Half    e_shnum;              /* Section header table entry count */
      Elf32_Half    e_shstrndx;           /* Section header string table index */
    } Elf32_Ehdr;
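
    The 52-byte figure can be checked by adding up the field sizes. The sketch below is only an illustration: it assumes the ELF32 types map onto the fixed-width types from <stdint.h> (on Linux the real typedefs come from <elf.h>), and elf32_ehdr_size() is a hypothetical helper name introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-width stand-ins for the ELF32 types in the table above;
   on Linux, <elf.h> supplies the real typedefs. */
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Off;
typedef uint16_t Elf32_Half;
typedef uint32_t Elf32_Word;
typedef int32_t  Elf32_Sword;

#define EI_NIDENT 16

/* Compile-time checks (C11) of the sizes listed in the table. */
_Static_assert(sizeof(Elf32_Addr)  == 4, "Elf32_Addr must be 4 bytes");
_Static_assert(sizeof(Elf32_Off)   == 4, "Elf32_Off must be 4 bytes");
_Static_assert(sizeof(Elf32_Half)  == 2, "Elf32_Half must be 2 bytes");
_Static_assert(sizeof(Elf32_Word)  == 4, "Elf32_Word must be 4 bytes");
_Static_assert(sizeof(Elf32_Sword) == 4, "Elf32_Sword must be 4 bytes");

/* Hypothetical helper: sum of the Elf32_Ehdr field sizes, in declaration order. */
size_t elf32_ehdr_size(void)
{
    return EI_NIDENT                 /* e_ident */
         + 2 * sizeof(Elf32_Half)    /* e_type, e_machine */
         + sizeof(Elf32_Word)        /* e_version */
         + sizeof(Elf32_Addr)        /* e_entry */
         + 2 * sizeof(Elf32_Off)     /* e_phoff, e_shoff */
         + sizeof(Elf32_Word)        /* e_flags */
         + 6 * sizeof(Elf32_Half);   /* e_ehsize through e_shstrndx */
}
```

    Every field sits at an offset that is a multiple of its own alignment, so the compiler has no reason to insert padding and the sum equals the size of the structure.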

    Let's look at a very basic ELF header

    [root@bogon ~]# readelf -h a.out 
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x80482a0                 /* e_entry */
      Start of program headers:          52 (bytes into file)      /* e_phoff */
      Start of section headers:          1992 (bytes into file)    /* e_shoff: See Starting address of section headers */
      Flags:                             0x0
      Size of this header:               52 (bytes)                /* e_ehsize */
      Size of program headers:           32 (bytes)                /* e_phentsize */
      Number of program headers:         8                         /* e_phnum */
      Size of section headers:           40 (bytes)                /* e_shentsize */
      Number of section headers:         29                        /* e_shnum */
      Section header string table index: 26                        /* e_shstrndx */

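    From this ELF header we can read off the following:
    - the file is a 32-bit, little-endian executable for Intel 80386
    - execution starts at virtual address 0x80482a0
    - the program header table starts at file offset 52 (right after the 52-byte ELF header) and holds 8 entries of 32 bytes each
    - the section header table starts at file offset 1992 and holds 29 entries of 40 bytes each
    - the section name string table is section number 26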
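
    The fields reported by readelf -h can also be decoded by hand from the first 52 bytes of the file. The following is a minimal sketch, assuming a little-endian ELF32 file; struct elf32_summary, rd16le(), rd32le() and elf32_read_header() are hypothetical names made up for this example, and the byte offsets follow from the Elf32_Ehdr layout shown earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary type for this sketch; not part of <elf.h>. */
struct elf32_summary {
    uint16_t type, machine;
    uint32_t entry, phoff, shoff;
    uint16_t phentsize, phnum, shentsize, shnum, shstrndx;
};

/* Decode little-endian integers byte by byte, independent of host endianness. */
static uint16_t rd16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF32 header. */
int elf32_read_header(const unsigned char *buf, size_t len,
                      struct elf32_summary *out)
{
    if (len < 52)
        return -1;                          /* too short for Elf32_Ehdr */
    if (memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;                          /* magic: 7f 45 4c 46 */
    if (buf[4] != 1 || buf[5] != 1)
        return -1;                          /* EI_CLASS = ELF32, EI_DATA = LSB */

    out->type      = rd16le(buf + 16);      /* e_type */
    out->machine   = rd16le(buf + 18);      /* e_machine */
    out->entry     = rd32le(buf + 24);      /* e_entry */
    out->phoff     = rd32le(buf + 28);      /* e_phoff */
    out->shoff     = rd32le(buf + 32);      /* e_shoff */
    out->phentsize = rd16le(buf + 42);      /* e_phentsize */
    out->phnum     = rd16le(buf + 44);      /* e_phnum */
    out->shentsize = rd16le(buf + 46);      /* e_shentsize */
    out->shnum     = rd16le(buf + 48);      /* e_shnum */
    out->shstrndx  = rd16le(buf + 50);      /* e_shstrndx */
    return 0;
}
```

    Reading the integers byte by byte, rather than casting the buffer to a struct, keeps the sketch independent of the host's byte order and alignment rules.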

    @2:

    Section header: holds the information about a section.

    Each section header occupies 40 bytes (the value of e_shentsize)

    /* Section header.  */
    typedef struct
    {
      Elf32_Word    sh_name;        /* Section name (string tbl index) */
      Elf32_Word    sh_type;        /* Section type */
      Elf32_Word    sh_flags;       /* Section flags */
      Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
      Elf32_Off     sh_offset;      /* Section file offset */
      Elf32_Word    sh_size;        /* Section size in bytes */
      Elf32_Word    sh_link;        /* Link to another section */
      Elf32_Word    sh_info;        /* Additional section information */
      Elf32_Word    sh_addralign;   /* Section alignment */
      Elf32_Word    sh_entsize;     /* Entry size if section holds table */
    } Elf32_Shdr;
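
    Because every entry has the same size, the file offset of section header number i is e_shoff + i * e_shentsize. A small sketch of that arithmetic follows; the two function names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* File offset of section header number `index`. */
uint32_t elf32_shdr_offset(uint32_t e_shoff, uint16_t e_shentsize,
                           uint16_t index)
{
    return e_shoff + (uint32_t)e_shentsize * index;
}

/* File offset of the first byte after the whole section header table. */
uint32_t elf32_shdr_table_end(uint32_t e_shoff, uint16_t e_shentsize,
                              uint16_t e_shnum)
{
    return e_shoff + (uint32_t)e_shentsize * e_shnum;
}
```

    With the values from the readelf -h output above (e_shoff = 1992, e_shentsize = 40, e_shnum = 29, e_shstrndx = 26), the header of the section name string table sits at 1992 + 26 * 40 = 3032, and the table ends at 1992 + 29 * 40 = 3152 = 0xc50.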

    Section Type(*sh_type*) 

    PROGBITS:           Holds program contents, including code, data, and debugger information.
    NOBITS:             Like PROGBITS, but it occupies no space in the file.
    SYMTAB and DYNSYM:  These hold symbol tables.                             [See below]
    STRTAB:             This is a string table, like the one used in a.out.   [See below]
    REL and RELA:       These hold relocation information.
    DYNAMIC and HASH:   These hold information related to dynamic linking.

    Some common sections are listed below:

    .text:  (PROGBITS:ALLOC+EXECINSTR)
         Executable code
    .data:  (PROGBITS:ALLOC+WRITE)
         Initialized data
    .rodata:(PROGBITS:ALLOC)
         Read-only data
    .bss:   (NOBITS:ALLOC+WRITE)
         Uninitialized data, zero-filled at run time
    .rel.text, .rel.data, and .rel.rodata:(REL)
         Relocation information for static linking
    .rel.plt: (REL)
         The list of PLT entries that are subject to relocation during dynamic linking (if the PLT is used)
    .rel.dyn: (REL)
         The relocations for dynamically linked functions (if the PLT is not used)
    .symtab:
         Symbol table
    .strtab:
         String table
    .shstrtab:
         Section string table (the table of section names)
    .init, .fini: (PROGBITS:ALLOC+EXECINSTR)
         Program initialization and termination code
    .interp: (PROGBITS:ALLOC)
         Holds the path name of a program interpreter. At present this is used to run the run-time dynamic linker, which loads the program and links in any required shared libraries.
    .got, .plt: (PROGBITS)
         The global offset table and the procedure linkage table (jump table) used for dynamic linking.

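    TIP: the difference between the symbol table (symtab) and the string table (strtab)
    strtab stores the null-terminated strings that the rest of the ELF file refers to by index, such as the names of functions and variables
    symtab records the functions and variables themselves (the symbols); it is mainly used at link time, when object files refer to each other's addresses

    Below is a basic section header table [0x7c8 = 1992]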

    [root@bogon ~]# readelf -S a.out 
    There are 29 section headers, starting at offset 0x7c8:
    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
      [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
      [ 3] .hash             HASH            08048168 000168 000024 04   A  4   0  4
      [ 4] .dynsym           DYNSYM          0804818c 00018c 000040 10   A  5   1  4
      [ 5] .dynstr           STRTAB          080481cc 0001cc 000045 00   A  0   0  1
      [ 6] .gnu.version      VERSYM          08048212 000212 000008 02   A  4   0  2
      [ 7] .gnu.version_r    VERNEED         0804821c 00021c 000020 00   A  5   1  4
      [ 8] .rel.dyn          REL             0804823c 00023c 000008 08   A  4   0  4
      [ 9] .rel.plt          REL             08048244 000244 000010 08   A  4  11  4
      [10] .init             PROGBITS        08048254 000254 000017 00  AX  0   0  4
      [11] .plt              PROGBITS        0804826c 00026c 000030 04  AX  0   0  4
      [12] .text             PROGBITS        080482a0 0002a0 000198 00  AX  0   0 16
      [13] .fini             PROGBITS        08048438 000438 00001c 00  AX  0   0  4
      [14] .rodata           PROGBITS        08048454 000454 00000c 00   A  0   0  4
      [15] .eh_frame_hdr     PROGBITS        08048460 000460 00001c 00   A  0   0  4
      [16] .eh_frame         PROGBITS        0804847c 00047c 000058 00   A  0   0  4
      [17] .ctors            PROGBITS        080494d4 0004d4 000008 00  WA  0   0  4
      [18] .dtors            PROGBITS        080494dc 0004dc 000008 00  WA  0   0  4
      [19] .jcr              PROGBITS        080494e4 0004e4 000004 00  WA  0   0  4
      [20] .dynamic          DYNAMIC         080494e8 0004e8 0000c8 08  WA  5   0  4
      [21] .got              PROGBITS        080495b0 0005b0 000004 04  WA  0   0  4
      [22] .got.plt          PROGBITS        080495b4 0005b4 000014 04  WA  0   0  4
      [23] .data             PROGBITS        080495c8 0005c8 000004 00  WA  0   0  4
      [24] .bss              NOBITS          080495cc 0005cc 000008 00  WA  0   0  4
      [25] .comment          PROGBITS        00000000 0005cc 000114 00      0   0  1
      [26] .shstrtab         STRTAB          00000000 0006e0 0000e5 00      0   0  1
      [27] .symtab           SYMTAB          00000000 000c50 000440 10     28  49  4
      [28] .strtab           STRTAB          00000000 001090 000249 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

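    String table:

    A string here is a null-terminated sequence of characters. Strings represent symbol and section names and are referenced by their index (byte offset) into the table.
    For the section name string table [.shstrtab], the e_shstrndx member of the ELF header says which section holds it,
    and the index of each section's name is stored in the sh_name member of its Elf32_Shdr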
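
    Looking up a name is therefore just pointer arithmetic plus a bounds check. The sketch below assumes the string table section has already been read into memory; strtab_get() is a hypothetical helper, not a function from any ELF library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the string that starts at byte `idx` of a string table of `size`
   bytes, or NULL if idx is out of range or the string is not terminated
   inside the table (which would indicate a corrupt file). */
const char *strtab_get(const char *tab, size_t size, uint32_t idx)
{
    if (idx >= size)
        return NULL;
    if (memchr(tab + idx, '\0', size - idx) == NULL)
        return NULL;
    return tab + idx;
}
```

    For a table whose bytes are "\0.symtab\0.strtab\0.shstrtab\0", index 1 yields ".symtab", index 9 yields ".strtab", and index 0 yields the empty string. To name a section, pass its sh_name as idx, with tab pointing at the contents of the section whose number is e_shstrndx.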


    Symbol table:

    Holds the information needed to locate and relocate a program's symbolic definitions and references

    Relocation table:


    @3: 

    Program header table: tells the system how to create the process image; it holds one entry (program header) per segment

    Each program header occupies 32 bytes (the value of e_phentsize)

    typedef struct
    {
      Elf32_Word    p_type;        /* Segment type */
      Elf32_Off     p_offset;      /* Segment file offset */
      Elf32_Addr    p_vaddr;       /* Segment virtual address */
      Elf32_Addr    p_paddr;       /* Segment physical address */
      Elf32_Word    p_filesz;      /* Segment size in file */
      Elf32_Word    p_memsz;       /* Segment size in memory */
      Elf32_Word    p_flags;       /* Segment flags */
      Elf32_Word    p_align;       /* Segment alignment */
    } Elf32_Phdr;

    Type of segment(*p_type*)

    PT_PHDR:    Specifies the location and size of the program header table itself, both in the file and in the memory image of the program.
    PT_LOAD:    A loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the start of the memory segment; if p_memsz is larger than p_filesz, the extra bytes hold the value 0.
    PT_DYNAMIC: Specifies dynamic linking information.
    PT_INTERP:  Specifies the location and size of a null-terminated path name to invoke as an interpreter.

    Below is an example of a program header table

    [root@bogon ~]# readelf -l a.out 
    Elf file type is EXEC (Executable file)
    Entry point 0x80482a0
    There are 8 program headers, starting at offset 52
    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
      INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
          [Requesting program interpreter: /lib/ld-linux.so.2]
      LOAD           0x000000 0x08048000 0x08048000 0x004d4 0x004d4 R E 0x1000
      LOAD           0x0004d4 0x080494d4 0x080494d4 0x000f8 0x00100 RW  0x1000
      DYNAMIC        0x0004e8 0x080494e8 0x080494e8 0x000c8 0x000c8 RW  0x4
      NOTE           0x000148 0x08048148 0x08048148 0x00020 0x00020 R   0x4
      GNU_EH_FRAME   0x000460 0x08048460 0x08048460 0x0001c 0x0001c R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
     Section to Segment mapping:
      Segment Sections...
       00     
       01     .interp 
       02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
       03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
       04     .dynamic 
       05     .note.ABI-tag 
       06     .eh_frame_hdr 
       07
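
    The two LOAD entries above are enough to translate a virtual address back into a file offset, and to see where the zero-filled .bss comes from. This is a minimal sketch: struct load_seg is a hypothetical trimmed-down stand-in for Elf32_Phdr, and both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of Elf32_Phdr, for PT_LOAD entries only. */
struct load_seg {
    uint32_t offset;   /* p_offset */
    uint32_t vaddr;    /* p_vaddr  */
    uint32_t filesz;   /* p_filesz */
    uint32_t memsz;    /* p_memsz  */
};

/* Stores the file offset backing `vaddr` in *off and returns 0, or returns -1
   if no segment has file contents at that address. */
int vaddr_to_offset(const struct load_seg *segs, size_t n,
                    uint32_t vaddr, uint32_t *off)
{
    for (size_t i = 0; i < n; i++) {
        if (vaddr >= segs[i].vaddr &&
            vaddr - segs[i].vaddr < segs[i].filesz) {
            *off = segs[i].offset + (vaddr - segs[i].vaddr);
            return 0;
        }
    }
    return -1;
}

/* Bytes at the end of the segment that exist only in memory and start as 0. */
uint32_t zero_fill_bytes(const struct load_seg *seg)
{
    return seg->memsz - seg->filesz;
}
```

    With the first LOAD entry (offset 0, vaddr 0x08048000), the entry point 0x80482a0 maps to file offset 0x2a0, which is exactly the Off column of .text in the section header table. For the second LOAD entry, MemSiz - FileSiz = 0x100 - 0xf8 = 8 bytes, the size of .bss, and the address of .bss (0x080495cc) has no file offset at all.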
    

    @4:

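    Section: provides the various pieces of information in an object file (instructions, data, symbol table, relocation information, and so on)

    2. ELF File Analysis

    Many tools can be used to analyze ELF files

    Besides readelf, used above, there are objdump, objcopy, and others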

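    # objdump -x /bin/ls                         # show all headers of the ELF file, including its sections
    # objdump -j .data -s /bin/ls                # show the contents of the given section
    #
    # objcopy -O binary -j .text a.out text.bin  # dump the .text section into the file text.bin

    A complete analysis tutorial:  <Linux C编程一站式学习-ELF文件>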

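    3. ELF File Parsing

    ELF files are parsed in many places. The path Linux takes to load an ELF file:

    execve() -> sys_execve() -> do_execve() -> search_binary_handler() -elf-> load_elf_binary()/load_elf_library()

    readelf in binutils is a very clear illustration of how to parse an ELF file

    The open-source project ELFToolChain

    atratus/coLinux/LINE: the ELF loaders in these projects are worth studying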

    4. References

    RefSpecs:    Linux Foundation Referenced Specifications

    SysV ABI:    System V ABI

    ELF spec:    Executable and Linking Format Specification V1.2

    ELF format:  ELF Format

    PE format:   PE Format

  • Original article: https://www.cnblogs.com/feng9exe/p/6899351.html