Hiding variable and function names with static
C programmers use the static attribute to hide variable and function declarations inside modules, much as you would use public and private declarations in Java and C++.
Symbol tables are built by assemblers, using symbols exported by the compiler into the assembly-language .s file.
An ELF symbol table is contained in the .symtab section. It contains an array of entries. Figure 7.4 shows the format of each entry.
The name is a byte offset into the string table that points to the null-terminated string name of the symbol.
The value is the symbol’s address.
For relocatable modules, the value is an offset from the beginning of the section where the object is defined.
For executable object files, the value is an absolute run-time address.
The size is the size (in bytes) of the object.
The type is usually either data or function.
The binding field indicates whether the symbol is local or global.
Each symbol is associated with some section of the object file, denoted by the section field, which is an index into the section header table.
There are three special pseudo sections that don’t have entries in the section header table:
ABS is for symbols that should not be relocated.
UNDEF is for undefined symbols, that is, symbols that are referenced in this object module but defined elsewhere.
COMMON is for uninitialized data objects that are not yet allocated. For COMMON symbols, the value field gives the alignment requirement, and size gives the minimum size.
For example, here are the last three entries in the symbol table for main.o, as displayed by the GNU readelf tool.
The first eight entries, which are not shown, are local symbols that the linker uses internally.
In this example, we see an entry for the definition of global symbol buf, an 8-byte object located at an offset (i.e., value) of zero in the .data section.
This is followed by the definition of the global symbol main, a 17-byte function located at an offset of zero in the .text section.
The last entry comes from the reference for the external symbol swap.
Readelf identifies each section by an integer index. Ndx=1 denotes the .text section, and Ndx=3 denotes the .data section.
Symbol Resolution
The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files.
Symbol resolution is straightforward for references to local symbols that are defined in the same module as the reference.
The compiler allows only one definition of each local symbol per module.
The compiler also ensures that static local variables, which get local linker symbols, have unique names.
Symbol resolution procedure for global symbols:
When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle. If the linker is unable to find a definition for the referenced symbol in any of its input modules, it prints an (often cryptic) error message and terminates.
Situation with multiple definitions:
Symbol resolution for global symbols is also tricky because the same symbol might be defined by multiple object files. In this case, the linker must either flag an error or somehow choose one of the definitions and discard the rest.
The approach adopted by Unix systems involves cooperation between the compiler, assembler, and linker.
How Linkers Resolve Multiply Defined Global Symbols
At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file.
Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.
Unix linkers use the following rules for dealing with multiply defined symbols:
- Rule 1: Multiple strong symbols are not allowed.
- Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
- Rule 3: Given multiple weak symbols, choose any of the weak symbols.
Rule 1:
If two modules each define a function named main, the linker will generate an error message because the strong symbol main is defined multiple times (rule 1).
Similarly, the linker will generate an error message for two modules that each define an initialized global x, because the strong symbol x is defined twice (rule 1).
Rule 2:
However, if x is uninitialized in one module, then the linker will quietly choose the strong symbol defined in the other (rule 2).
Notice that the linker normally gives no indication that it has detected multiple definitions of x.
Rule 3:
The same thing can happen if there are two weak definitions of x (rule 3).
Rule 2 and Rule 3:
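Rules 2 and 3 become dangerous when the duplicate definitions have different types. Here bar5.c treats x as a double, while foo5.c defines x and y as ints. On an IA32/Linux machine, doubles are 8 bytes and ints are 4 bytes. Thus, the assignment x = -0.0 in line 6 of bar5.c will overwrite the memory locations for x and y (lines 5 and 6 in foo5.c) with the double-precision floating-point representation of negative zero!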
linux> gcc -o foobar5 foo5.c bar5.c
linux> ./foobar5
x = 0x0 y = 0x80000000
When in doubt, invoke the linker with a flag such as gcc's -fno-common, which triggers an error if it encounters multiply defined global symbols.