ELF Dynamic Resolution Symbol Process (Revised)


By Alert7 2002-01-27 Reprinted from: http://elfhack.whitecell.org

★★ Preface

This article uses Linux as the example platform to demonstrate how ELF symbols are resolved dynamically. Corrections are welcome if anything here is wrong.

ELF symbols are normally resolved in what is called lazy mode. This is the default approach on ELF platforms. The mechanism differs in detail between architectures, but i386 and SPARC are largely the same.

The dynamic linker (rtld) provides the dynamic linking of symbols: it loads the shared objects that are referenced and resolves symbols. It is usually ld.so, and it can be either a shared object or an executable file.

★★ Symbol Table

For an object to make a symbol available to other ELF files, it must have an entry for that symbol in its symbol table. A symbol entry is a symbol structure that describes the symbol's name and the symbol's value. The name is encoded as an index into the dynamic string table. The value is the address of the symbol within the ELF object file. This address usually has to be relocated (the base load address is added to it) to give the absolute address in memory. A symbol table entry has the following format:

    typedef struct
    {
      Elf32_Word    st_name;   /* Symbol name (string tbl index) */
      Elf32_Addr    st_value;  /* Symbol value */
      Elf32_Word    st_size;   /* Symbol size */
      unsigned char st_info;   /* Symbol type and binding */
      unsigned char st_other;  /* No defined meaning, 0 */
      Elf32_Section st_shndx;  /* Section index */
    } Elf32_Sym;

Executable files know their load address when they are built, so references to their internal symbols are already relocated at compile time.
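As a small illustration of how a symbol entry is used, the sketch below mirrors the Elf32_Sym layout with a locally defined struct and shows the two calculations described above: finding the name through the string table index, and forming the absolute address as base load address plus st_value. The names demo_sym, sym_name, sym_bind, sym_type and sym_address are made up for this example (real code would use the types from <elf.h>); the split of st_info into binding (high 4 bits) and type (low 4 bits) follows the ELF specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the Elf32_Sym layout (hypothetical name, see <elf.h>). */
typedef struct {
    uint32_t      st_name;   /* index into the dynamic string table */
    uint32_t      st_value;  /* address of the symbol within the object */
    uint32_t      st_size;
    unsigned char st_info;   /* binding in the high 4 bits, type in the low 4 */
    unsigned char st_other;
    uint16_t      st_shndx;
} demo_sym;

/* Binding (e.g. 1 = GLOBAL, 2 = WEAK) and type (e.g. 2 = FUNC). */
static unsigned sym_bind(const demo_sym *s) { return s->st_info >> 4; }
static unsigned sym_type(const demo_sym *s) { return s->st_info & 0xf; }

/* The name is stored as an offset into the string table. */
static const char *sym_name(const demo_sym *s, const char *strtab)
{
    assert(s != NULL && strtab != NULL);
    return strtab + s->st_name;
}

/* Absolute address in memory: base load address plus the symbol value. */
static uint32_t sym_address(const demo_sym *s, uint32_t load_addr)
{
    return load_addr + s->st_value;
}
```

For example, with the string table "\0printf\0puts" and an entry whose st_name is 8, sym_name() yields "puts".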

★★ GOT (Global Offset Table)

The GOT is an array located in the data segment of the ELF image. Its entries are pointers to objects (usually data objects). The dynamic linker fills in the GOT entries whose values could not be determined at compile time. The GOT therefore plays an important role in dynamic linking on i386.

★★ PLT (Procedure Linkage Table)

The PLT is a structure whose entries are small code snippets that transfer control to external functions. On i386, the PLT and its entries have the following format:

    PLT0:    push GOT[1]      ; word of identifying information
             jmp  GOT[2]      ; pointer to rtld function
             nop
             ...
    PLTn:    jmp  GOT[x+n]    ; GOT offset of symbol address
             push n           ; relocation offset of symbol
             jmp  PLT0        ; call the rtld
    PLTn+1:  jmp  GOT[x+n+1]  ; GOT offset of symbol address
             push n+1         ; relocation offset of symbol
             jmp  PLT0        ; call the rtld

When control is transferred to an external function, it first goes to the PLT entry associated with that symbol, which was set up when the program was built. The first instruction of the PLT entry jumps through a pointer stored in the GOT. If the symbol has not been resolved yet, that GOT slot holds the address of the next instruction in the same PLT entry. That instruction pushes an offset into the relocation table onto the stack, and the instruction after it transfers control to PLT[0]. PLT[0] contains the code that calls the rtld symbol resolution function, whose address the program loader has placed in GOT[2].

The dynamic linker unwinds the stack to obtain the relocation information for the symbol to be resolved. The relocation entry, the symbol table and the string table together determine which symbol the PLT entry refers to and the address in process memory where the result should be stored. If possible, the symbol is resolved and its address is stored in the GOT entry used by that PLT entry. The next time the symbol is called, the GOT already contains its address, so all later calls go directly through the GOT. The dynamic linker resolves only the symbols that the binary actually calls; this is the lazy mode mentioned above.
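The patch-on-first-call behaviour can be imitated in plain C with a function pointer standing in for the GOT slot. This is only a toy model under stated assumptions, not how rtld is implemented: got_slot, lazy_stub, call_square and real_square are names invented for the illustration, and the "resolution" is hard-coded instead of going through the relocation and symbol tables.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

static int resolve_count = 0;      /* how many times the "rtld" path ran */

/* Stands in for a function that lives in a shared library. */
static int real_square(int x) { return x * x; }

static int lazy_stub(int x);

/* A one-entry "GOT": an unresolved slot initially points back at the stub. */
static func_t got_slot = lazy_stub;

/* Stands in for "push n; jmp PLT0" plus the resolver:
   find the target, patch the slot, then run the target. */
static int lazy_stub(int x)
{
    resolve_count++;
    got_slot = real_square;        /* the fix-up of the GOT entry */
    return got_slot(x);
}

/* Stands in for the first instruction of a PLT entry: jmp *GOT[n]. */
static int call_square(int x)
{
    assert(got_slot != NULL);
    return got_slot(x);
}
```

Calling call_square() twice goes through lazy_stub() only the first time; the second call reaches real_square() directly through the patched slot.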

★★ Hash Table and Chain

In addition to the symbol table and the string table, ELF objects can also include a hash table and chain, which make it easier for the dynamic linker to resolve symbols. The hash table and chain are used to determine quickly which symbol table entries might match a requested symbol name. The hash table (always accompanied by the chain) is stored as an array of integers: one part of the array holds the buckets and the rest holds the chain elements. The chain directly mirrors the symbol table, both in the number of entries and in their order.

The dynamic linking architecture gives every dynamically linked executable transparent access to the dynamic linker. Explicit access is also available: dynamic linking (loading shared objects and resolving symbols) can be done by calling the rtld functions dlopen(), dlsym() and dlclose() directly. These functions live in the dynamic linker itself. To use them, link against the dynamic linking library (libdl). That library contains stub functions so that the linker can resolve references to these functions at build time; the stubs themselves simply return 0. Because the real functions reside in the dynamic linker, calls to them from a statically linked ELF file will fail.
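A minimal sketch of explicit access, assuming a dynamically linked program on a system where libc exports atoi: it opens a handle on the running program with dlopen(NULL, ...), looks up atoi with dlsym(), calls it, and closes the handle. The wrapper name atoi_via_dlsym is made up for this example. On older glibc versions the program has to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Resolve atoi() at run time and call it.
   Returns -1 if the handle or the symbol cannot be obtained. */
static int atoi_via_dlsym(const char *text)
{
    assert(text != NULL);

    /* NULL asks for a handle on the main program and its dependencies. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* POSIX allows converting the result of dlsym() to a function pointer. */
    int (*fn)(const char *) = (int (*)(const char *))dlsym(handle, "atoi");
    int result = (fn != NULL) ? fn(text) : -1;

    dlclose(handle);
    return result;
}
```

In a statically linked build the lookup would be expected to fail, which is the situation the paragraph above warns about.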

To perform a lookup, the dynamic linker needs the hash table, the number of hash table buckets (nbuckets), the chain, the dynamic string table and the dynamic symbol table. Given these, the following algorithm computes the address of any symbol:

    1. hn = elf_hash(sym_name) % nbuckets;
    2. for (ndx = hash[hn]; ndx; ndx = chain[ndx]) {
    3.     symbol = sym_tab + ndx;
    4.     if (strcmp(sym_name, str_tab + symbol->st_name) == 0)
    5.         return (load_addr + symbol->st_value); }

The hash number is the return value of elf_hash(), which is defined in Part 4 of the ELF specification, taken modulo the number of buckets in the hash table (line 1). This number is used as an index into the hash table to get the first candidate symbol index, and the chain is then followed to find the index of the symbol with the matching name (line 2). With this index, the symbol is fetched from the symbol table (line 3). The symbol's name is compared with the requested name (line 4), and if they are the same the address is returned (line 5). With this algorithm any symbol can be resolved quite simply.
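The five-line algorithm can be tried out on a tiny hand-made table. The sketch below uses the standard ELF hash function from the specification; everything else (mini_sym, the four sample symbols and their values, build_hash, the bucket count of 3) is invented for the illustration and is far simpler than a real .hash section.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBUCKET 3
#define NSYMS   5                   /* index 0 is the reserved null symbol */

typedef struct { uint32_t st_name; uint32_t st_value; } mini_sym;

/* The standard ELF hash function. */
static uint32_t elf_hash(const char *name)
{
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Sample string table and symbol table (st_name is an offset into str_tab). */
static const char str_tab[] = "\0printf\0puts\0exit\0read";
static const mini_sym sym_tab[NSYMS] = {
    {0, 0}, {1, 0x100}, {8, 0x200}, {13, 0x300}, {18, 0x400}
};

static uint32_t bucket[NBUCKET];
static uint32_t chain[NSYMS];       /* one chain element per symbol */

/* Fill buckets and chains; a link editor does the equivalent at build time. */
static void build_hash(void)
{
    memset(bucket, 0, sizeof bucket);
    memset(chain, 0, sizeof chain);
    for (uint32_t i = 1; i < NSYMS; i++) {
        uint32_t hn = elf_hash(str_tab + sym_tab[i].st_name) % NBUCKET;
        chain[i] = bucket[hn];      /* link in front of the bucket's chain */
        bucket[hn] = i;
    }
}

/* The lookup from the text; returns 0 when the name is not found. */
static uint32_t lookup(const char *sym_name, uint32_t load_addr)
{
    assert(sym_name != NULL);
    uint32_t hn = elf_hash(sym_name) % NBUCKET;
    for (uint32_t ndx = bucket[hn]; ndx != 0; ndx = chain[ndx]) {
        const mini_sym *symbol = sym_tab + ndx;
        if (strcmp(sym_name, str_tab + symbol->st_name) == 0)
            return load_addr + symbol->st_value;
    }
    return 0;
}
```

After build_hash(), lookup("printf", base) returns base + 0x100, and a name that is not in the table walks one chain to its end and returns 0.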

★★ Demo

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("Hello, World\n");
        return 0;
    }

    Relocation section '.rel.plt' at offset 0x278 contains 4 entries:
      Offset    Info   Type             Symbol's Value  Symbol's Name
      0804947c  00107  R_386_JUMP_SLOT  080482d8        __register_frame_info
      08049480  00207  R_386_JUMP_SLOT  080482e8        __deregister_frame_info
      08049484  00307  R_386_JUMP_SLOT  080482f8        __libc_start_main
      08049488  00407  R_386_JUMP_SLOT  08048308        printf

Only R_386_JUMP_SLOT relocations appear in the GOT.

    Symbol table '.dynsym' contains 7 entries:
      Num:    Value  Size Type    Bind    Ot  Ndx Name
        0:        0     0 NOTYPE  LOCAL    0  UND
        1:  80482d8   116 FUNC    WEAK     0  UND __register_frame_info@GLIBC_2.0 (2)
        2:  80482e8   162 FUNC    WEAK     0  UND __deregister_frame_info@GLIBC_2.0 (2)
        3:  80482f8   261 FUNC    GLOBAL   0  UND __libc_start_main@GLIBC_2.0 (2)
        4:  8048308    41 FUNC    GLOBAL   0  UND printf@GLIBC_2.0 (2)
        5:  804843c     4 OBJECT  GLOBAL   0   14 _IO_stdin_used
        6:        0     0 NOTYPE  WEAK     0  UND __gmon_start__

    [alert7@redhat]$ gcc -o test test.c
    [alert7@redhat]$ ./test
    Hello, World
    [alert7@redhat]$ objdump -x test
    ...
    Dynamic Section:

      NEEDED      libc.so.6
      INIT        0x8048298
      FINI        0x804841c
      HASH        0x8048128
      STRTAB      0x80481c8
      SYMTAB      0x8048158
      STRSZ       0x70
      SYMENT      0x10
      DEBUG       0x0
      PLTGOT      0x8049470
      PLTRELSZ    0x20
      PLTREL      0x11
      JMPREL      0x8048278
      REL         0x8048270
      RELSZ       0x8
      RELENT      0x8
      VERNEED     0x8048250
      VERNEEDNUM  0x1
      VERSYM      0x8048242
    ...
      7 .rel.got   00000008  08048270  08048270  00000270  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 .rel.plt   00000020  08048278  08048278  00000278  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
      9 .init      0000002f  08048298  08048298  00000298  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
     10 .plt       00000050  080482c8  080482c8  000002c8  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
     11 .text      000000fc  08048320  08048320  00000320  2**4  CONTENTS, ALLOC, LOAD, READONLY, CODE
     12 .fini      0000001a  0804841c  0804841c  0000041c  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
     13 .rodata    00000016  08048438  08048438  00000438  2**2  CONTENTS, ALLOC, LOAD, READONLY, DATA
     14 .data      0000000c  08049450  08049450  00000450  2**2  CONTENTS, ALLOC, LOAD, DATA
     15 .eh_frame  00000004  0804945c  0804945c  0000045c  2**2  CONTENTS, ALLOC, LOAD, DATA
     16 .ctors     00000008  08049460  08049460  00000460  2**2  CONTENTS, ALLOC, LOAD, DATA
     17 .dtors     00000008  08049468  08049468  00000468  2**2  CONTENTS, ALLOC, LOAD, DATA
     18 .got       00000020  08049470  08049470  00000470  2**2  CONTENTS, ALLOC, LOAD, DATA
     19 .dynamic   000000a0  08049490  08049490  00000490  2**2  CONTENTS, ALLOC, LOAD, DATA
    ...

    [alert7@redhat]$ gdb -q test
    (gdb) disass main
    Dump of assembler code for function main:
    0x80483d0 <main>:     push   %ebp
    0x80483d1 <main+1>:   mov    %esp,%ebp
    0x80483d3 <main+3>:   push   $0x8048440
    0x80483d8 <main+8>:   call   0x8048308 <printf>
    0x80483dd <main+13>:  add    $0x4,%esp
    0x80483e0 <main+16>:  xor    %eax,%eax
    0x80483e2 <main+18>:  jmp    0x80483e4 <main+20>
    0x80483e4 <main+20>:  leave
    0x80483e5 <main+21>:  ret
    ...
    0x80483ef <main+31>:  nop
    End of assembler dump.
    (gdb) b *0x80483d8
    Breakpoint 1 at 0x80483d8
    (gdb) r
    Starting program: /home/alert7/test
    Breakpoint 1, 0x80483d8 in main ()
    (gdb) disass 0x8048308                                ① ⑴
    Dump of assembler code for function printf:
    /****************************************************/
    // PLT4:
    0x8048308 <printf>:     jmp    *0x8049488   // jmp GOT[6]; at this point GOT[6] holds 0x804830e
    0x804830e <printf+6>:   push   $0x18        // $0x18 is the offset of the printf relocation entry in the JMPREL section
    0x8048313 <printf+11>:  jmp    0x80482c8 <_init+48>   // jmp PLT0
                            // PLT0 holds the instructions that call the rtld function.
                            // When that function returns, GOT[6] has been changed to the real
                            // address of printf, and execution jumps directly to printf.
    // This part is the PLT.
    /****************************************************/
    End of assembler dump.
    (gdb) x 0x8049488
    0x8049488 <_GLOBAL_OFFSET_TABLE_+24>:   0x0804830e

    080482c8 <.plt>:                                      ②
    // PLT0:
     80482c8:  ff 35 74 94 04 08   pushl  0x8049474   // pushl GOT[1]
                                                      // GOT[1] is identifying information: a pointer to a link_map

     80482ce:  ff 25 78 94 04 08   jmp    *0x8049478  // jmp GOT[2]
                                                      // jump to the dynamic linker's resolution function
     80482d4:  00 00               add    %al,(%eax)
     80482d6:  00 00               add    %al,(%eax)

    // PLT1:
     80482d8:  ff 25 7c 94 04 08   jmp    *0x804947c
     80482de:  68 00 00 00 00      push   $0x0
     80482e3:  e9 e0 ff ff ff      jmp    80482c8 <_init+0x30>

    // PLT2:
     80482e8:  ff 25 80 94 04 08   jmp    *0x8049480
     80482ee:  68 08 00 00 00      push   $0x8
     80482f3:  e9 d0 ff ff ff      jmp    80482c8 <_init+0x30>

    // PLT3:
     80482f8:  ff 25 84 94 04 08   jmp    *0x8049484
     80482fe:  68 10 00 00 00      push   $0x10
     8048303:  e9 c0 ff ff ff      jmp    80482c8 <_init+0x30>

    // PLT4:
     8048308:  ff 25 88 94 04 08   jmp    *0x8049488
     804830e:  68 18 00 00 00      push   $0x18
     8048313:  e9 b0 ff ff ff      jmp    80482c8 <_init+0x30>

    (gdb) b *0x80482c8
    Breakpoint 2 at 0x80482c8
    (gdb) c
    Continuing.

    Breakpoint 2, 0x80482c8 in _init ()
    (gdb) x/8x 0x8049470
    0x8049470 <_GLOBAL_OFFSET_TABLE_>:     0x08049490 0x40013ed0 0x4000a960 0x400fa550
    0x8049480 <_GLOBAL_OFFSET_TABLE_+16>:  0x080482ee 0x400328cc 0x0804830e 0x00000000
    (gdb) x/50x 0x40013ed0        // a pointer to a link_map
    0x40013ed0: 0x00000000 0x40010c27 0x08049490 0x400143e0
    0x40013ee0: 0x00000000 0x40014100 0x00000000 0x08049490
    0x40013ef0: 0x080494e0 0x080494d8 0x080494a8 0x080494b0
    0x40013f00: 0x080494b8 0x00000000 0x00000000 0x00000000
    0x40013f10: 0x080494c0 0x080494c8 0x08049498 0x080494a0
    0x40013f20: 0x00000000 0x00000000 0x00000000 0x080494f8
    0x40013f30: 0x08049500 0x08049508 0x080494e8 0x080494d0
    0x40013f40: 0x00000000 0x080494f0 0x00000000 0x00000000
    0x40013f50: 0x00000000 0x00000000 0x00000000 0x00000000
    0x40013f60: 0x00000000 0x00000000 0x00000000 0x00000000
    (gdb) disass 0x4000a960                               ③
    Dump of assembler code for function _dl_runtime_resolve:
    0x4000a960 <_dl_runtime_resolve>:     push   %eax
    0x4000a961 <_dl_runtime_resolve+1>:   push   %ecx
    0x4000a962 <_dl_runtime_resolve+2>:   push   %edx
    0x4000a963 <_dl_runtime_resolve+3>:   mov    0x10(%esp,1),%edx
    0x4000a967 <_dl_runtime_resolve+7>:   mov    0xc(%esp,1),%eax
    0x4000a96b <_dl_runtime_resolve+11>:  call   0x4000a740
            // call the real resolution function fixup(), which patches GOT[6]
            // so that it points to the real address of printf
    0x4000a970 <_dl_runtime_resolve+16>:  pop    %edx
    0x4000a971 <_dl_runtime_resolve+17>:  pop    %ecx
    0x4000a972 <_dl_runtime_resolve+18>:  xchg   %eax,(%esp,1)
    0x4000a975 <_dl_runtime_resolve+21>:  ret    $0x8      // jump to the address of printf
    0x4000a978 <_dl_runtime_resolve+24>:  nop
    0x4000a979 <_dl_runtime_resolve+25>:  lea    0x0(%esi,1),%esi
    End of assembler dump.
    (gdb) b *0x4000a972
    Breakpoint 4 at 0x4000a972: file dl-runtime.c, line 182.
    (gdb) c
    Continuing.
    Breakpoint 4, 0x4000a972 in _dl_runtime_resolve () at dl-runtime.c:182
    182     in dl-runtime.c
    (gdb) i reg $eax $esp
    eax            0x4006804c       1074167884
    esp            0xbffffb64       -1073743004
    (gdb) b *0x4000a975
    Breakpoint 5 at 0x4000a975: file dl-runtime.c, line 182.
    (gdb) c
    Continuing.

    Breakpoint 5, 0x4000a975 in _dl_runtime_resolve () at dl-runtime.c:182
    182     in dl-runtime.c
    (gdb) si
    printf (format=0x1 ) at printf.c:26
    26      printf.c: No such file or directory.
    (gdb) disass                                          ④ ⑵
    Dump of assembler code for function printf:
    0x4006804c <printf>:     push   %ebp
    0x4006804d <printf+1>:   mov    %esp,%ebp
    0x4006804f <printf+3>:   push   %ebx
    0x40068050 <printf+4>:   call   0x40068055 <printf+9>
    0x40068055 <printf+9>:   pop    %ebx
    0x40068056 <printf+10>:  add    $0xa2197,%ebx
    0x4006805c <printf+16>:  lea    0xc(%ebp),%eax
    0x4006805f <printf+19>:  push   %eax
    0x40068060 <printf+20>:  pushl  0x8(%ebp)
    0x40068063 <printf+23>:  mov    0x81c(%ebx),%eax
    0x40068069 <printf+29>:  pushl  (%eax)
    0x4006806b <printf+31>:  call   0x400325b4
    0x40068070 <printf+36>:  mov    0xfffffffc(%ebp),%ebx
    0x40068073 <printf+39>:  leave
    0x40068074 <printf+40>:  ret
    End of assembler dump.
    (gdb) x/8x 0x8049470
    0x8049470 <_GLOBAL_OFFSET_TABLE_>:     0x08049490 0x40013ed0 0x4000a960 0x400fa550
    0x8049480 <_GLOBAL_OFFSET_TABLE_+16>:  0x080482ee 0x400328cc 0x4006804c 0x00000000

GOT[6] has been corrected to 0x4006804c.

The first call to printf() has to go through ①→②→③→④. Later calls to printf() are not that complicated: they only go through ⑴→⑵.

Now let's look at how GOT[6] gets fixed up, that is, how the address to be patched is found. (I used to misunderstand this part quite badly; apologies to anyone I misled. :))

1: On entry to PLT4, push $0x18 pushes the offset of the printf relocation entry within the JMPREL section.

2: The printf relocation entry is at JMPREL + 0x18.

    /* Elf32_Rel *reloc = JMPREL + reloc_offset; */
    (gdb) x/8x 0x8048278+0x18
    0x8048290:            0x08049488 0x00000407 0x53e58955 0x000000e8
    0x80482a0 <_init+8>:  0xc3815b00 0x000011cf 0x001cbb83 0x74000000

    typedef struct {
        Elf32_Addr r_offset;
        Elf32_Word r_info;
    } Elf32_Rel;

So the relocation entry for printf is:

    printf_reloc.r_offset = 0x08049488;
    printf_reloc.r_info   = 0x00000407;

Next, look at 0x08049488:

    (gdb) x 0x08049488
    0x8049488 <_GLOBAL_OFFSET_TABLE_+24>:   0x4006804c

This is GOT[6].

3: void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
For an executable file or a shared object, r_offset is the virtual address of the location to patch. In this executable l_addr is 0 (the first word of the link_map dump above), so rel_addr equals reloc->r_offset: rel_addr = 0x08049488 = &GOT[6].

4: *reloc_addr = value;
This patches rel_addr, that is, GOT[6]. For how value is computed, see the source code below.

r_info also associates the relocation with a symbol:

    Elf32_Sym *sym = &SYMTAB[ELF32_R_SYM(reloc->r_info)];

    typedef struct {
        Elf32_Word    st_name;
        Elf32_Addr    st_value;
        Elf32_Word    st_size;
        unsigned char st_info;
        unsigned char st_other;
        Elf32_Half    st_shndx;
    } Elf32_Sym;

ELF32_R_SYM(0x00000407) is 0x407 >> 8 = 4, and SYMENT is 0x10, so the entry is SYMTAB[4] at 0x8048158 + 4 * 0x10 = 0x8048198, which matches printf at index 4 in .dynsym. The dump below was taken at 0x8048158 + 0x00000407, which is not the address of that entry:

    (gdb) x/10x 0x8048158+0x00000407
    0x804855f: 0x00003a00 0x00008000 0x00000000 0x00006900
    0x804856f: 0x00008000 0x00000000 0x00008300 0x00008000
    0x804857f: 0x00000000 0x0000b700

The link_map structure is described as follows:

    /* Structure describing a loaded shared object.  The `l_next' and `l_prev'
       members form a chain of all the shared objects loaded at startup.

       These data structures exist in space used by the run-time dynamic linker;
       modifying them may have disastrous results.

       This data structure might change in future, if necessary.  User-level
       programs must avoid defining objects of this type.  */

★★ glibc source code for dynamic symbol resolution (glibc 2.1.3 implementation)

        .text
        .globl _dl_runtime_resolve
        .type _dl_runtime_resolve, @function
        .align 16
    _dl_runtime_resolve:
        pushl %eax              # Preserve registers otherwise clobbered.
        pushl %ecx
        pushl %edx
        movl 16(%esp), %edx     # Copy args pushed by PLT in register.  Note
        movl 12(%esp), %eax     # that `fixup' takes its parameters in regs.
        call fixup              # Call resolver.
        popl %edx               # Get register content back.
        popl %ecx
        xchgl %eax, (%esp)      # Get %eax contents and store function address.
        ret $8                  # Jump to function address.

    static ElfW(Addr) __attribute__ ((unused))
    fixup (
    # ifdef ELF_MACHINE_RUNTIME_FIXUP_ARGS
           ELF_MACHINE_RUNTIME_FIXUP_ARGS,
    # endif
           struct link_map *l, ElfW(Word) reloc_offset)
    {
      const ElfW(Sym) *const symtab
        = (const void *) l->l_info[DT_SYMTAB]->d_un.d_ptr;
      const char *strtab = (const void *) l->l_info[DT_STRTAB]->d_un.d_ptr;

      const PLTREL *const reloc     /* compute the function's relocation entry */
        = (const void *) (l->l_info[DT_JMPREL]->d_un.d_ptr + reloc_offset);
      /* l->l_info[DT_JMPREL]->d_un.d_ptr is the address of the JMPREL section */

      const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
                                    /* compute the function's symtab entry */
      void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
                                    /* absolute address to be patched */
      ElfW(Addr) value;

      /* The use of `alloca' here looks ridiculous but it helps.  The goal is
         to prevent the function from being inlined and thus optimized out.
         There is no official way to do this so we use this trick.  gcc never
         inlines functions which use `alloca'.  */
      alloca (sizeof (int));

      /* Sanity check that we're really looking at a PLT relocation.  */
      assert (ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT);

      /* Look up the target symbol.  */
      switch (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
        {
        default:
          {
            const ElfW(Half) *vernum =
              (const void *) l->l_info[VERSYMIDX (DT_VERSYM)]->d_un.d_ptr;
            ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)];
            const struct r_found_version *version = &l->l_versions[ndx];

            if (version->hash != 0)
              {
                value = _dl_lookup_versioned_symbol (strtab + sym->st_name,
                                                     &sym, l->l_scope, l->l_name,
                                                     version, ELF_MACHINE_JMP_SLOT);
                break;
              }
          }
        case 0:
          value = _dl_lookup_symbol (strtab + sym->st_name, &sym, l->l_scope,
                                     l->l_name, ELF_MACHINE_JMP_SLOT);
        }

      /* At this point value is the base address at which the object is loaded.  */
      /* Currently value contains the base load address of the object
         that defines sym.  Now add in the symbol offset.  */
      value = (sym ? value + sym->st_value : 0);   /* absolute address of the function */

      /* And now perhaps the relocation addend.  */
      value = elf_machine_plt_value (l, reloc, value);   /* may need the addend applied */

      /* Finally, fix up the plt itself.  */
      elf_machine_fixup_plt (l, reloc, rel_addr, value);   /* patch rel_addr, usually GOT[n] */

      return value;
    }

    static inline Elf32_Addr
    elf_machine_plt_value (struct link_map *map, const Elf32_Rela *reloc,
                           Elf32_Addr value)
    {
      return value + reloc->r_addend;
    }

    /* Fixup a PLT entry to bounce directly to the function at VALUE.  */
    static inline void
    elf_machine_fixup_plt (struct link_map *map, const Elf32_Rel *reloc,
                           Elf32_Addr *reloc_addr, Elf32_Addr value)
    {
      *reloc_addr = value;
    }
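The index arithmetic that fixup() performs can be checked against the numbers from the demo. The sketch below hard-codes the JMPREL, SYMTAB, SYMENT and PLTGOT values printed by objdump earlier; r_sym and r_type follow the ELF32_R_SYM and ELF32_R_TYPE definitions, while the DEMO_ constants and the other helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the demo's dynamic section. */
#define DEMO_JMPREL 0x8048278u
#define DEMO_SYMTAB 0x8048158u
#define DEMO_SYMENT 0x10u
#define DEMO_PLTGOT 0x8049470u

/* Same arithmetic as ELF32_R_SYM and ELF32_R_TYPE. */
static uint32_t r_sym(uint32_t r_info)  { return r_info >> 8; }
static uint32_t r_type(uint32_t r_info) { return r_info & 0xff; }

/* Address of the relocation entry selected by the value the PLT pushed. */
static uint32_t reloc_entry_addr(uint32_t reloc_offset)
{
    return DEMO_JMPREL + reloc_offset;
}

/* Address of the symbol table entry named by r_info. */
static uint32_t sym_entry_addr(uint32_t r_info)
{
    return DEMO_SYMTAB + r_sym(r_info) * DEMO_SYMENT;
}

/* Index of the 4-byte GOT slot that r_offset points at. */
static uint32_t got_index(uint32_t r_offset)
{
    assert(r_offset >= DEMO_PLTGOT);
    return (r_offset - DEMO_PLTGOT) / 4;
}
```

With the printf values from the demo (reloc_offset 0x18, r_info 0x00000407, r_offset 0x08049488) these give the relocation entry at 0x8048290, symbol index 4, relocation type 7 (R_386_JMP_SLOT), the symbol entry at 0x8048198, and GOT slot 6.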

Reference:

1. Glibc 2.1.3 src
2. << ELF file format >>
3. << Cheating the ELF: Subversive Dynamic Linking to Libraries >>, written by the grugq
4. Linux dynamic link technology, http://www.linuxforum.net/forum/showflat.php?Cat=&Board=Kstudy&Number=102793&page=1&view=collapsed&sb=5&o=31&part=
5. p58-0x04 by Nergal, << The advanced return-into-lib(c) exploits >>
