Dynamic linking is a topic that comes up often. Yet few articles explain this important mechanism of how software actually runs; most only cover how to program with dynamic link libraries. This series explores the subject at the level of the dynamic linker's source code.
As the title says, the subject is the dynamic linking of Linux ELF files on the Intel platform. This is partly because material on it is readily available, and partly because this case matters more than dynamic linking elsewhere (after all, this is Intel's world now). Dynamic linking of ELF files on other platforms is similar, so after reading this series you should be able to follow those implementations as well.
Since this is a series, I plan to write three parts. The first part is mainly analysis and covers dl_open. Because that function does so much, only two of its pieces are covered here: _dl_map_object and _dl_init. _dl_map_object maps the dynamic link file into the process's memory space using the information in the ELF file, and _dl_init performs a special initialization that exists to support object-oriented features.
The second part analyzes function resolution and unloading, and it has more ground to cover. It starts with the two functions called from dl_open, _dl_map_object_deps and _dl_relocate_object, which are placed there because they are directly tied to function resolution. Next comes _dl_runtime_resolve, the resolution that happens dynamically while the program runs. It does not contain much code, but it is the most ingenious part (and the core of all three articles). Last is the implementation of dl_close, followed by some closing work: the error handling done by _dl_signal_cerror together with _dl_catch_error.
The third part analyzes the injectso example and its application. It introduces a working example that applies dynamic linking and that can also be used when debugging programs later on. This gives a more concrete feel for the dynamic linking covered earlier, and the example can serve as a dynamic patching tool in future development. I may even use this technique in later articles.
First, the historical background
Dynamic linking has a long history. Traced back, the earliest idea dates from the 1950s: put some common code in one place in memory and call it from other addresses. This later developed into loading overlays (code needed at different points in a program's lifetime is brought into memory at different times), in the 1960s. But this can only be considered the embryonic period. Something closer to what we now call dynamic linking came after the UNIX operating system, since UNIX by design splits a complex operating system into modules. Even these are not dynamic linking in the modern sense, which must have two features:
1. Dynamic loading: a module is mapped into the virtual memory space of the running program only when it is needed. For example, if a module calls the myget function in mylib.so at run time, and nothing else in mylib.so has been called before, the library is not loaded (that is, memory-mapped) into the program until that call. This is implemented in the kernel using the page fault mechanism (I may cover this in another article).
2. Dynamic resolution: only when a function is actually called is its start address in the virtual memory space resolved, and the result is then written to a dedicated storage location in the calling module. Continuing the example above: once myget has been called, mylib.so must already be mapped into the program's virtual memory, and if you then call myput in mylib.so, its address is resolved at the moment of that call. (Note: "program" here means a process in the general sense, and a "module" may be the binary code of your program or another shared object that your program depends on; both are in ELF format.)
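The second feature can be illustrated with a small sketch. It is not how ld.so is implemented; it only mimics the idea of a per-module slot that initially points at a resolver and is overwritten with the real address on the first call. All names here (myput_impl, myput_slot, call_myput, lazy_resolve_count) are hypothetical.

```c
/* Hypothetical stand-in for a function exported by mylib.so. */
static int myput_impl(int x)
{
    return x + 1;
}

/* Counts how often the slow resolution path runs. */
static int resolve_count = 0;

static int myput_stub(int x);

/* The slot plays the role of the storage location in the calling
   module: every call goes through it. It starts out pointing at
   the stub, not at the real function. */
static int (*myput_slot)(int) = myput_stub;

/* The first call lands here: "resolve" the real address, write it
   back into the slot, then forward the call. Later calls never
   reach the stub again. */
static int myput_stub(int x)
{
    resolve_count++;
    myput_slot = myput_impl;
    return myput_slot(x);
}

/* What the calling module does: an indirect call through the slot. */
int call_myput(int x)
{
    return myput_slot(x);
}

/* Number of times resolution actually happened. */
int lazy_resolve_count(void)
{
    return resolve_count;
}
```

Calling call_myput twice runs the resolver only once; the second call goes straight to myput_impl through the patched slot.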
These two points resemble the way a modern operating system handles memory. When you ask for a region of memory, only the virtual mapping is set up, rather than everything being mapped in advance, and physical pages are allocated only when the region is first read. That resembles the first point. COW (copy on write) happens only when the region is written. That resembles the second.
The benefit is that unnecessary overhead is avoided, because no program uses every function it could call during a single run.
These ideas were proposed and implemented in Sun's SunOS in the 1980s.
For this history, see reference [1].
ELF binary files and the modern ideas of dynamic linking took shape in roughly the same period. ELF derives from a.out, AT&T's earliest binary file format. Bell Labs staff invented the ELF file format to meet the requirements of newer software and operating systems (such as the UNIX variants AIX, SunOS, and HP-UX), including broader extensibility and support for object orientation.
I won't discuss the details of the ELF file format here; that would take a very long article. See reference [2] for the ABI (Application Binary Interface) specification. The hierarchical management scheme used in ELF files not only plays an important role in dynamic linking; it is also one of the oldest and most classic ideas in computing.
Every ELF file has an ELF header, and every such header has two data members:
Elf32_Off e_phoff;
Elf32_Off e_shoff;
They give the offsets of the program header table and the section header table within the ELF file. The program header table acts as the top-level index, and the section header table as a finer-grained index beneath it. Each section header in turn has these members:
Elf32_Addr sh_addr;
Elf32_Off sh_offset;
sh_addr is the address at which the section is mapped in memory (for a dynamic link library this is a relative value; added to l_addr, the address at which the whole ELF file was loaded, it gives the absolute address). sh_offset is the offset of the section within the file.
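As a minimal sketch of the fields just described, the following reads e_phoff and e_shoff out of a buffer holding the start of a 32-bit ELF file, and forms an absolute section address from l_addr and sh_addr. The struct is a local mirror of the 32-bit ELF header layout, and the names demo_elf32_ehdr, demo_read_offsets and demo_section_address are hypothetical. It assumes the file's byte order matches the host's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 32-bit ELF header (52 bytes, no padding). */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;      /* file offset of the program header table */
    uint32_t e_shoff;      /* file offset of the section header table */
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} demo_elf32_ehdr;

/* Returns 0 and stores both table offsets if buf starts with the
   ELF magic number; returns -1 otherwise. */
int demo_read_offsets(const unsigned char *buf, size_t len,
                      uint32_t *phoff, uint32_t *shoff)
{
    demo_elf32_ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
        return -1;
    *phoff = eh.e_phoff;
    *shoff = eh.e_shoff;
    return 0;
}

/* Absolute address of a section inside a loaded library:
   load address of the file plus the section's relative address. */
uint32_t demo_section_address(uint32_t l_addr, uint32_t sh_addr)
{
    return l_addr + sh_addr;
}
```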
In this sense, the ELF header is what manages the entire ELF file.
For example, to find a function's start address from its known name, the process goes like this.
First, take the offset e_phoff from the ELF header. In the table found there, find the Phdr whose type is PT_DYNAMIC, and from its address find the dynamic section. Finally, find the Elf32_Sym structure whose st_name refers to a string matching the given name, and use its st_value. This style of management is fairly complicated and can look cumbersome: finding one function's start address takes four steps, ELF header >> program header >> symbol section >> function address. The fundamental reason is that our computers are linearly addressed, which goes back to the von Neumann architecture, so this is an old idea. But this same structure makes the ELF file very easy to extend. Suppose that one day ELF files need to be encrypted for some reason. To store the key in the file, we can simply set aside a dedicated section for it, say a section of type ST_ENCRYPT. That shows the foresight of the designers of the ELF format (and such a section really does exist now).
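The last of those steps, matching st_name against a given name and returning st_value, can be sketched as follows. This is an illustration only: it uses a plain linear search over an in-memory symbol table and string table, whereas a real dynamic linker uses a hash table to avoid scanning. The struct is a local mirror of the 32-bit ELF symbol entry, and demo_elf32_sym and demo_lookup are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 32-bit ELF symbol table entry. */
typedef struct {
    uint32_t st_name;        /* offset of the name in the string table */
    uint32_t st_value;       /* for a function: its (relative) address */
    uint32_t st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
} demo_elf32_sym;

/* Linear search: returns 0 and stores st_value when a symbol whose
   name equals `name` is found, -1 otherwise. */
int demo_lookup(const demo_elf32_sym *symtab, size_t nsyms,
                const char *strtab, const char *name, uint32_t *value)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(strtab + symtab[i].st_name, name) == 0) {
            *value = symtab[i].st_value;
            return 0;
        }
    }
    return -1;
}
```

For a dynamic link library the value found this way is relative, so the caller would still add the library's load address to it.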
Second, the code
After all that, I still haven't really touched on loading and calling Linux dynamic link libraries on the Intel 32-bit platform. Ordinarily, for the programs we write, this is arranged by the compiler together with ld.so, the dynamic linker. If you want to call into a dynamic link library explicitly, here is an example.
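#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *libc;
    int (*printf_call)(const char *, ...);
    char *error_text;

    /* Open the library; symbols are resolved lazily, on first use. */
    if ((libc = dlopen("/lib/libc.so.5", RTLD_LAZY)) != NULL)
    {
        /* Look up printf by name and call it through the pointer. */
        printf_call = (int (*)(const char *, ...)) dlsym(libc, "printf");
        (*printf_call)("Hello, World\n");
        dlclose(libc);
        return 0;
    }
    /* dlopen failed: report why. */
    error_text = dlerror();
    printf("%s\n", error_text);
    return -2;
}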
Here dlopen opens a dynamic link library file. Far more happens in this step than is visible here, and I will spend a lot of space on it. The value it returns is a pointer, specifically a struct link_map *. dlsym takes this struct link_map * together with a function name and finds the address of that function in the process, which is function resolution. Finally, dlclose releases the resources obtained by dlopen. The process is much like loading a module (a shared object file) into the kernel, except that this happens in user mode and that in kernel mode, and the functions here are more complex. (One last point: if the file above is named test.c, you cannot compile it with the usual gcc -o test test.c. Use gcc -o test test.c -ldl instead, because dlopen, dlsym, and dlclose live in libdl.so.2, which the compiler does not find on its own; -ldl is the flag that links it in.)
Third, analysis of the _dl_open loading process
This article and the next two follow the program above, that is, the sequence dlopen >> dlsym >> dlclose. A few notes first. The source code here is from glibc 2.3.2. For portability and robustness, the original contains a great deal of defensive code and platform-specific code, most of it error handling; I have deleted this and kept only the code for the Intel 32-bit platform. The original also handles loading dynamic link libraries under multithreading, which is omitted here as well (not supported in the current Linux kernel). So the code you see has been cut down as far as possible while still performing loading and function resolution: it is only about a quarter of the original, keeps the original style, and highlights the core functionality. Even so there are about 2000 lines, so please be patient. I will explain the likely difficulties in detail, so that everyone can truly understand the design of the code and the meaning of dynamic resolution.
The first function is in dl-open.c
2672 void * internal_function
2673 _dl_open (const char *file, int mode, const void *caller)
2674 {
2675   struct dl_open_args args;
2676
2677   __rtld_lock_lock_recursive (GL(dl_load_lock));
2678
2679   args.file = file;
2680   args.mode = mode;
2681   args.caller = caller;
2682   args.map = NULL;
2683
2684   dl_open_worker (&args);
2685   __rtld_lock_unlock_recursive (GL(dl_load_lock));
2686
2687 }
internal_function here means that the function takes its parameters in registers; its definition comes from configure.in.
# define internal_function __attribute__ ((regparm (3), stdcall))
regparm is a GCC attribute that passes up to three arguments in registers, and stdcall means that the called function cleans up the stack, whereas an ordinary function uses cdecl, where the caller is responsible for the cleanup. __rtld_lock_lock_recursive (GL(dl_load_lock)); and __rtld_lock_unlock_recursive (GL(dl_load_lock)); are not yet fully defined, at least not on Linux, but you can compare linux/kmod.c, where request_module takes a lock to prevent excessively deep nesting.
The rest is just a wrapper.
dl_open_worker is what really maps the dynamic link library and constructs a struct link_map. This is a crucially important data structure. Its definition is too long to give here, so I will present it in the appendix at the end of the second article, where you can look back on it once you understand how a library is loaded and resolved. Its members are also explained in practice as they come up in the functions below:
_dl_open() >> dl_open_worker()
2532 static void
2533 dl_open_worker (void *a)
2534 {
.............................
2547   args->map = new = _dl_map_object (NULL, file, 0, lt_loaded, 0, mode);
Here _dl_map_object is called to map the file into memory. The original function also searches for the library file along various paths and compares against the SONAME (the library's alias at run time); I have deleted that here.
_dl_open() >> dl_open_worker() >> _dl_map_object()
1693 struct link_map *
1694 internal_function
1695 _dl_map_object (struct link_map *loader, const char *name, int preloaded,
1696                 int type, int trace_mode, int mode)
1697 {
1698   int fd;
1699   char *realname;
1700   char *name_copy;
1701   struct link_map *l;
1702   struct filebuf fb;
1703
1704
1705   /* Look for this name among those already loaded.  */
1706   for (l = GL(dl_loaded); l; l = l->l_next)
1707     {
1708       if (! _dl_name_match_p (name, l))
..............
1721       return l;
1722     }
1723
1724   fd = open_path (name, namelen, preloaded, &env_path_list,
1725                   &realname, &fb);
1726
1727   l = _dl_new_object (name_copy, name, type, loader);
1728
1729   return _dl_map_object_from_fd (name, fd, &fb, realname, loader, type, mode);
1730
1731
1732 } /* end of _dl_map_object */
The first step is a search through the chain of dynamic link libraries that are already loaded; that is what lines 1706 through 1721 do. The reason is easy to see: an executable may depend on several dynamic link libraries, and several of those may depend on the same library, so the library may well have been loaded already.
The open_path that follows is the key step. env_path_list can come from several places: first, from the system environment variables; second, from the string in the section referred to by DT_RUNPATH (see the appendix); and, more complicated, from the environment of the other dynamic link library that is loading this one. These cases are not explained here.
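The search over already-loaded objects amounts to a walk along a linked list. A minimal sketch, assuming a simplified node with only a name and a next pointer (the real struct link_map has many more members, and _dl_name_match_p also compares against the object's other recorded names); demo_map and demo_find_loaded are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct link_map. */
typedef struct demo_map {
    const char *l_name;
    struct demo_map *l_next;
} demo_map;

/* Walk the chain of loaded objects and return the one whose name
   matches, or NULL if the library has not been loaded yet. */
demo_map *demo_find_loaded(demo_map *head, const char *name)
{
    for (demo_map *l = head; l != NULL; l = l->l_next)
        if (strcmp(l->l_name, name) == 0)
            return l;
    return NULL;
}
```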
_dl_open() >> dl_open_worker() >> _dl_map_object() >> open_path()
1289 static int open_path (const char *name, size_t namelen, int preloaded,
1290                       struct r_search_path_struct *sps, char **realname,
1291                       struct filebuf *fbp)
1292
1293 {
1294   struct r_search_path_elem **dirs = sps->dirs;
1295   char *buf;
1296   int fd = -1;
1297   const char *current_what = NULL;
1298   int any = 0;
1299
1300   buf = alloca (max_dirnamelen + max_capstrlen + namelen);
1301
1302   do
1303     {
1304       struct r_search_path_elem *this_dir = *dirs;
1305       size_t buflen = 0;
..................
1310       struct stat64 st;
1311
1312
1313       edp = (char *) __mempcpy (buf, this_dir->dirname, this_dir->dirnamelen);
1314       for (cnt = 0; fd == -1 && cnt < ncapstr; ++cnt)
1315         {
1316           /* Skip this directory if we know it does not exist.  */
1317           if (this_dir->status[cnt] == nonexisting)
1318             continue;
1319
1320           buflen = ((char *) __mempcpy (__mempcpy (edp, capstr[cnt].str,
1321                                                    capstr[cnt].len), name, namelen) - buf);
1322
1323
1324           fd = open_verify (buf, fbp);
1325
1326
1327           __xstat64 (_STAT_VER, buf, &st);
1328
1329
1341         }
1342 ..............
1358 }
The alloca above allocates space on the stack, so there is no need to worry about leaking memory when the function returns (a good programmer really does keep memory allocation in mind). Line 1313 copies the dirname of the r_search_path_elem. Lines 1320 to 1321 append capstr[cnt] and then the file name to this path; capstr holds subdirectory strings determined by the operating system and hardware platform. This is a good example to learn from: __mempcpy returns the address just past the last byte written to the destination, so each copy yields the position for the next one. Written with strncpy, the same thing would look like this:
strncpy (edp, capstr[cnt].str, capstr[cnt].len);
edp += capstr[cnt].len;
strncpy (edp, name, namelen);
edp += namelen;
buflen = edp - buf;
That takes several statements where a single one suffices here. The open_verify that follows opens the file named by buf, reads the first 1024 bytes of the file into fbp, and checks that the file is valid; the most important check is the ELF magic number (ELFMAG). On success it returns a file descriptor greater than -1. With that, open_path has done its job of opening the file. _dl_new_object allocates a struct link_map and fills in its most basic members.
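The __mempcpy chaining used in open_path can be tried out in isolation. The sketch below defines its own demo_mempcpy on top of memcpy (mempcpy itself is a GNU extension), and the function demo_build_path, its parameters, and the sample directory names are hypothetical. Unlike the glibc code, which sizes its buffer up front with alloca, this version checks the buffer size explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Portable equivalent of GNU mempcpy: copy n bytes and return the
   address just past the last byte written. */
static void *demo_mempcpy(void *dest, const void *src, size_t n)
{
    return (char *) memcpy(dest, src, n) + n;
}

/* Build dir + sub + name (NUL-terminated) in buf.
   Returns the string length, or 0 if buf is too small. */
size_t demo_build_path(char *buf, size_t bufsize,
                       const char *dir, const char *sub, const char *name)
{
    size_t dlen = strlen(dir);
    size_t slen = strlen(sub);
    size_t nlen = strlen(name) + 1;      /* include the trailing NUL */
    char *edp;
    char *end;

    if (dlen + slen + nlen > bufsize)
        return 0;

    /* Copy the directory once; edp marks where the rest is appended. */
    edp = demo_mempcpy(buf, dir, dlen);

    /* Each call returns the next write position, so the two appends
       nest into a single expression. */
    end = demo_mempcpy(demo_mempcpy(edp, sub, slen), name, nlen);

    return (size_t) (end - buf) - 1;     /* length without the NUL */
}
```

Because edp is kept, a loop can try several subdirectories in turn without recopying the directory part, which is what the loop at line 1314 does.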
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_new_object()
2027 struct link_map *
2028 internal_function
2029 _dl_new_object (char *realname, const char *libname, int type,
2030                 struct link_map *loader)
2031
2032 {
2033   struct link_map *l;
2034   int idx;
2035   size_t libname_len = strlen (libname) + 1;
2036   struct link_map *new;
2037   struct libname_list *newname;
2038
2039   new = (struct link_map *) calloc (sizeof (*new) + sizeof (*newname)
2040                                     + libname_len, 1);
2041 ..................
2046
2047   new->l_name = realname;
2048   new->l_type = type;
2049   new->l_loader = loader;
2050
2051   new->l_scope = new->l_scope_mem;
2052   new->l_scope_max = sizeof (new->l_scope_mem) / sizeof (new->l_scope_mem[0]);
2053
2054   if (GL(dl_loaded) != NULL)
2055     {
2056       l = GL(dl_loaded);
2057       while (l->l_next != NULL)
2058         l = l->l_next;
2059       new->l_prev = l;
2060       /* new->l_next = NULL;  Would be necessary but we use calloc.  */
2061       l->l_next = new;
2062
2063       /* Add the global scope.  */
2064       new->l_scope[idx++] = &GL(dl_loaded)->l_searchlist;
2065     }
2066   else
2067     GL(dl_loaded) = new;
2068   ++GL(dl_nloaded);
..........
2080
2081   return new;
2082
2083 }
The allocation at line 2039 gathers several small pieces into one block: the space for newname and for the libname string is allocated together with the link_map itself. Lines 2043 to 2053 assign the members of struct link_map. Lines 2054 to 2067 append the new struct link_map to the end of a linked list. This is useful later: to manage all the dynamic link libraries related to an executable as a whole, one can simply traverse this chain. If the library to be loaded has not yet been mapped into the process's virtual memory, everything so far is only preparation. The real work begins in _dl_map_object_from_fd(), because every step from here on is needed for the library to do its job in the process. The function is relatively long, so I take it in segments.
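The two techniques in _dl_new_object, one zero-filled allocation that also holds a trailing string and an append to the tail of the list, can be sketched with a much smaller structure. This is an illustration under simplifying assumptions, not the glibc code: demo_obj, demo_new_object and demo_loaded_count are hypothetical names, and the list head is a file-local variable standing in for GL(dl_loaded).

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct link_map. */
typedef struct demo_obj {
    char *l_name;
    struct demo_obj *l_next;
    struct demo_obj *l_prev;
} demo_obj;

static demo_obj *demo_loaded = NULL;    /* head of the chain */
static unsigned int demo_nloaded = 0;   /* number of objects in it */

demo_obj *demo_new_object(const char *name)
{
    size_t len = strlen(name) + 1;

    /* One calloc covers the struct and the name stored right after
       it. The zero fill leaves l_next and l_prev as NULL. */
    demo_obj *new_obj = calloc(sizeof *new_obj + len, 1);
    if (new_obj == NULL)
        return NULL;

    new_obj->l_name = (char *) (new_obj + 1);
    memcpy(new_obj->l_name, name, len);

    if (demo_loaded != NULL) {
        /* Walk to the tail and hook the new object in after it. */
        demo_obj *l = demo_loaded;
        while (l->l_next != NULL)
            l = l->l_next;
        new_obj->l_prev = l;
        l->l_next = new_obj;
    } else {
        demo_loaded = new_obj;
    }
    ++demo_nloaded;

    return new_obj;
}

unsigned int demo_loaded_count(void)
{
    return demo_nloaded;
}
```

A side effect of the single allocation is that one free() on the object later releases the name as well.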
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1391 struct link_map *
1392 _dl_map_object_from_fd (const char *name, int fd, struct filebuf *fbp,
1393                         char *realname, struct link_map *loader, int l_type,
1394                         int mode)
1395
1396 {
1397
1398   struct link_map *l = NULL;
1399   const ElfW(Ehdr) *header;
1400   const ElfW(Phdr) *phdr;
1401   const ElfW(Phdr) *ph;
1402   size_t maplength;
1403   int type;
1404   struct stat64 st;
1405
1406   __fxstat64 (_STAT_VER, fd, &st);
..........
1413   for (l = GL(dl_loaded); l; l = l->l_next)
1414     if (l->l_ino == st.st_ino && l->l_dev == st.st_dev)
1415       {
........
1418         __close (fd);
...............
1422         free (realname);
1423         add_name_to_object (l, name);
1424
1425         return l;
1426       }
Here the search starts over: is there already a struct link_map for the library being loaded? The test is whether st_ino, the file's inode number, and st_dev, the device number, both match. This compares files at a lower level (for the specific reasons, see "Implementation from Linux Memory Management"). The reason for checking again is that if the process is opening this library file for the first time, getting to this point may have taken a long while (in my experiments, opening a file for the first time takes roughly 200 milliseconds, mostly disk seek and read time, which is a long time for a computer). In the meantime another thread may have read in this same library, so there is no need to do it again. This is consistent with the idea the kernel uses when opening files.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1427
1428   /* This is the ELF header.  We read it in `open_verify'.  */
1429   header = (void *) fbp->buf;
1430
1431   l->l_entry = header->e_entry;
1432   type = header->e_type;
1433   l->l_phnum = header->e_phnum;
1434
1435   maplength = header->e_phnum * sizeof (ElfW(Phdr));
1436
This segment prepares for reading the ELF file below (that is, for reading the array of Phdr entries).
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1438   /* Scan the program header table, collecting its load commands.  */
1439   struct loadcmd
1440     {
1441       ElfW(Addr) mapstart, mapend, dataend, allocend;
1442       off_t mapoff;
1443       int prot;
1444     } loadcmds[l->l_phnum], *c;
1445   size_t nloadcmds = 0;
Here a data structure is defined inside the function, which keeps it a local definition, with much the same effect as private in object-oriented code.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1448   for (ph = phdr; ph < &phdr[l->l_phnum]; ++ph)
1449     switch (ph->p_type)
1450       {
........
1454       case PT_DYNAMIC:
1455         l->l_ld = (void *) ph->p_vaddr;
1456         l->l_ldnum = ph->p_memsz / sizeof (ElfW(Dyn));
1457         break;
1458
1459       case PT_PHDR:
1460         l->l_phdr = (void *) ph->p_vaddr;
1461         break;
1462
1463       case PT_LOAD:
.................
1467         c = &loadcmds[nloadcmds++];
1468         c->mapstart = ph->p_vaddr & ~(ph->p_align - 1);
1469         c->mapend = ((ph->p_vaddr + ph->p_filesz + GL(dl_pagesize) - 1)
1470                      & ~(GL(dl_pagesize) - 1));
1471         c->dataend = ph->p_vaddr + ph->p_filesz;
1472         c->allocend = ph->p_vaddr + ph->p_memsz;
1473         c->mapoff = ph->p_offset & ~(ph->p_align - 1);
.................
1480         c->prot = 0;
1481         if (ph->p_flags & PF_R)
1482           c->prot |= PROT_READ;
1483         if (ph->p_flags & PF_W)
1484           c->prot |= PROT_WRITE;
1485         if (ph->p_flags & PF_X)
1486           c->prot |= PROT_EXEC;
1488         break;
..........
1493       }
In the ELF specification, different program header types serve different purposes and call for different handling; see the notes in Appendix 2. There is no explicit default case, but the behavior is equivalent to writing
default: continue;
so this really is a simple piece of code. The notable case is PT_LOAD: every loadable segment is recorded in the loadcmds array, which is a good idea, and the pointer idiom at line 1467, c = &loadcmds[nloadcmds++];, is worth learning.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1498   c = loadcmds;
..........
1501   maplength = loadcmds[nloadcmds - 1].allocend - c->mapstart;
1502
1503   if (__builtin_expect (type, ET_DYN) == ET_DYN)
1504     {
..............
1521       l->l_map_start = (ElfW(Addr)) __mmap ((void *) 0, maplength,
1522                                             c->prot, MAP_COPY | MAP_FILE,
1523                                             fd, c->mapoff);
1524
1525       l->l_map_end = l->l_map_start + maplength;
1526       l->l_addr = l->l_map_start - c->mapstart;
........
1535       __mprotect ((caddr_t) (l->l_addr + c->mapend),
1536                   loadcmds[nloadcmds - 1].allocend - c->mapend,
1537                   PROT_NONE);
1538
1539       goto postmap;
1540     }
Lines 1521 to 1526 map the whole file in one go; lines 1498 and 1501 compute the length to map, from the start of the first PT_LOAD program header to the end of the last one. Line 1503 is the case that applies to us, since we are loading a dynamic link library. The mprotect call at line 1535 marks the part of the mapping above the end of the first segment as inaccessible (PROT_NONE), so that whatever is not remapped later cannot be touched. This is a protective measure, to keep anyone from exploiting that gap.
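The address arithmetic on lines 1468 to 1473 is easy to check with concrete numbers. The sketch below repeats those five computations for a single segment using 32-bit values; demo_loadcmd and demo_make_loadcmd are hypothetical names, and the page size is passed in as a parameter rather than read from GL(dl_pagesize).

```c
#include <assert.h>
#include <stdint.h>

/* The address-related members of struct loadcmd. */
typedef struct {
    uint32_t mapstart;   /* segment start, rounded down to p_align  */
    uint32_t mapend;     /* end of file-backed data, rounded up     */
    uint32_t dataend;    /* exact end of the data that is in the file */
    uint32_t allocend;   /* exact end of the segment in memory      */
    uint32_t mapoff;     /* file offset, rounded down to p_align    */
} demo_loadcmd;

demo_loadcmd demo_make_loadcmd(uint32_t p_vaddr, uint32_t p_offset,
                               uint32_t p_filesz, uint32_t p_memsz,
                               uint32_t p_align, uint32_t pagesize)
{
    demo_loadcmd c;

    /* The masking below only works for powers of two. */
    assert(p_align != 0 && (p_align & (p_align - 1)) == 0);
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    c.mapstart = p_vaddr & ~(p_align - 1);
    c.mapend = (p_vaddr + p_filesz + pagesize - 1) & ~(pagesize - 1);
    c.dataend = p_vaddr + p_filesz;
    c.allocend = p_vaddr + p_memsz;
    c.mapoff = p_offset & ~(p_align - 1);
    return c;
}
```

For a segment with p_vaddr 0x1234, p_offset 0x234, p_filesz 0x2000, p_memsz 0x3000 and 4 KiB alignment and pages, this gives mapstart 0x1000, mapend 0x4000, dataend 0x3234, allocend 0x4234 and mapoff 0. The 0x1000 bytes between dataend and allocend are the part that has no content in the file.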
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1546   while (c < &loadcmds[nloadcmds])
1547     {
1548
1549     postmap:
1550       if (l->l_phdr == 0
1551           && (ElfW(Off)) c->mapoff <= header->e_phoff
1552           && ((size_t) (c->mapend - c->mapstart + c->mapoff)
1553               >= header->e_phoff + header->e_phnum * sizeof (ElfW(Phdr))))
......
1555         l->l_phdr = (void *) (c->mapstart + header->e_phoff - c->mapoff);
1556
1557       if (c->allocend > c->dataend)
1558         {
........
1561           ElfW(Addr) zero, zeroend, zeropage;
1562
1563           zero = l->l_addr + c->dataend;
1564           zeroend = l->l_addr + c->allocend;
1565           zeropage = ((zero + GL(dl_pagesize) - 1)
1566                       & ~(GL(dl_pagesize) - 1));
1567
1568           if (zeroend < zeropage)
........
1571             zeropage = zeroend;
1572
1573           if (zeropage > zero)
1574             {
.......
1576               if ((c->prot & PROT_WRITE) == 0)
1577                 {
1578                   /* Dag nab it.  */
1579                   __mprotect ((caddr_t) (zero & ~(GL(dl_pagesize)
1580                                                   - 1)), GL(dl_pagesize),
1581                               c->prot | PROT_WRITE);
1582
1583                 }
1584               memset ((void *) zero, '\0', zeropage - zero);
1585               if ((c->prot & PROT_WRITE) == 0)
1586                 __mprotect ((caddr_t) (zero & ~(GL(dl_pagesize) - 1)),
1587                             GL(dl_pagesize), c->prot);
1588             }
1589
1590           if (zeroend > zeropage)
1591             {
........
1593               caddr_t mapat;
1594               mapat = __mmap ((caddr_t) zeropage, zeroend - zeropage,
1595                               c->prot, MAP_ANON | MAP_PRIVATE | MAP_FIXED,
1596                               ANONFD, 0);
1597
1598             }
1599         }
1600
1601       ++c;
1602     }
As in the previous segment, the attributes of the mapped file regions are adjusted according to each PT_LOAD program header. The difference is the zeroend > zeropage case, where memory is mapped as data space private to the process. This is where the uninitialized data area, the BSS, normally lives. zero marks the end of the data that comes from the file, zeropage is that address rounded up to a page boundary, and zeroend is the end of the segment in memory. The space beyond zeropage is reserved for uninitialized data, which is what lines 1593 to 1597 set up: an anonymous mapping with the segment's protection, filled entirely with zeros.
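How the zero-filled region splits into a memset part and an anonymous-mapping part follows directly from zero, zeropage and zeroend. The sketch below computes only the two byte counts, without mapping anything; demo_zero_plan and demo_plan_zero_fill are hypothetical names, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t memset_bytes;  /* zeroed in place, in the last file-backed page */
    uint32_t anon_bytes;    /* covered by a fresh anonymous mapping */
} demo_zero_plan;

demo_zero_plan demo_plan_zero_fill(uint32_t l_addr, uint32_t dataend,
                                   uint32_t allocend, uint32_t pagesize)
{
    demo_zero_plan plan = { 0, 0 };
    uint32_t zero, zeroend, zeropage;

    /* Rounding with a mask only works for powers of two. */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    zero = l_addr + dataend;        /* first byte not backed by file data */
    zeroend = l_addr + allocend;    /* end of the segment in memory */
    zeropage = (zero + pagesize - 1) & ~(pagesize - 1);

    /* Everything fits in the last file-backed page. */
    if (zeroend < zeropage)
        zeropage = zeroend;

    if (zeropage > zero)
        plan.memset_bytes = zeropage - zero;
    if (zeroend > zeropage)
        plan.anon_bytes = zeroend - zeropage;

    return plan;
}
```

With l_addr 0x40000000, dataend 0x3234, allocend 0x5234 and 4 KiB pages, the first 0xdcc bytes are cleared with memset and the remaining 0x1234 bytes come from the anonymous mapping.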
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1606   if (l->l_phdr == NULL)
1607     {
........
1611       ElfW(Phdr) *newp = (ElfW(Phdr) *) malloc (header->e_phnum
1612                                                  * sizeof (ElfW(Phdr)));
1613
1614       l->l_phdr = memcpy (newp, phdr,
1615                           (header->e_phnum * sizeof (ElfW(Phdr))));
1616       l->l_phdr_allocated = 1;
1617     }
1618   else
1619     /* Adjust the PT_PHDR value by the runtime load address.  */
1620     (ElfW(Addr)) l->l_phdr += l->l_addr;
The program header table is itself managed through a program header (PT_PHDR). Usually there is none, so a copy has to be made here.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1625   elf_get_dynamic_info (l);
elf_get_dynamic_info, called here, is one of the most important functions in the loading process, because nearly everything in dynamic link management later relies on the l_info array that it fills in.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd() >> elf_get_dynamic_info()
2826 static inline void __attribute__ ((unused, always_inline))
2827 elf_get_dynamic_info (struct link_map *l)
2828 {
2829   ElfW(Dyn) *dyn = l->l_ld;
2830   ElfW(Dyn) **info;
2831
2832
2833   info = l->l_info;
2834
2835   while (dyn->d_tag != DT_NULL)
2836     {
2837       if (dyn->d_tag < DT_NUM)
2838         info[dyn->d_tag] = dyn;
...............
2853       ++dyn;
2854     }
..........
2858   if (l->l_addr != 0)
2859     {
2860       ElfW(Addr) l_addr = l->l_addr;
2861
2862       if (info[DT_HASH] != NULL)
2863         info[DT_HASH]->d_un.d_ptr += l_addr;
2864       if (info[DT_PLTGOT] != NULL)
2865         info[DT_PLTGOT]->d_un.d_ptr += l_addr;
2866       if (info[DT_STRTAB] != NULL)
2867         info[DT_STRTAB]->d_un.d_ptr += l_addr;
2868       if (info[DT_SYMTAB] != NULL)
2869         info[DT_SYMTAB]->d_un.d_ptr += l_addr;
.................
2874
..........
2876       if (info[DT_REL] != NULL)
2877         info[DT_REL]->d_un.d_ptr += l_addr;
..........
2879
2880       if (info[DT_JMPREL] != NULL)
2881         info[DT_JMPREL]->d_un.d_ptr += l_addr;
2882       if (info[VERSYMIDX (DT_VERSYM)] != NULL)
2883         info[VERSYMIDX (DT_VERSYM)]->d_un.d_ptr += l_addr;
2884     }
..........
2889 }
The unused in __attribute__ keeps the compiler, under -Wall, from warning about something that may go unused, and always_inline is self-explanatory: it forces the function to be inlined. The l->l_ld on line 2829 was set at line 1455 in _dl_map_object_from_fd above; it is the address of the dynamic section (see the explanation in Appendix B). The loop in lines 2835 to 2854 evidently fills in l_info. This matters a great deal later, because these entries are how function names and relocation information are found; each entry is indexed by its d_tag, and the code is simple. Lines 2856 to 2885 adjust the entries of a dynamic link library by its load address (each adjusted entry plays an important part in function resolution; see Appendix A for details). Thinking about it a little more: line 1521 of the earlier function mapped the entire file into memory as one contiguous region, and that pays off here. If the mapping were not contiguous, no uniform adjustment would be possible.
_dl_open() >> dl_open_worker() >> _dl_map_object() >> _dl_map_object_from_fd()
1662   /* Finally the file information.  */
1663   l->l_dev = st.st_dev;
1664   l->l_ino = st.st_ino;
1667   return l;
1670 }
This finally completes _dl_map_object. Looking back at line 1414, which searches the files already loaded, you can now see the purpose of the assignments here. Back to dl_open_worker.
_dl_open() >> dl_open_worker()
2550   /* It is already open.  */
2551   if (new->l_searchlist.r_list != NULL)
2552     {
.......
2556       if ((mode & RTLD_GLOBAL) && new->l_global == 0)
2557         (void) add_to_global (new);
2558
2559       /* Increment just the reference counter of the object.  */
2560       ++new->l_opencount;
2561
2562       return;
2563     }
If the library is already open, only l_opencount is incremented. As for why the test at line 2551 tells us that, it is related to the code below: _dl_map_object_deps is what fills in l_searchlist.
_dl_open() >> dl_open_worker()
2565   /* Load that object's dependencies.  */
2566   _dl_map_object_deps (new, NULL, 0, 0, mode & __RTLD_DLOPEN);
...............
2573   l = new;
2574   while (l->l_next)
2575     l = l->l_next;
2576   while (1)
2577     {
2578       if (! l->l_relocated)
2579         {
2580           _dl_relocate_object (l, l->l_scope, lazy, 0);
2581         }
2582
2583       if (l == new)
2584         break;
2585       l = l->l_prev;
2586     }
Here _dl_map_object_deps populates l_searchlist.r_list. Because this function and the following _dl_relocate_object are closely tied to function resolution, I explain them in the second article of this series, on function resolution and unloading. In brief, the struct link_map * of every library that the newly loaded library depends on is placed in this list (that is, l_searchlist), and _dl_relocate_object relocates the functions in a dynamic link library. The while (1) loop at line 2576 is there because the preceding _dl_map_object_deps also loads the libraries that this library depends on, and those have to be relocated too.
_dl_open() >> dl_open_worker()
2592   for (i = 0; i < new->l_searchlist.r_nlist; ++i)
2593     if (++new->l_searchlist.r_list[i]->l_opencount > 1
2594         && new->l_searchlist.r_list[i]->l_type == lt_loaded)
2595       {
2596         struct link_map *imap = new->l_searchlist.r_list[i];
2597         struct r_scope_elem **runp = imap->l_scope;
2598         size_t cnt = 0;
2599
2600         while (*runp != NULL)
2601           {
..........
2605             if (*runp == &new->l_searchlist)
2606               break;
2607
2608             ++cnt;
2609             ++runp;
2610           }
2611
2612         if (*runp != NULL)
2613           /* Avoid duplicates.  */
2614           continue;
..........
2642          imap->l_scope[cnt++] = &new->l_searchlist;
2643          imap->l_scope[cnt] = NULL;
2644        }

This code is simple in what it does. For every library in the l_searchlist of the newly added library new (these are the dependencies loaded earlier by _dl_map_object_deps), it searches that library's imap->l_scope. If some *runp already equals &new->l_searchlist, imap->l_scope does not need to be extended; if none does, lines 2616 to 2644 extend it. The background is that &new->l_searchlist effectively stands for new itself. This case generally arises when the dependency was already loaded before new was (the specific reason is given in the next article, in the discussion of function resolution). Nor can we rule out two dynamic link libraries depending on each other, as shown in the figure below; the handling here is a remedy for that.

_dl_open() >> dl_open_worker()
2647    _dl_init (new, __libc_argc, __libc_argv, __environ);

This calls the initialization functions of the dynamic link library, somewhat like init_module being called on insmod. As for __libc_argc, __libc_argv, and __environ, these are passed in by bash when the program is run, and an ordinary dynamic link library has little use for them.

_dl_open() >> dl_open_worker() >> _dl_init()
1118  void
1119  internal_function
1120  _dl_init (struct link_map *main_map, int argc, char **argv, char **env)
1121  {
1122
1123    ElfW(Dyn) *preinit_array = main_map->l_info[DT_PREINIT_ARRAY];
1124    ElfW(Dyn) *preinit_array_size = main_map->l_info[DT_PREINIT_ARRAYSZ];
1125    unsigned int i;
1126
1127
1128        ElfW(Addr) *addrs;
1129        unsigned int cnt;
1130
1131
1132        addrs = (ElfW(Addr) *) (preinit_array->d_un.d_ptr + main_map->l_addr);
1133        for (cnt = 0; cnt < i; ++cnt)
1134          ((init_t) addrs[cnt]) (argc, argv, env);
..........
1146    i = main_map->l_searchlist.r_nlist;
1147    while (i-- > 0)
1148      call_init (main_map->l_initfini[i], argc, argv, env);
1149
1150
1151
1152
1153  }

The DT_PREINIT_ARRAY functions are called first, before any of the init functions. I think this is provided not only to give developers of dynamic link libraries a better interface, but also so that some initialization work can be done before the libraries it depends on are initialized, much like a constructor in object-oriented programming.

_dl_open() >> dl_open_worker() >> _dl_init() >> call_init()
1072  static void
1073  call_init (struct link_map *l, int argc, char **argv, char **env)
1074  {
1075
1076    if (l->l_init_called)
1078      return;
1079
1082    l->l_init_called = 1;
........
1089    if (l->l_info[DT_INIT] != NULL)
1090      {
1091        init_t init = (init_t) DL_DT_INIT_ADDRESS (l, l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr);
1092
1093        /* Call the function.  */
1094        init (argc, argv, env);
1095      }
1098    ElfW(Dyn) *init_array = l->l_info[DT_INIT_ARRAY];
1099    if (init_array != NULL)
1100      {
1101        unsigned int j;
1102        unsigned int jm;
1103        ElfW(Addr) *addrs;
1104
1105        jm = l->l_info[DT_INIT_ARRAYSZ]->d_un.d_val / sizeof (ElfW(Addr));
1106
1107        addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
1108        for (j = 0; j < jm; ++j)
1109          ((init_t) addrs[j]) (argc, argv, env);
1110      }
1111
1112
1113  }

Lines 1076 to 1082 clearly guard against initializing a library twice. After that come the calls to the DT_INIT function and the DT_INIT_ARRAY functions. Note that the call_init calls shown earlier iterate over the l_initfini array, which includes the newly loaded dynamic link library itself. This completes dl_open_worker().

At this point we have walked through the loading of a dynamic link library (apart from _dl_map_object_deps and _dl_relocate_object). We now understand the following points:
1.
The creation and organization of a dynamic link library's struct link_map * (implemented in _dl_new_object).
2. How the information of a dynamic link library is extracted into struct link_map * and how the library is loaded (implemented in the three functions open_verify, _dl_map_object_from_fd, and elf_get_dynamic_info).
3. The initialization of the dynamic link library itself (implemented in _dl_init).

The overall function call structure is shown in the figure below.

But several questions have not been addressed yet:
1. How a function called in the executable locates the function body in the dynamic link library.
2. What the relationship is between a dynamic link library and the libraries it depends on, and how they are connected.
3. How a function is resolved dynamically, and how that ties the function's caller to its implementation.

I will clarify these questions in the later articles of "Loading, Resolution, and Case Analysis of ELF Dynamic Linking on Linux for the Intel Platform". Stay tuned.

Appendix A: Dynamic section entry types

Type                Value  d_un            Executable   Shared object  Description
DT_NULL             0      ignored         mandatory    mandatory      Marks the end of the dynamic section.
DT_NEEDED           1      d_val           optional     optional       d_val is the string table offset of a null-terminated string naming a file (optionally with a path) that this dynamic link library or executable depends on.
DT_PLTRELSZ         2      d_val           optional     optional       d_val is the total size of the relocation entries for the procedure linkage table; used together with DT_JMPREL.
DT_PLTGOT           3      d_ptr           optional     optional       d_ptr is the start address of the procedure linkage table or the global offset table.
DT_HASH             4      d_ptr           mandatory    mandatory      d_ptr is the start address of the symbol hash table.
DT_STRTAB           5      d_ptr           mandatory    mandatory      d_ptr is the start address of the symbol name string table.
DT_SYMTAB           6      d_ptr           mandatory    mandatory      d_ptr is the start address of the symbol table, an array of Elf32_Sym structures.
DT_STRSZ            10     d_val           mandatory    mandatory      d_val is the size of the DT_STRTAB string table above.
DT_SYMENT           11     d_val           mandatory    mandatory      d_val is the size of each Elf32_Sym structure in DT_SYMTAB.
DT_INIT             12     d_ptr           optional     optional       d_ptr is the start address of the initialization function called when the dynamic link library is loaded.
DT_FINI             13     d_ptr           optional     optional       d_ptr is the start address of the termination function called when the dynamic link library is unloaded.
DT_REL              17     d_ptr           mandatory    optional       Analogous to DT_RELA: d_ptr is the start address of the table of Elf32_Rel structures. This is the form used on the Intel platform.
DT_RELSZ            18     d_val           mandatory    optional       d_val is the total size of the DT_REL table above.
DT_RELENT           19     d_val           mandatory    optional       d_val is the size of one Elf32_Rel entry in DT_REL.
DT_PLTREL           20     d_val           optional     optional       d_val gives the relocation type used by the procedure linkage table, DT_REL or DT_RELA: 17 if the ELF file uses DT_REL, 7 if it uses DT_RELA.
DT_JMPREL           23     d_ptr           optional     optional       The most important entry for our purposes: d_ptr is the start address of the relocation entries for the procedure linkage table, through which imported functions are located via the GOT (Global Offset Table).
DT_INIT_ARRAY       25     d_ptr           optional     optional       d_ptr is the relative start address of the table of initialization functions.
DT_FINI_ARRAY       26     d_ptr           optional     optional       d_ptr is the relative start address of the table of functions called at termination.
DT_INIT_ARRAYSZ     27     d_val           optional     optional       d_val is the size of the DT_INIT_ARRAY table above.
DT_FINI_ARRAYSZ     28     d_val           optional     optional       d_val is the size of the DT_FINI_ARRAY table above.
DT_ENCODING         32     d_val or d_ptr  unspecified  unspecified    Not an entry in its own right. Tags with values from DT_ENCODING upward follow a fixed rule for d_un: an even tag uses d_ptr and an odd tag uses d_val.
DT_PREINIT_ARRAY    32     d_ptr           optional     not used       d_ptr is the start address of the table of initialization functions called before the main function.
DT_PREINIT_ARRAYSZ  33     d_val           optional     not used       d_val is the size of the DT_PREINIT_ARRAY table above.

The table above lists only the entries used in this article. The ELF specification also reserves entries for use by particular systems and platforms; those are not listed here.

Appendix B: Program header types of a dynamic link library

Name        Value  Description
PT_NULL     0      Marks an unused entry in the program header array.
PT_LOAD     1      The segment this entry describes is to be loaded into memory. What is loaded is given by p_offset (offset within the ELF file) and p_filesz (size within the file); where it goes is given by p_vaddr (the suggested load address) and p_memsz (the suggested size in memory).
PT_DYNAMIC  2      The segment is the dynamic section, that is, all the Elf32_Dyn structures described in Appendix A.
PT_INTERP   3      A string naming the dynamic linker used to load the executable. On Linux this is /lib/ld-linux.so.2.
PT_NOTE     4      Notes added for software developers, describing the software.
PT_SHLIB    5      Reserved for future extension.
PT_PHDR     6      The mapped address and size in memory of the program header array itself.

References
[1] John Levine, "Linkers and Loaders" (an overview of the general theory of dynamic linking). Available at http://www.iecc.com/linker/
[2] "Executable and Linkable Format (ELF)" (a good introduction to the ELF file format ABI). Available at www.skyfree.org/linux/reference/elf_format.pdf
[3] glibc 2.3.2, the source of the code in this article. It can be downloaded from ftp://ftp.gnu.org.
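As a supplement to Appendix A, here is a minimal sketch of the two steps that elf_get_dynamic_info performs on those entries: filling a table indexed by tag, then adding the load address to the pointer-valued entries. It is not the glibc code. The names sketch_dyn, sketch_fill_info, sketch_rebase, and the TAG_ constants are hypothetical stand-ins, the tag values are copied from the table in Appendix A, and the set of rebased tags is only an illustrative subset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values taken from Appendix A; TAG_NUM is an assumed table size. */
enum {
    TAG_NULL = 0, TAG_HASH = 4, TAG_STRTAB = 5, TAG_SYMTAB = 6,
    TAG_STRSZ = 10, TAG_JMPREL = 23,
    TAG_NUM = 34
};

/* Hypothetical stand-in for Elf32_Dyn: a tag plus a value-or-pointer union. */
typedef struct {
    int32_t tag;
    union {
        uint32_t val;
        uint32_t ptr;
    } un;
} sketch_dyn;

/* Store a pointer to each entry in info[], indexed by the entry's tag.
   Scanning stops at the TAG_NULL terminator. Returns how many entries
   were stored (a repeated tag is counted each time it appears). */
size_t sketch_fill_info(sketch_dyn *dyn, sketch_dyn *info[TAG_NUM])
{
    size_t recorded = 0;

    assert(dyn != NULL && info != NULL);
    for (size_t i = 0; i < TAG_NUM; ++i)
        info[i] = NULL;

    for (; dyn->tag != TAG_NULL; ++dyn) {
        if (dyn->tag > 0 && dyn->tag < TAG_NUM) {
            info[dyn->tag] = dyn;
            ++recorded;
        }
    }
    return recorded;
}

/* Add the load address to the pointer-valued entries that are present.
   Size-valued entries such as TAG_STRSZ are deliberately left alone. */
void sketch_rebase(sketch_dyn *info[TAG_NUM], uint32_t load_addr)
{
    static const int ptr_tags[] = { TAG_HASH, TAG_STRTAB, TAG_SYMTAB, TAG_JMPREL };

    for (size_t i = 0; i < sizeof ptr_tags / sizeof ptr_tags[0]; ++i) {
        sketch_dyn *entry = info[ptr_tags[i]];
        if (entry != NULL)
            entry->un.ptr += load_addr;
    }
}
```

Calling sketch_fill_info on an array that ends with a TAG_NULL entry, and then sketch_rebase with the load address, moves only the pointer-valued entries. This works with a single addition per entry for the same reason given in the article: the file is mapped as one contiguous region, so one offset applies to every address.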
About the author: Wang Ruichuan is a Linux enthusiast who is happy to exchange ideas with like-minded people. Contact: jeppeterone@163.com