linkers

xiaoxiao2021-03-06  18

* This article is selected from: COM concentration

Under the Hood, July 1997

Author: Matt Pietrek. Translation: lostall. Original: Under The Hood: Linkers In-Depth, July 1997. Disclaimer: Strictly speaking, this is not a translation; I prefer to call it a "free translation." That is, I have tried to ensure that the content matches the original, but I do not guarantee that the wording does. In particular, for common terms that I am unsure how to translate, or am too lazy to write out in Chinese, I quote the English directly. If possible, readers are best served by reading the original text.

[Abstract] This article explains how a linker works internally and is a rare, excellent article on the subject. It focuses on the linker's two most important tasks: (1) combining sections and (2) processing fixups and relocations. It also describes how the linker finds symbols in a .LIB file and how it creates the import table, the export table, and so on.

In this column, I usually discuss topics that are either new or not widely known. However, as more and more developers join the ranks of Win32 programmers, some topics that are old hat to veterans remain a mystery to newcomers. Linkers fall into this category. Before you Visual Basic programmers move on, be advised that Visual Basic 5.0 uses a linker. In fact, it uses the same linker as Visual C++ 5.0. Visual Basic 5.0 hides this fact, but if you dig around, you will find that it generates OBJ files and sends them to the Microsoft linker.

What is a linker? How does it work? I will explain these issues this month. As part of my research for this column, I dug up some old resources. Interestingly, most of the material I will elaborate on here is either out of print or tucked away on the MSDN CD-ROM, even though linker technology affects almost every Windows programmer. For this column, I will use Microsoft's LINK.EXE as the reference linker. (Other linkers, such as Borland's TLINK32, may differ slightly from the behavior I describe here.) In a later column, I will dig into some of the more obscure and interesting switches of the Microsoft linker.

First, I will give an overly simple definition of a linker and refine it later. A linker's job is to combine one or more object modules (typically OBJ files) into an executable file (that is, an EXE or DLL). However, this begs the question: what is an object module? An object module is the output of a program that takes human-readable text and translates it into machine code and data that a CPU understands. For C++, the C++ compiler reads a C++ source file. For assembly language, an assembler (such as MASM) reads an assembly language (ASM) file that directly represents the code and data bytes used by the CPU. In Visual Basic 5.0, the input files are the FRM, BAS, and CLS files in your project. The concept also applies to many other languages, such as Fortran.

The main components of an object module are machine code and data. The raw data that makes up the code and data is stored in contiguous blocks called sections.
For example, Microsoft's compilers put their machine code into a section called .text and their data into a section called .data. These names have no special meaning; they are simply mnemonics for their intended use. Other compilers can (and do) use different names for their sections. If you have done MS-DOS or 16-bit Windows programming, you can substitute the word "segments" for "sections" in the preceding description and it still applies.

If you have Visual C++ installed on your system, you can use the DUMPBIN utility to see the sections in your own OBJ files. Run DUMPBIN followed by the name of any OBJ file. Figure 1 lists the most commonly used sections. Running DUMPBIN on an OBJ file shows what a typical compilation of a C++ program produces. For example, here is the output for CHKSTK.OBJ from the Visual C++ LIB directory:

Dump of file chkstk.obj
File Type: COFF OBJECT
Summary
       0 .data
      2F .text
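To see what the DUMPBIN summary is reporting, it can help to read the section table yourself. The following is a minimal Python sketch, not a replacement for DUMPBIN. It assumes a plain COFF object whose section names fit in eight characters, and the helper names list_sections and make_demo_obj are made up for illustration. The demo data is synthetic and merely mimics the two sizes shown above.

```python
import struct

# COFF file header: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
FILE_HEADER = struct.Struct("<HHIIIHH")         # 20 bytes
# Section header: Name[8], VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes


def list_sections(obj_bytes):
    """Return [(name, raw_size), ...] for each section of a COFF object."""
    fields = FILE_HEADER.unpack_from(obj_bytes, 0)
    section_count, optional_header_size = fields[1], fields[5]
    # The section table follows the file header and the optional header.
    offset = FILE_HEADER.size + optional_header_size
    sections = []
    for _ in range(section_count):
        header = SECTION_HEADER.unpack_from(obj_bytes, offset)
        name = header[0].rstrip(b"\0").decode("ascii")
        sections.append((name, header[3]))      # header[3] is SizeOfRawData
        offset += SECTION_HEADER.size
    return sections


def make_demo_obj():
    """Build a header-only stand-in for an OBJ (synthetic test data)."""
    blob = FILE_HEADER.pack(0x014C, 2, 0, 0, 0, 0, 0)   # 0x14C = Intel 386
    blob += SECTION_HEADER.pack(b".data", 0, 0, 0x00, 0, 0, 0, 0, 0, 0)
    blob += SECTION_HEADER.pack(b".text", 0, 0, 0x2F, 0, 0, 0, 0, 0, 0)
    return blob


for name, size in list_sections(make_demo_obj()):
    print(f"{size:8X} {name}")
```

To try it on a real file, pass the file's bytes instead, for example list_sections(open("chkstk.obj", "rb").read()), assuming such a file is in the current directory.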

The official name for the output produced by a compiler or assembler is a compilation unit, but most of us think of it as an OBJ file. The linker's most important job is to collect all the compilation units and merge the sections from the different compilation units. Of course, if things were that simple, a linker would be nothing more than a glorified program for concatenating blocks of data. The complexity of a linker comes from handling fixups. More on this later.

You may be wondering how the linker decides to arrange the code and data from the various OBJs in the final executable. The linker follows a carefully designed set of rules. In fact, the linker's task is complex enough that it makes two passes over its input files. The first pass lets the linker figure out what it has to work with. In the second pass, the linker applies its rules to generate the executable.

Although I won't describe every detail of every rule, two rules cover most of what a linker does. The primary rule is that the linker must put all the code and data from each specified OBJ file into the executable. If you give the linker three OBJ files, the code and data of all three must end up in the executable in some form. However, the linker does not simply take the raw sections of each OBJ file and place them one after another in the executable. Instead, the linker merges all sections that have the same name. For example, if three OBJ files each have a .text section, the resulting executable has a single .text section made up of the three original .text sections, concatenated in the order in which the linker encountered them.
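The merging rule can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function name merge_sections and the OBJ contents are hypothetical, short tags stand in for real machine code and data, and alignment padding between contributions is ignored.

```python
def merge_sections(objs):
    """Merge same-named sections from OBJs given in command-line order.

    objs -- list of (obj_name, [(section_name, raw_bytes), ...])
    Returns {section_name: merged_bytes}; dict order is first-seen order.
    """
    merged = {}
    for _obj_name, sections in objs:
        for name, data in sections:
            # A name keeps the position it got when first encountered;
            # later contributions are appended to the existing section.
            merged[name] = merged.get(name, b"") + data
    return merged


# Hypothetical OBJ contents: short tags stand in for machine code and data.
objs = [
    ("a.obj", [(".text", b"[a code]"), (".data", b"[a data]")]),
    ("b.obj", [(".data", b"[b data]"), (".text", b"[b code]")]),
    ("c.obj", [(".text", b"[c code]"), (".data", b"[c data]")]),
]

for name, data in merge_sections(objs).items():
    print(name, data.decode())
```

Even though b.obj lists its .data section first, its code still lands in the single .text section, between the contributions from a.obj and c.obj.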

Figure 2 a.obj, b.obj, and c.obj

The second rule concerns ordering: the order of the sections in the executable is determined by the order in which the linker processes them. The linker processes OBJ files in the order in which they are listed on the command line. However, the primary rule of merging same-named sections takes precedence. Figure 2 shows three OBJ files: A.OBJ, B.OBJ, and C.OBJ. Each file has three sections. All three have _TEXT and _DATA sections, but at different positions within their respective files. Each also has one unique section (A_ASM, B_ASM, and C_ASM). Suppose you invoke LINK with the following command line:

LINK A.OBJ B.OBJ C.OBJ

The resulting order of the sections (and how the same-named sections were merged) is shown on the right side of Figure 2. You can download the source and OBJ files from the link at the top of this article. This lets you experiment with different linker command lines, such as "LINK B.OBJ A.OBJ C.OBJ", even if you don't have MASM or a compatible assembler.

With these two rules in mind, you are well on your way to understanding how the linker does its work on MS-DOS and 16-bit Windows. The Win32 linker, however, adds a few twists to what I have described. For starters, there is the $ rule for section names. If the name of a section contains a $ character (for example, .idata$4), the $ and all the characters after it are dropped in the executable. However, before the linker discards this part of the name, it merges the sections whose names match up to the $. When the OBJ sections are arranged in the executable, the portion of the name after the $ determines the order: the sections are sorted alphabetically by the characters following the $. For example, suppose three sections called foo$c, foo$a, and foo$b are to be merged into a section called foo in the executable. The data in foo starts with the data from foo$a, continues with foo$b, and ends with foo$c.

The linker uses this automatic merging of $ names in a variety of ways. You will see an example when I discuss imported functions. It is also used to create the data tables needed for static initialization of C++ constructors and destructors.

In addition to the $ rule, the Win32 linker has several special cases. Sections with the code attribute get special priority and are placed at the front of the executable. After the code sections, the linker places uninitialized data sections, which consist of global data that is not given an initial value at compile time (for example, int i; declared as a global variable in C++).
Next comes initialized data (including the .data section), followed by data sections generated by the linker, such as .reloc. Uninitialized data is usually placed in a section called .bss, but these days I rarely see a .bss section in an executable. The Microsoft linker merges the .bss section into the .data section, which serves as the primary initialized data section. But wait, there's more! This only happens for subsystems other than POSIX, and only when the subsystem version is greater than 3.5. Other sections containing uninitialized data are left alone (that is, they are not merged).

Now look at the executable from the back. If the OBJ files contain a .debug section, it is placed at the very end of the executable. If there is no .debug section, the linker tries to put the .reloc section last, because in most cases the Win32 loader does not need to read the relocation information. Reducing the amount of the executable that must be read reduces load time. I will cover relocations later.

There is one more exception to the two basic rules: removable sections in Win32. These sections exist in the OBJ file, but the linker does not copy them into the executable. They typically have the LNK_REMOVE and LNK_INFO attributes (IMAGE_SCN_LNK_REMOVE and IMAGE_SCN_LNK_INFO in WINNT.H) and are named .drectve. Microsoft's compilers generate these sections to pass information to the linker. If you look at an OBJ file generated by Visual C++, the data in the .drectve section may look like this: -defaultlib:LIBC -defaultlib:OLDNAMES

If this looks like command-line arguments passed to the linker, that is exactly what it is. More evidence appears when you use the C++ modifier __declspec(dllexport). For example, the declaration

void __declspec(dllexport) exportme(void) {...}

causes the .drectve section to also contain the following data:

-export:_exportme

Sure enough, if you look at the list of linker command-line options, -export is one of them.
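Since the contents of .drectve are just option text, picking them apart is straightforward. Here is a minimal Python sketch, assuming the directives are separated by whitespace and contain no quoted values with embedded spaces (real directive strings may). The function name parse_directives is hypothetical, and the input string simply combines the examples shown above.

```python
def parse_directives(drectve_text):
    """Split the text of a .drectve section into (option, value) pairs."""
    directives = []
    for token in drectve_text.split():
        # Options start with '-' or '/'; the value follows the first ':'.
        option, _, value = token.lstrip("-/").partition(":")
        directives.append((option.lower(), value))
    return directives


raw = "-defaultlib:LIBC -defaultlib:OLDNAMES -export:_exportme"
directives = parse_directives(raw)

default_libs = [value for option, value in directives if option == "defaultlib"]
exports = [value for option, value in directives if option == "export"]
print("libraries to search:", default_libs)
print("symbols to export:", exports)
```

A linker could treat the resulting pairs much as if they had been typed on its own command line.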

Fixups and Relocations

Why can't the compiler create an executable directly from the source file, with no linker needed? The main reason is that most programs consist of more than one source file. A compiler is good at taking a single source file and generating the equivalent machine code. But because a source file may reference code or data in other source files, the compiler cannot generate exactly the right code to call that function or access that variable. The compiler's only option is to include additional information in its output file that describes the external code or data. The term for this description of an external reference to code or data is a fixup. Put bluntly, the code generated by the compiler is incorrect and must be patched later. Consider a call to a function named Foo in C++:

// ...
Foo();
// ...

The exact bytes emitted by a 32-bit C++ compiler are:

E8 00 00 00 00

0xE8 is the opcode of the CALL instruction. The DWORD that follows should contain the offset of the Foo function (relative to the end of the CALL instruction). Obviously, Foo is not zero bytes away from the CALL instruction. If you executed this code, it would not do what you want. The code is broken and needs to be fixed up. In this example, the linker needs to replace the DWORD following the CALL opcode with the correct address of Foo. In the executable, the linker writes the relative address of Foo into this DWORD.

How does the linker know that this needs to be done? A fixup record tells it. And how does it know where Foo is? The linker knows about every symbol in the executable because it is responsible for arranging and merging all the pieces of the executable.

Now for the fixup records themselves. For Intel-based OBJ files, you will normally encounter only three types of fixup record. The first is the 32-bit relative fixup, known as a REL32 fixup (it corresponds to IMAGE_REL_I386_REL32, defined in WINNT.H). The call to Foo above has a REL32 fixup record, which tells the linker to overwrite the DWORD with the appropriate relative offset. If you run

DUMPBIN /RELOCATIONS

on the OBJ generated from the code above, you will see output similar to the following:

                                   Symbol  Symbol
Offset    Type        Applied To   Index   Name
--------  ----------  -----------  ------  ------
00000004  REL32          00000000       7  _foo

In plain English, this fixup record says that the linker needs to calculate the relative offset of the Foo function and write that value at offset 4 in this section. Because this fixup record is only needed until the linker has created the executable, it is then discarded and does not appear in the executable.

Why, then, do most executables contain a section called .reloc? This is where the second type of fixup comes in. Consider the following program:

int i;

int main()
{
    i = 0x12345678;
}

The instruction that Visual C++ generates for the assignment statement is:

MOV DWORD PTR [00406280],12345678

The interesting part is the [00406280] portion of the instruction. It refers to a fixed location in memory and assumes that the DWORD containing the variable i lies 0x6280 bytes above the load address of the executable (0x400000 by default). Now consider what happens if the executable is not loaded at its default load address. Suppose, for example, that the Win32 loader loads it 2MB higher in memory (that is, at 0x600000). The [00406280] portion of the instruction then needs to be adjusted to 0x00606280.

This is the situation that DIR32 (Direct32) fixups in the OBJ file are for. They indicate that the actual (direct) address of something needs to be patched in. They also imply a particular load address for the executable. When the executable is created, the linker extracts the DIR32 fixups from the OBJ files and creates the .reloc section. Before that happens, however, running DUMPBIN /RELOCATIONS on the OBJ produces output like this:

                                   Symbol  Symbol
Offset    Type        Applied To   Index   Name
--------  ----------  -----------  ------  ------
00000005  DIR32          00000000       4  _i

This fixup record says that the linker needs to calculate the direct 32-bit address of the variable _i and write that value at offset 5 in this section. The .reloc section in the executable is essentially a list of addresses in the executable where the difference between the assumed load address and the actual load address must be accounted for. By default, an executable created by the linker loads at its preferred address, so the Win32 loader does not need the .reloc section. However, when the Win32 loader needs to load an executable somewhere other than its preferred load address, the .reloc section allows all direct references to code and data to be updated.

The third fixup type commonly found in Intel OBJ files, DIR32NB (Direct32, No Base), is used for debug information. One of the linker's secondary jobs is to create debug information containing the names of functions and variables along with their addresses. Because only the linker knows where all the functions and variables finally end up, DIR32NB fixups mark the spots in the debug information where the address of a function or variable is needed. The major difference between DIR32 and DIR32NB fixups is that the value written for a DIR32NB fixup does not include the default load address of the executable.
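The three fixup types, and the loader's use of .reloc, can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not the Microsoft linker's implementation. The function names, the placement of the code section at 0x1000, the address chosen for _foo, and the byte layout of the sample code are hypothetical; 0x400000, 0x6280, and the 2MB shift come from the examples above. The sketch also assumes that the value already stored at a fixup site is added to the result, and that every recorded site lies in the one section being patched.

```python
import struct

PREFERRED_BASE = 0x00400000   # default load address used in the article


def apply_fixups(code, section_rva, fixups, symbol_rvas, base=PREFERRED_BASE):
    """Patch one section. fixups is a list of (offset, kind, symbol).

    Addresses in symbol_rvas and section_rva are relative to the load address.
    Returns (patched_bytes, sites), where sites are the DIR32 locations a
    linker would record in .reloc.
    """
    image = bytearray(code)
    sites = []
    for offset, kind, symbol in fixups:
        addend = struct.unpack_from("<I", image, offset)[0]
        target = symbol_rvas[symbol]
        if kind == "REL32":
            # Distance from the end of the 4-byte field to the target.
            value = target - (section_rva + offset + 4)
        elif kind == "DIR32":
            value = base + target          # absolute address, base included
            sites.append(section_rva + offset)
        elif kind == "DIR32NB":
            value = target                 # same, but with no base added
        else:
            raise ValueError("unknown fixup type " + kind)
        struct.pack_into("<I", image, offset, (value + addend) & 0xFFFFFFFF)
    return bytes(image), sites


def rebase(image, section_rva, sites, delta):
    """What a loader does with .reloc: add delta at every recorded site."""
    patched = bytearray(image)
    for site in sites:
        offset = site - section_rva
        old = struct.unpack_from("<I", patched, offset)[0]
        struct.pack_into("<I", patched, offset, (old + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical code: 3 filler bytes, CALL _foo, then MOV DWORD PTR [_i], imm32.
code = bytes.fromhex("558BEC" "E800000000" "C70500000000" "78563412")
fixups = [(4, "REL32", "_foo"), (10, "DIR32", "_i")]
symbols = {"_foo": 0x1020, "_i": 0x6280}

linked, sites = apply_fixups(code, 0x1000, fixups, symbols)
moved = rebase(linked, 0x1000, sites, 0x00600000 - PREFERRED_BASE)
print(linked[10:14].hex(), "->", moved[10:14].hex())
```

The REL32 site needs no entry in the relocation list because a relative displacement stays valid wherever the image is loaded. Only the DIR32 site changes when the image moves.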

Libraries

In some cases, it is convenient to combine two or more OBJ files into a single file that can then be passed to the linker. A classic example is the C++ runtime library (RTL). The C++ RTL is built from many source files, and the OBJ files compiled from them are combined into a library. For Visual C++, the standard, single-threaded, static version of the runtime library is called LIBC.LIB. There are other versions for debugging (for example, LIBCD.LIB) and for multithreading (LIBCMT.LIB).

Library files usually have a .LIB extension. They contain a library header followed by the raw data of the OBJ files they include. The library header tells the linker which symbols (functions and variables) can be found in the OBJs that follow, and which OBJ each symbol lives in. You can view the contents of a library with the DUMPBIN /LINKERMEMBER switch. For reasons I won't go into, DUMPBIN's output is more readable if you specify :1 or :2. For example, you can run the following command on the PENTER.LIB that comes with Visual C++ 5.0: "DUMPBIN /LINKERMEMBER:1 PENTER.LIB"

Partial output:

6 public symbols

     180 _dumpcap@0
     180 _StartCap@0
     180 _stopcap@0
     180 _Version
     180 __mcount
     180 __penter

The 180 in front of each symbol name indicates that the symbol (for example, _dumpcap@0) can be found in an OBJ file located 0x180 bytes from the start of the library file. As you can see, PENTER.LIB contains only one OBJ. More complex libraries contain multiple OBJs, so the offsets in front of the symbol names differ.

Unlike OBJ files specified on the command line, the linker does not necessarily include every OBJ in a library in the final executable. In fact, quite the opposite is true. The linker does not include any code or data from an OBJ in a library unless there is at least one reference to a symbol in that OBJ. (Translator's note: if one global variable in a library is referenced, it seems that all the other global variables in the library are also placed in the executable, even if they are not in the same OBJ and are never referenced.) In other words, OBJ files named explicitly on the linker command line are always included in the executable, whereas OBJ files in a LIB are on standby and are included only when they are referenced.

A symbol in a library can be referenced (and the OBJ containing it therefore included) in a couple of ways. First, the symbol can be referenced directly from an explicit OBJ file. For example, if I call the C++ printf function from a source file I wrote, a reference (and a fixup) is generated in my OBJ file. When creating the executable, the linker searches its LIB files for the OBJ containing the printf code and includes that OBJ. (Translator's note: it does not include the entire LIB, only the OBJ that contains printf.)

Second, a symbol can be referenced indirectly. Indirect means that an OBJ included via the first method references a symbol in another OBJ in the library. That second OBJ may in turn reference a symbol in a third OBJ in the library. One of the linker's tougher jobs is to track down and include every OBJ that contains a referenced symbol, even if the reference is 49 levels of indirection removed.

When looking for a symbol, the linker searches the LIB files in the order in which they appear on the command line.
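The way direct and indirect references pull OBJs out of a library can be sketched as a small worklist loop in Python. This is an illustrative model only: the function name select_objs, the dictionary layout, and the member and symbol names other than _printf are hypothetical, and a single library stands in for the linker's whole list of LIB files.

```python
def select_objs(explicit_objs, library):
    """Return the names of all OBJs that end up in the executable.

    Each OBJ is a dict with "name", "defines" (set), and "refs" (set).
    """
    # Stand-in for the library header: symbol -> member OBJ that defines it.
    directory = {}
    for member in library:
        for symbol in member["defines"]:
            directory.setdefault(symbol, member)

    included = list(explicit_objs)     # explicit OBJs are always included
    defined = set()
    pending = []
    for obj in included:
        defined.update(obj["defines"])
        pending.extend(sorted(obj["refs"]))

    while pending:
        symbol = pending.pop()
        if symbol in defined:
            continue                   # already satisfied
        member = directory.get(symbol)
        if member is None:
            raise LookupError("unresolved external symbol " + symbol)
        included.append(member)        # pull in only the OBJ that defines it
        defined.update(member["defines"])
        pending.extend(sorted(member["refs"]))   # follow indirect references
    return [obj["name"] for obj in included]


main_obj = {"name": "main.obj", "defines": {"_main"}, "refs": {"_printf"}}
library = [
    {"name": "printf.obj", "defines": {"_printf"}, "refs": {"__output"}},
    {"name": "output.obj", "defines": {"__output"}, "refs": set()},
    {"name": "scanf.obj", "defines": {"_scanf"}, "refs": set()},
]
print(select_objs([main_obj], library))
```

In this example scanf.obj never makes it into the result, because nothing references a symbol it defines.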
The search order has one twist: once a symbol is found in a library, that library becomes the preferred library and is searched first for all subsequent symbols. The library loses its preferred status as soon as a symbol is not found in it. At that point, the linker moves on to the next library in its list. (For a more detailed technical description, see Microsoft Knowledge Base article Q31998.)

Now let's turn to import libraries. Structurally, an import library is no different from a regular library. When resolving symbols, the linker does not know the difference between an import library and a regular library. The key difference is that no compilation unit corresponds to the OBJs in an import library. Instead, the linker generates the import library based on the symbols exported by the executable it is building. In other words, when the linker creates the exports table for an executable, it also creates the corresponding import library for referencing those symbols. This makes a nice transition to my next topic, the imports table.

Creating the Imports Table

One of the most fundamental features of Win32 is the ability to import functions from other executables. All the information about imported DLLs and functions is stored in a table in the executable called the imports table. When this table is in a section all by itself, the section is named .idata.

Because imports are so vital to Win32 executables, it may seem strange that the linker has no special knowledge of the imports table. In other words, the linker neither knows nor cares whether a function you call lives in another DLL or in the same executable. The way this is accomplished is very clever. Simply by following the section merging and symbol resolution rules, the linker creates the imports table without knowing that the table has any special meaning.

Let's look at some pieces of an import library to see how the linker pulls off this feat. Figure 3 shows portions of the DUMPBIN output for USER32.LIB. Suppose you have called the ActivateKeyboardLayout function. Your OBJ file then contains a fixup record for _ActivateKeyboardLayout@8. From the USER32.LIB header, the linker determines that this function can be found in the OBJ at file offset 0xEA14. At this point, the linker is committed to including the contents of this OBJ in the final executable (see Figure 3).

Figure 3 shows that several sections from this OBJ will be included: .text, .idata$5, .idata$4, and .idata$6. The .text section contains a JMP instruction (opcode 0xFF 0x25). From the COFF symbol table at the end of Figure 3, you can see that the _ActivateKeyboardLayout@8 symbol resolves to this JMP instruction in the .text section. The linker therefore hooks up your call to ActivateKeyboardLayout to the JMP instruction in the import library's OBJ. The linker then takes the set of .idata$xxx sections and merges them into a single .idata section in the executable.
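Applying the $ rule from earlier in the article to these import sections can be sketched as follows. This is a toy Python model, not the real linker: the function name merge_dollar_sections is hypothetical, short tags stand in for the actual table entries, and the sketch assumes that a contribution with no $ in its name sorts ahead of the suffixed ones.

```python
def merge_dollar_sections(contributions):
    """Merge sections by the part of the name before '$'.

    contributions -- list of (section_name, raw_bytes) in the order seen.
    Within one merged section, data is ordered by the text after '$'.
    """
    groups = {}
    for name, data in contributions:
        base, _, suffix = name.partition("$")
        groups.setdefault(base, []).append((suffix, data))
    return {
        # sorted() is stable, so equal suffixes keep their input order.
        base: b"".join(data for _suffix, data in sorted(parts, key=lambda p: p[0]))
        for base, parts in groups.items()
    }


# The foo$c / foo$a / foo$b example from the text.
print(merge_dollar_sections([("foo$c", b"C"), ("foo$a", b"A"), ("foo$b", b"B")]))

# Two hypothetical import stubs, "a" and "b", each contributing three pieces.
stubs = [
    (".idata$5", b"5a"), (".idata$4", b"4a"), (".idata$6", b"6a"),
    (".idata$5", b"5b"), (".idata$4", b"4b"), (".idata$6", b"6b"),
]
print(merge_dollar_sections(stubs))
```

All the $4 pieces end up next to each other, followed by all the $5 pieces and then all the $6 pieces, which is what turns scattered per-function fragments into contiguous tables.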
Recall that the linker follows the $ rule when merging sections whose names contain a $ character. If other imported functions from USER32.LIB are referenced, their .idata$4, .idata$5, and .idata$6 sections are added to the mix as well. The net result is that all the .idata$4 sections form one array and all the .idata$5 sections form another. If you are familiar with the term "import address table", this is how that table is created. Finally, note that the raw data of the .idata$6 section contains the string ActivateKeyboardLayout. This is how the names of the imported functions get into the imports table. The key point is that creating the imports table is no big deal for the linker. It simply does its job according to the rules I described earlier.

Creating the Exports Table

Besides creating the imports table for an executable, the linker is also responsible for creating its opposite, the exports table. Here the linker's job is part hard and part easy. During its first pass, the linker has the task of collecting information about all the exported symbols and building the table of exported functions. In the process, the linker creates the exports table and writes it to an .edata section in an OBJ file. This OBJ file is standard in every respect, except that it uses the extension .EXP rather than .OBJ. That's right, you can use DUMPBIN to examine the contents of the EXP files that are generated when you build a DLL.

During its second pass, the linker has very little to do. It simply treats the EXP file as an ordinary OBJ file. This in turn means that the .edata section in that OBJ is included in the final executable. Sure enough, if you see an .edata section in an executable, it is the exports table. These days, however, .edata sections are increasingly rare. If the executable uses the Win32 console or GUI subsystem, the linker appears to automatically merge the .edata section into the .rdata section, if .rdata exists.

Wrap Up

Obviously, a linker does more work than I have described here. For example, generating certain kinds of debug information (such as CodeView) is a major part of a linker's total work. However, creating debug information is not strictly necessary for a linker, so I did not spend time describing it. Likewise, a linker should be able to create a MAP file listing the public symbols in the executable, but this too is not core linker functionality.

Although I have touched on a lot of complex topics, a linker is at heart simply a tool for merging multiple compilation units into an executable. The first essential is combining sections; the second is resolving the references (fixups) between the merged sections. Add a little knowledge of system-specific data structures, such as the exports table, and you have covered the essentials of this powerful and necessary tool.


When reposting, please cite the original address: https://www.9cbs.com/read-48570.html
