Coff file format

zhaozj2021-02-11  193

COFF format

COFF (Common Object File Format) is a very popular object file format. (Note: the files the compiler generates, *.o / *.obj, are called "object files" here, but the format is not limited to them: library files and executables often use it as well.) Do you use VC a lot? The object files (*.obj) it produces are in this format. Other compilers, such as the GNU Compiler Collection, ICL (Intel C/C++ Compiler) and VectorC, also produce object files in this format. And it is not just C/C++: many other languages use this object file format too. A unified object file format makes mixed-language programming much more convenient.

Of course, it is not the only object file format. There are also OMF (Object Module Format) and ELF (Executable and Linking Format). OMF was defined by a group of IT giants many years ago and was very common on the Windows platform. The object files produced by the Borland tools that everyone likes to use are in this format. MS and Intel also used it years ago, but both have since switched sides and adopted COFF. ELF is used on non-Windows platforms and is basically never seen on Windows. As programmers, we ought to get to know the formats we deal with all the time! This time, though, let me introduce COFF first.

The structure of a COFF file

Let's first take a look at the overall structure of a COFF file and see what it looks like!

A COFF file contains eight kinds of data, laid out from top to bottom as follows:

1. File header
2. Optional header
3. Section headers (section header 1 ... section header n)
4. Section data
5. Relocation directives (relocation table)
6. Line numbers (line number table)
7. Symbol table
8. String table

Of these, only the section headers can occur more than once (because a file can have several sections); every other kind of data appears at most once.

File header: as the name suggests, this is the head of the COFF file. It stores basic information about the file, such as the file identifier and the location of each table.

Optional header: as the name says, it is also a header, but an optional one that may or may not be present. Object files basically never have it; in other files (such as executables) it holds information that the file header does not describe.

Section header: yet another header... (never mind me, someone wants to hit me :-) ). This header (why so many headers?!) describes a section, and every section has its own section header. The number of sections is given in the file header.

Section data: this is usually the largest part of a COFF file; the actual data of every section is stored here. As for how to tell the data of one section from another, don't ask me, ask the section headers.

Relocation table: this table usually exists only in object files. It describes the relocation information for the symbols in the COFF file. As for why relocation is needed, go home and read your operating systems book.

Symbol table: this table holds information about all symbols used in the COFF file. When linking several COFF files, it is what lets us relocate symbols. It is also needed for debugging.

String table: no need to tell you, everyone knows it stores strings. But whose strings does it store? Don't know!? Ask me! :-)

The symbol table describes each symbol in a fixed-size record, but a record only has 8 characters for the symbol name. That was enough for small early programs, but in today's programs a symbol name easily runs to dozens of characters, so how could 8 be enough? There is no way around it: such names have to be stored in the string table, and the symbol table records only their position there.

That is the overall structure of the file. It may not be pretty, but it is practical, and its extensible design is worth learning from. Now that we understand the overall structure, let us analyze it part by part.

File header

The file header naturally starts at offset 0 of the file. Its structure is simple; described in C it looks like this:

// Note: unsigned long is assumed to be 32 bits wide here, as on 32-bit compilers.
typedef struct {
    unsigned short usMagic;         // magic number
    unsigned short usNumSec;        // number of sections
    unsigned long  ulTime;          // timestamp
    unsigned long  ulSymbolOffset;  // symbol table offset
    unsigned long  ulNumSymbol;     // number of symbols
    unsigned short usOptHdrSZ;      // optional header length
    unsigned short usFlags;         // file flags
} FILEHDR;

usMagic is the magic number. In a COFF file for the i386 platform it is 0x014c. If the magic number is not 0x014c, you need not look any further: the file is not a COFF file for the i386 platform. In effect, it is a platform identifier.

The second member, usNumSec, is an unsigned short that gives the number of sections, which is also the number of section headers.

ulTime is a timestamp that records when the COFF file was created. When the COFF file is an executable, this timestamp is often used as an identifier to compare against in encryption schemes.

ulSymbolOffset is the offset of the symbol table within the file. It is an absolute offset, counted from the start of the file. The same kind of offset appears in other parts of the COFF file, and those are all absolute offsets too.

ulNumSymbol gives the number of symbol records in the symbol table.

usOptHdrSZ is the length of the optional header; usually it is 0. The kind of optional header is also inferred from this length, and different lengths have to be handled in different ways.

usFlags holds the attribute flags of the COFF file. They identify the type of the file, what data it contains, and so on. The values are as follows:

Value   Name      Description
0x0001  F_RELFLG  No relocation information: the COFF file contains no relocation information. Usually 0 in object files and 1 in executables.
0x0002  F_EXEC    Executable: all symbols in the COFF file have been resolved and the file should be treated as an executable.
0x0004  F_LNNO    All line numbers have been removed from the file.
0x0008  F_LSYMS   The symbol information has been removed from the file.
0x0100  F_AR32WR  The file is a 32-bit little-endian COFF file.

Note: little-endian (I can't recall its Chinese name) refers to the byte order of data. For example, the hexadecimal value 0x1234 is stored in memory as 0x34 0x12 in little-endian order. The opposite is big-endian, where it is stored in memory as 0x12 0x34.
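To make the byte order concrete, here is a minimal sketch of two helpers that read little-endian values from a byte buffer regardless of the byte order of the machine running the code. The names rd16le and rd32le are made up for this illustration and do not come from any COFF header file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit little-endian value.
   The low byte comes first in memory. */
uint16_t rd16le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: read a 32-bit little-endian value. */
uint32_t rd32le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Reading fields byte by byte like this also avoids depending on how a particular compiler lays out a struct.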

This table is not complete, but these are the only flags used in object files. I will cover the other flags later when introducing the PE format.
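As a rough sketch of how the file header might be decoded, the code below reads the seven fields from the first 20 bytes of a file image and checks the i386 magic number and the F_EXEC flag. The type CoffFileHeader and the function names are hypothetical, and fixed-width types are used instead of unsigned long so that the sizes do not depend on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_FILEHDR_SIZE 20      /* on-disk size of the file header */
#define COFF_MAGIC_I386   0x014c  /* i386 magic number from the text */
#define F_EXEC            0x0002  /* flag value from the table above */

/* Hypothetical in-memory form of the file header. */
typedef struct {
    uint16_t magic;          /* usMagic        */
    uint16_t num_sections;   /* usNumSec       */
    uint32_t time;           /* ulTime         */
    uint32_t symbol_offset;  /* ulSymbolOffset */
    uint32_t num_symbols;    /* ulNumSymbol    */
    uint16_t opt_hdr_size;   /* usOptHdrSZ     */
    uint16_t flags;          /* usFlags        */
} CoffFileHeader;

static uint16_t rd16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header. Returns 0 on success, -1 if the buffer is too
   short or the magic number is not the i386 one. */
int coff_parse_file_header(const unsigned char *buf, size_t len,
                           CoffFileHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < COFF_FILEHDR_SIZE)
        return -1;
    out->magic         = rd16le(buf + 0);
    out->num_sections  = rd16le(buf + 2);
    out->time          = rd32le(buf + 4);
    out->symbol_offset = rd32le(buf + 8);
    out->num_symbols   = rd32le(buf + 12);
    out->opt_hdr_size  = rd16le(buf + 16);
    out->flags         = rd16le(buf + 18);
    return out->magic == COFF_MAGIC_I386 ? 0 : -1;
}

/* Non-zero if the F_EXEC flag is set. */
int coff_is_executable(const CoffFileHeader *h)
{
    assert(h != NULL);
    return (h->flags & F_EXEC) != 0;
}
```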

Optional header

The optional header comes right after the file header, that is, it starts at offset 0x0014 of the COFF file. Its length may be 0. Optional headers of different lengths have different structures. The standard optional header is 24 or 28 bytes long, usually 28. Here I only describe the 28-byte one. (Because this header is free-form, different people define it differently; I can only pick the most commonly used one to introduce, and I don't know the others.)

The structure of this header is as follows:

typedef struct {
    unsigned short usMagic;         // magic number
    unsigned short usVersion;       // version identifier
    unsigned long  ulTextSize;      // text section size
    unsigned long  ulInitDataSZ;    // initialized data section size
    unsigned long  ulUninitDataSZ;  // uninitialized data section size
    unsigned long  ulEntry;         // entry point
    unsigned long  ulTextBase;      // text section base address
    unsigned long  ulDataBase;      // data section base address (only in PE32)
} OPTHDR;

The first member, usMagic, is again a magic number, but this time its value should be 0x010b or 0x0107. A value of 0x010b means the COFF file is an ordinary executable; a value of 0x0107 means it is a ROM image file.

usVersion is the version of the COFF file. ulTextSize is the length of the text section, and ulInitDataSZ and ulUninitDataSZ are the lengths of the initialized and uninitialized data sections respectively.

ulEntry is the entry point of the program, that is, the address at which execution starts once the COFF file has been loaded (the value of the EIP register). When the COFF file is a dynamic library, the entry point is the library's entry function.

ulTextBase is the base address of the text section.

ulDataBase is the base address of the data section.

In practice, the only members you need to pay attention to are usMagic and ulEntry.
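Since usMagic and ulEntry are the two members that matter, a small sketch of reading just those two may be useful. It assumes the 28-byte layout shown above (usMagic at offset 0 of the optional header, ulEntry at offset 16); the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_MAGIC_EXE 0x010b  /* ordinary executable, per the text */
#define OPT_MAGIC_ROM 0x0107  /* ROM image, per the text           */

/* Hypothetical classification of the optional header. */
typedef enum {
    OPT_KIND_NONE,        /* usOptHdrSZ was 0: no optional header */
    OPT_KIND_EXECUTABLE,
    OPT_KIND_ROM,
    OPT_KIND_UNKNOWN
} OptKind;

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* opt points at the optional header, opt_len is usOptHdrSZ. */
OptKind coff_opt_kind(const unsigned char *opt, size_t opt_len)
{
    unsigned magic;

    if (opt_len == 0)
        return OPT_KIND_NONE;
    assert(opt != NULL);
    if (opt_len < 2)
        return OPT_KIND_UNKNOWN;
    magic = (unsigned)(opt[0] | (opt[1] << 8));
    if (magic == OPT_MAGIC_EXE)
        return OPT_KIND_EXECUTABLE;
    if (magic == OPT_MAGIC_ROM)
        return OPT_KIND_ROM;
    return OPT_KIND_UNKNOWN;
}

/* Fetch ulEntry. Returns 0 on success, -1 if the header is too
   short to contain the field. */
int coff_opt_entry(const unsigned char *opt, size_t opt_len, uint32_t *entry)
{
    assert(opt != NULL && entry != NULL);
    if (opt_len < 20)
        return -1;
    *entry = rd32le(opt + 16);
    return 0;
}
```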

Section header

The section headers come right after the optional header (if the length of the optional header is 0, they come right after the file header). Each one is 40 bytes long (8 + 6 x 4 + 2 x 2 + 4), as follows:

typedef struct {
    char           cName[8];     // section name
    unsigned long  ulVSize;      // virtual size
    unsigned long  ulVAddr;      // virtual address
    unsigned long  ulSize;       // section length
    unsigned long  ulSecOffset;  // section data offset
    unsigned long  ulRelOffset;  // section relocation table offset
    unsigned long  ulLNOffset;   // line number table offset
    unsigned short ulNumRel;     // relocation table length
    unsigned short ulNumLN;      // line number table length
    unsigned long  ulFlags;      // section flags
} SECHDR;

This header is an important one: the information we ultimately need is described by it. A COFF file may lack the other parts, but the file header and the section headers are essential.

cName holds the section name. Commonly used section names are .text, .data, .comment, .bss and so on. The .text section is the text section, usually the code; .data is the data section, and the data stored in it is initialized data; the .bss section also holds data, but that data is uninitialized, so the section itself is empty; the .comment section, as the name tells you, is a comment section that stores some compilation information as a note on the COFF file.

ulVSize is the size of the section data once loaded into memory. It is only meaningful in executables and is always 0 in object files. If it is greater than the actual length of the section, the excess is filled with zeros.

ulVAddr is the virtual address of the section data when it is loaded or linked. For executables, this address is relative to the file's address space: it is the location of the first byte of the section's data once the executable is loaded into memory. For object files, it is just the offset of the section data's current position, used when relocating. To keep the relocation arithmetic simple, it is usually set to 0.

ulSize is the actual length of the data in the section, that is, of the section data stored in the file; it determines how many bytes to read when reading the section data.

ulSecOffset is the offset of the section data within the COFF file.

ulRelOffset is the offset of the section's relocation information. It points to a record in the relocation table.

ulLNOffset is the offset of the section's line number information. It points to a record in the line number table.

ulNumRel is the number of relocation records. Starting from the record that ulRelOffset points to, the next ulNumRel records are the relocation information of this section.

ulNumLN is similar to ulNumRel, except that it is the number of line number records.

ulFlags holds the attribute flags of the section. The values are as follows:

Value   Name       Description
0x0020  STYP_TEXT  Text section flag: the section contains code.
0x0040  STYP_DATA  Data section flag: the section holds initialized data.
0x0080  STYP_BSS   The section also holds data, but the data is uninitialized.

Note that in a BSS section, ulVSize, ulVAddr, ulSize, ulSecOffset, ulRelOffset, ulLNOffset, ulNumRel and ulNumLN are all 0. (This table lists only some of the values; the others are introduced with the PE format. The same applies to the tables below.)
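Here is a small sketch of reading one section header from a byte buffer. The fields listed in the structure add up to 40 bytes on disk (8 + 6 x 4 + 2 x 2 + 4), which is the record size used below. Only a few fields are decoded; CoffSectionInfo and the function names are hypothetical. Since cName is exactly 8 bytes, the sketch does not rely on it being NUL-terminated and always adds a terminator of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COFF_SECHDR_SIZE 40  /* on-disk size of one section header */
#define STYP_TEXT 0x0020
#define STYP_DATA 0x0040
#define STYP_BSS  0x0080

/* Hypothetical summary of a section header. */
typedef struct {
    char     name[9];      /* cName plus a terminating NUL */
    uint32_t size;         /* ulSize      */
    uint32_t data_offset;  /* ulSecOffset */
    uint32_t flags;        /* ulFlags     */
} CoffSectionInfo;

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* hdr must point at COFF_SECHDR_SIZE bytes of section header. */
void coff_read_section(const unsigned char *hdr, CoffSectionInfo *out)
{
    assert(hdr != NULL && out != NULL);
    memcpy(out->name, hdr, 8);
    out->name[8] = '\0';
    out->size        = rd32le(hdr + 16);
    out->data_offset = rd32le(hdr + 20);
    out->flags       = rd32le(hdr + 36);
}

/* Map the three flags from the table to a short label. */
const char *coff_section_kind(uint32_t flags)
{
    if (flags & STYP_TEXT) return "text";
    if (flags & STYP_DATA) return "data";
    if (flags & STYP_BSS)  return "bss";
    return "other";
}
```

Section header number i (counting from 0) would then start at offset 0x14 + usOptHdrSZ + i * 40 in the file.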

Section data

True to its name, this is where the data of each section is stored. Sections of different types have different content and structure. In object files, however, the data is raw data with no special format.

Relocation table

This table stores the relocation information of every section. It is one big table, because the relocation information of all sections lives in it. Each section header records the offset and count of its own relocation records, and when you need relocation information you read it from this table. Of course, you can also view the whole table as several relocation tables, one per section. This table exists only in object files; executables do not have it.

Where there is a table, there are records. Each record in the relocation table is one piece of relocation information. The record is very simple:

typedef struct {
    unsigned long  ulAddr;    // relocation offset
    unsigned long  ulSymbol;  // symbol index
    unsigned short usType;    // relocation type
} RELOC;

Simple enough, isn't it? Only three members in total! ulAddr is the offset, within the section, of the content to be relocated. For example: if a text section starts at 0x010 and ulAddr is 0x05, then the relocation information is to be written at 0x15. The length of the information depends on the type of your code: 4 bytes for 32-bit code and 2 bytes for 16-bit code.

ulSymbol is a symbol index that points to a record in the symbol table. Note that it is an index, not an offset! It is simply a record number in the symbol table. This member tells you which symbol the relocation refers to.

usType identifies the relocation type. In 32-bit code only two kinds are normally used: absolute relocation and relative relocation. Their codes are as follows:

Value  Name          Description
6      RELOC_ADDR32  32-bit absolute relocation.
20     RELOC_REL32   32-bit relative relocation.

These values differ from processor to processor. The ones given here are those most commonly used on the i386 platform.
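A sketch of decoding relocation records follows. The three members occupy 4 + 4 + 2 = 10 bytes on disk, while sizeof of the C structure is typically 12 because of alignment padding, so the example steps through the table in 10-byte strides and decodes each field by hand. CoffReloc and coff_read_reloc are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_RELOC_SIZE 10  /* 4 + 4 + 2 bytes on disk, no padding */
#define RELOC_ADDR32    6
#define RELOC_REL32     20

/* Hypothetical in-memory form of one relocation record. */
typedef struct {
    uint32_t addr;    /* ulAddr   */
    uint32_t symbol;  /* ulSymbol */
    uint16_t type;    /* usType   */
} CoffReloc;

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode record number `index` of a section. `table` points at the
   section's first record, i.e. at file offset ulRelOffset; the caller
   is responsible for keeping index below ulNumRel. */
CoffReloc coff_read_reloc(const unsigned char *table, size_t index)
{
    const unsigned char *p;
    CoffReloc r;

    assert(table != NULL);
    p = table + index * COFF_RELOC_SIZE;
    r.addr   = rd32le(p);
    r.symbol = rd32le(p + 4);
    r.type   = (uint16_t)(p[8] | (p[9] << 8));
    return r;
}
```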

The two methods work as follows:

Absolute relocation

With absolute relocation, you have to supply the absolute address of the symbol (note: sometimes it is not an address but a value; for a constant you do not compute an address, you simply supply its value). Of course, this address is not available ready-made: you have to derive the absolute address from the symbol's relative address.

Formula: symbol absolute address = section offset + symbol offset

You get these two offsets from the section header and the symbol table respectively. When the section itself is to be relocated, you must of course place the section first and only then locate its symbols.

Relative relocation

Relative relocation is a little more complicated. The address it needs is an offset relative to the current location. The current location is the position just past the four bytes at the absolute address that ulAddr points to (four bytes for 32-bit code, two bytes for 16-bit code).

Formula: current address = relocation offset + current section offset + machine word length ÷ 8

With the current address, the relative address is easy to compute: just subtract the current address from the symbol's absolute address.

Formula: relative address = symbol absolute address - current address

Once the address is computed, write it to the location that ulAddr points to, and that's it! The relocation is done.
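The three formulas can be put together in a short sketch for 32-bit code. It follows the formulas literally and simply overwrites the four bytes at the target location; a real linker may also have to take into account a value already stored there, which this example ignores. The function name and its parameters are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RELOC_ADDR32 6
#define RELOC_REL32  20

/* Hypothetical helper: store a 32-bit value in little-endian order. */
static void wr32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Patch one 32-bit relocation.
 *   sec_data, sec_len  bytes of the section being relocated
 *   sec_base           address given to that section (current section offset)
 *   reloc_off          ulAddr from the relocation record (relocation offset)
 *   type               usType from the relocation record
 *   sym_base           address given to the section that defines the symbol
 *   sym_value          ulValue from the symbol record (symbol offset)
 * Returns 0 on success, -1 for an unknown type or an offset out of range.
 */
int coff_apply_reloc32(unsigned char *sec_data, size_t sec_len,
                       uint32_t sec_base, uint32_t reloc_off, uint16_t type,
                       uint32_t sym_base, uint32_t sym_value)
{
    uint32_t sym_abs = sym_base + sym_value;  /* symbol absolute address */
    uint32_t value;

    assert(sec_data != NULL);
    if (sec_len < 4 || reloc_off > sec_len - 4)
        return -1;
    if (type == RELOC_ADDR32) {
        value = sym_abs;
    } else if (type == RELOC_REL32) {
        /* current address = relocation offset + section offset + 32 / 8 */
        uint32_t current = reloc_off + sec_base + 32 / 8;
        value = sym_abs - current;            /* relative address */
    } else {
        return -1;
    }
    wr32le(sec_data + reloc_off, value);
    return 0;
}
```

For example, with a section placed at 0x10, a relocation offset of 0x05 and a symbol at 0x100 + 0x20, the absolute form writes 0x120 and the relative form writes 0x120 - 0x19 = 0x107.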

Line number table

The line number table is useful when debugging. It establishes the correspondence between the executable binary code and the lines of the source code. That way, when the program goes wrong, we can find the line of source code at fault from the address of the code currently being executed, and then fix it. Without it, heaven knows which line has the problem!

Its format is also very simple, with only two members:

typedef struct {
    unsigned long  ulAddrORSymbol;  // code address or symbol index
    unsigned short usLineNo;        // line number
} LINENO;

Let us look at the second member, usLineNo, first. It is a counter starting from 1 that gives the line number in the source code. The first member, ulAddrORSymbol, is the address of the code for that source line when the line number is non-zero; when the line number is 0, it is the index of the corresponding symbol in the symbol table. So let's take a look at the symbol table next!
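The two readings of the first member can be sketched like this. The record is taken to be 6 bytes on disk (4 + 2); CoffLineInfo and coff_read_lineno are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_LINENO_SIZE 6  /* 4 + 2 bytes on disk */

/* Hypothetical decoded form of one line number record. */
typedef struct {
    int      is_symbol_index;  /* 1 when usLineNo is 0              */
    uint32_t value;            /* code address, or symbol index     */
    uint16_t line;             /* usLineNo                          */
} CoffLineInfo;

/* rec must point at COFF_LINENO_SIZE bytes. */
CoffLineInfo coff_read_lineno(const unsigned char *rec)
{
    CoffLineInfo info;

    assert(rec != NULL);
    info.value = (uint32_t)rec[0] | ((uint32_t)rec[1] << 8) |
                 ((uint32_t)rec[2] << 16) | ((uint32_t)rec[3] << 24);
    info.line = (uint16_t)(rec[4] | (rec[5] << 8));
    /* Line number 0 switches the first member to a symbol table index. */
    info.is_symbol_index = (info.line == 0);
    return info;
}
```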

Symbol table

The symbol table stores the symbol information of the object file and is the most complex table in a COFF file. All sections use this table. It too consists of many records, and each record has the following structure:

typedef struct {
    union {
        char cName[8];               // symbol name
        struct {
            unsigned long ulZero;    // string table identifier
            unsigned long ulOffset;  // string offset
        } e;
    } e;
    unsigned long  ulValue;          // symbol value
    short          iSection;         // section the symbol belongs to
    unsigned short usType;           // symbol type
    unsigned char  usClass;          // symbol storage class
    unsigned char  usNumAux;         // number of auxiliary records
} SYMENT;

The symbol name cName is 8 bytes like all the earlier names, but unlike them it sits inside a union. Sharing the same storage are two other members, ulZero and ulOffset. If the symbol name is no longer than 8 characters, fine: it is stored directly in cName. If it is longer than 8 characters, cName cannot hold it and the name has to go into the string table. In that case ulZero is 0, and ulOffset gives the offset of the symbol's name in the string table.
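The name lookup described here might look like the following sketch. It works on the raw bytes of one symbol record: if the first four bytes (ulZero) are non-zero the name is stored inline, otherwise the next four bytes (ulOffset) are used as an offset into the string table. The function name and its parameters are assumptions for the illustration; strtab is taken to point at the start of the string table, including its 4-byte length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the name of the symbol record `sym` into out[] (out_size > 0).
   Returns 0 on success, -1 if the string table offset is unusable.
   Names longer than the buffer are truncated. */
int coff_symbol_name(const unsigned char *sym,
                     const unsigned char *strtab, size_t strtab_len,
                     char *out, size_t out_size)
{
    size_t n = 0;
    uint32_t offset;

    assert(sym != NULL && out != NULL && out_size > 0);
    if (rd32le(sym) != 0) {
        /* Short name: up to 8 bytes, with no NUL when exactly 8 long. */
        while (n < 8 && sym[n] != '\0')
            n++;
        if (n >= out_size)
            n = out_size - 1;
        memcpy(out, sym, n);
        out[n] = '\0';
        return 0;
    }
    /* Long name: ulZero is 0 and ulOffset points into the string table. */
    offset = rd32le(sym + 4);
    if (strtab == NULL || offset < 4 || offset >= strtab_len)
        return -1;
    while (offset + n < strtab_len && n + 1 < out_size &&
           strtab[offset + n] != '\0') {
        out[n] = (char)strtab[offset + n];
        n++;
    }
    out[n] = '\0';
    return 0;
}
```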

A name alone is not enough for a symbol; it also has a value! ulValue is the value that the symbol represents.

iSection indicates the section the symbol belongs to. If it is 0, the symbol is an external symbol that has to be resolved from other COFF files (several object files are linked together to resolve it). If it is -1, the symbol's value is a constant rather than an offset within a section. If it is -2, the symbol is purely a debug symbol, used only when debugging. If it is greater than 0, it is the index of the section that contains the symbol.
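The four cases of iSection map naturally onto a small classification function. This is only a sketch; the enum and its constant names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the iSection member. */
typedef enum {
    SYM_EXTERNAL,    /*  0: to be resolved from another COFF file  */
    SYM_ABSOLUTE,    /* -1: ulValue is a constant, not an offset   */
    SYM_DEBUG,       /* -2: debug symbol only                      */
    SYM_IN_SECTION,  /* >0: iSection is the index of the section   */
    SYM_INVALID      /* any other negative value                   */
} SymLocation;

SymLocation coff_classify_section(int16_t section)
{
    if (section == 0)
        return SYM_EXTERNAL;
    if (section == -1)
        return SYM_ABSOLUTE;
    if (section == -2)
        return SYM_DEBUG;
    if (section > 0)
        return SYM_IN_SECTION;
    return SYM_INVALID;
}
```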

usType is the type identifier of the symbol. It tells whether the symbol is a function, an integer, or something else. The identifier is two bytes long.

The low four bits of the low byte are the basic identifier, which gives the basic type of the symbol, such as integer, character, structure or union. The high four bits give the advanced (derived) type of the symbol, such as pointer (0001b), function (0010b), array (0011b) or no type (0000b). Current compilers usually do not use the basic type and only use the advanced type, so the basic type of a symbol is usually set to 0.

The high byte is usually unused.
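Splitting usType into its two 4-bit fields takes only a mask and a shift. The sketch below uses invented function names; with the encoding described above, a function symbol with basic type 0 has usType equal to 0x20.

```c
#include <assert.h>
#include <stdint.h>

#define SYM_DTYPE_NONE     0x0  /* 0000b */
#define SYM_DTYPE_POINTER  0x1  /* 0001b */
#define SYM_DTYPE_FUNCTION 0x2  /* 0010b */
#define SYM_DTYPE_ARRAY    0x3  /* 0011b */

/* Low four bits of the low byte: the basic type. */
unsigned coff_basic_type(uint16_t type)
{
    return type & 0x0f;
}

/* High four bits of the low byte: the advanced (derived) type. */
unsigned coff_advanced_type(uint16_t type)
{
    return (type >> 4) & 0x0f;
}

/* Non-zero if the symbol is marked as a function. */
int coff_is_function(uint16_t type)
{
    assert((type >> 8) == 0 || 1);  /* the high byte is ignored here */
    return coff_advanced_type(type) == SYM_DTYPE_FUNCTION;
}
```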

usClass is the storage class of the symbol. It indicates how the symbol is stored.

Its values and meanings are shown in the table:

Value  Name              Description
0      NULL              No storage class.
1      AUTOMATIC         Automatic type, usually a variable allocated on the stack.
2      EXTERNAL          External symbol. For an external symbol iSection should be 0; if it is not 0, ulValue is the offset of the symbol within the section.
3      STATIC            Static type. ulValue is the offset of the symbol within the section. If the offset is 0, the symbol represents the section name.
4      REGISTER          Register variable.
8      MEMBER_OF_STRUCT  Structure member. ulValue is the position of the member within the structure.
10     STRUCT_TAG        Structure tag.
11     MEMBER_OF_UNION   Union member. ulValue is the position of the member within the union.
12     UNION_TAG         Union tag.
13     TYPE_DEFINITION   Type definition.
101    FUNCTION          Function name.
103    FILE              File name.

The last member, usNumAux, is the number of auxiliary records. Auxiliary records carry extra information that describes a symbol. To make storage convenient, the size of the auxiliary information is an integer multiple of the size of a symbol record (usually 1). So if this member is 1, one extra record holding the additional information follows the current symbol record.

The structure of the auxiliary information depends on the symbol's type and storage class. Different kinds of symbols have different auxiliary structures (if any). If you do not care about them, you can simply skip them.

When the storage class is FILE, the auxiliary information is a string: the name of the source file that the object file was built from. The other kinds are discussed in detail when introducing PE.
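Skipping auxiliary records while walking the symbol table can be sketched as follows. The example assumes that each record is 18 bytes on disk (8 + 4 + 2 + 2 + 1 + 1, with usNumAux as the last byte) and that ulNumSymbol counts auxiliary records as well as ordinary ones; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COFF_SYMENT_SIZE 18  /* on-disk size of one symbol record */

/* Count the ordinary (non-auxiliary) symbols in a table holding
   num_records 18-byte slots. Returns the count, or -1 if a usNumAux
   value would run past the end of the table. */
long coff_count_primary_symbols(const unsigned char *symtab,
                                uint32_t num_records)
{
    uint32_t i = 0;
    long count = 0;

    assert(symtab != NULL || num_records == 0);
    while (i < num_records) {
        /* usNumAux is the last byte of the record. */
        unsigned aux = symtab[(size_t)i * COFF_SYMENT_SIZE + 17];

        if (aux > num_records - i - 1)
            return -1;
        count++;
        i += 1 + aux;  /* step over the symbol and its auxiliary records */
    }
    return count;
}
```

The same loop shape works for any pass over the table: handle the record at index i, then advance by 1 + usNumAux.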

String table

No need to say much: even a blind man can see that this table stores strings. It comes right after the symbol table. Why strings are stored here has already been explained, so I will only describe the storage format.

The string table is the simplest part of all. Its layout is:

Offset 0: string table length (4 bytes)
Offset 4: string 1 '\0' ... string N '\0'

The first four bytes of the string table give the length of the string table in bytes. They are followed by the strings (C-style strings), each terminated by 0. Note that the length of the string table is not just the total length of the strings (which includes the '\0' after each string); it also includes the four bytes of the length field itself. The offset given by ulOffset in the symbol table is measured from the start of the string table. For example, for a symbol that refers to the first string, ulOffset is always 4.

The code below is typical C code for reading the strings from a string table.

// fn is assumed to be an open file descriptor positioned at the start of the
// string table; needs <stdio.h>, <stdlib.h>, <string.h> and <unistd.h>.
int iStrLen, iCur = 4;          // iStrLen is the string table length, iCur is the current string offset
char *str;                      // string table
read(fn, &iStrLen, 4);          // get the string table length
str = (char *)malloc(iStrLen);  // allocate space for the string table
while (iCur < iStrLen) {        // read the rest of the table
    int n = read(fn, str + iCur, iStrLen - iCur);
    if (n <= 0) {               // stop on end of file or error
        iStrLen = iCur;
        break;
    }
    iCur += n;
}
iCur = 4;                       // point the current offset at the first string
while (iCur < iStrLen) {
    printf("String Offset 0x%04X : %s\n", iCur, str + iCur);
    iCur += (int)strlen(str + iCur) + 1;  // don't forget the 1 byte for the '\0' when computing the offset!
}
free(str);                      // release the string table
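When the string table is already in memory, looking up a single string by its ulOffset is a one-liner plus some checks. The sketch below adds the bounds checks that the listing above leaves out; coff_strtab_get is a hypothetical name, and table is assumed to point at the start of the string table, including its 4-byte length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the string stored at `offset` in the string table, or NULL
   if the offset falls inside the length field, lies outside the
   table, or the string is not terminated within the table. */
const char *coff_strtab_get(const unsigned char *table, size_t table_len,
                            uint32_t offset)
{
    assert(table != NULL);
    if (table_len < 4 || offset < 4 || offset >= table_len)
        return NULL;
    if (memchr(table + offset, '\0', table_len - offset) == NULL)
        return NULL;
    return (const char *)(table + offset);
}
```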

With that, the structure of COFF has been covered in full. Friends who know the PE format may find this strange: isn't a lot missing!? Indeed, a standard COFF file contains only this much. But MS, to stay compatible with DOS executables and to extend what executables can do, added a lot of its own to the COFF format, to the point that I can hardly recognize COFF in it. Still, once you understand COFF files, learning the PE file format becomes very simple.

Want to know the format of PE files? There is plenty of material on the Internet, and I will also write a few more articles building on this one to introduce the PE, OMF and ELF formats.

Now you can try it yourself: write a COFF file parser or a simple linker!

Please credit the original address when reposting: https://www.9cbs.com/read-5682.html
