Using DelayLoad to Optimize Application Performance and Intercept APIs

zhaozj2021, 2021-02-08

Translation

Original article: http://www.microsoft.com/msj/0200/hood/hood0200.asp

Chinese translation by snake (http://snake12.top263.net).

In the December 1998 issue of MSJ, Jeffrey Richter and I both wrote columns on the DelayLoad feature of the Visual C++ 6.0 linker, which goes to show how cool the feature is. Unfortunately, many people still don't know about DelayLoad, or think it's a feature available only in the latest version of Windows NT.

For starters, let me repeat: DelayLoad is not an operating system feature. It works on any Win32-based system. This month I'll present a small utility, DelayLoadProfile, which makes it almost trivial to determine whether your program can benefit from DelayLoad. Many programs can.

A quick review:

Normally, when you call a function in a DLL, the linker adds information about the DLL and the function to your executable. Collectively, the information for all imported functions is known as the imports section. At load time, the Win32 loader scans the imports section and loads each DLL. For each DLL, it locates the address of every imported function and writes these addresses into the Import Address Table (IAT). A simple way to think of the IAT is as an array of function pointers: when you call an imported function, the call goes through one of the pointers in the IAT.
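To make the "array of function pointers" view concrete, here is a minimal sketch in portable C. It is not how the Win32 loader is implemented: the export table, the function names, and `bind_imports` are hypothetical stand-ins for the real import and export structures in WINNT.H. What it does show is the shape of the mechanism: names are resolved once, addresses are written into the IAT, and every call is an indirect call through a slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*import_fn)(int);

/* Hypothetical "DLL" functions standing in for real exports. */
static int dll_double(int x) { return 2 * x; }
static int dll_negate(int x) { return -x; }

/* A toy export table: name -> address, as a DLL would publish it. */
struct export_entry { const char *name; import_fn address; };
static const struct export_entry g_exports[] = {
    { "Double", dll_double },
    { "Negate", dll_negate },
};

/* Import names the "EXE" references, plus the IAT the loader fills in.
   Both tables end with a NULL entry, as the real ones do. */
static const char *g_import_names[] = { "Negate", "Double", NULL };
static import_fn g_iat[3];

/* Toy loader: resolve each import by name and write its address into the
   IAT. Returns the number of entries bound, or -1 if a name is missing. */
static int bind_imports(void)
{
    int i;
    for (i = 0; g_import_names[i] != NULL; i++) {
        size_t e;
        g_iat[i] = NULL;
        for (e = 0; e < sizeof g_exports / sizeof g_exports[0]; e++) {
            if (strcmp(g_exports[e].name, g_import_names[i]) == 0) {
                g_iat[i] = g_exports[e].address;
                break;
            }
        }
        if (g_iat[i] == NULL)
            return -1;
    }
    g_iat[i] = NULL; /* terminator */
    return i;
}

/* Calling an import is an indirect call through its IAT slot. */
static int call_import(int slot, int arg)
{
    assert(g_iat[slot] != NULL);
    return g_iat[slot](arg);
}
```

Because every call goes through the table, changing one slot changes where every call to that import lands.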

So what changes with DelayLoad? When you specify DelayLoad for a DLL, the linker does not emit the usual data in the imports section. Instead, it generates a small stub for each DelayLoad-imported function that records the DLL and function name. The first time the function is called, the stub calls LoadLibrary to load the DLL, then calls GetProcAddress to get the function's address. Finally, the stub updates its own IAT slot so that subsequent calls go directly to the target function.

What I've just described is slightly simplified. In reality, the stub is a small piece of code that calls a routine statically linked into your executable. That routine lives in DELAYIMP.LIB, which must be in the linker's library list. The stubs and the DELAYIMP.LIB code are also smart enough to call LoadLibrary only the first time a function in the DLL is used; later calls to other functions in the same DLL do not call LoadLibrary again.
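The first-call behavior can be modeled the same way. This sketch is not the DELAYIMP.LIB code: `fake_load_library` and `fake_get_proc_address` are hypothetical stand-ins for LoadLibrary and GetProcAddress, and both DLLs are invented. It shows the two caches that matter: a per-function slot that is filled on first use, and a per-DLL handle that ensures the library is "loaded" only once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*import_fn)(int);

/* Stand-ins for functions exported by two hypothetical DLLs. */
static int alpha_inc(int x)    { return x + 1; }
static int alpha_dec(int x)    { return x - 1; }
static int beta_times10(int x) { return x * 10; }

struct fake_export { const char *dll, *name; import_fn fn; };
static const struct fake_export g_exports[] = {
    { "ALPHA.DLL", "Inc",     alpha_inc },
    { "ALPHA.DLL", "Dec",     alpha_dec },
    { "BETA.DLL",  "Times10", beta_times10 },
};

/* One record per delay-loaded DLL (caches the module "handle") and one
   per delay-loaded function (caches the resolved address). */
struct delay_dll  { const char *name; int handle; };   /* 0 = not loaded */
struct delay_slot { struct delay_dll *dll; const char *func; import_fn target; };

static int g_load_library_calls;   /* how often the fake LoadLibrary ran */

/* Hypothetical LoadLibrary stand-in: counts calls, returns a nonzero handle. */
static int fake_load_library(const char *dll_name)
{
    (void)dll_name;
    return ++g_load_library_calls;
}

/* Hypothetical GetProcAddress stand-in: look the name up in g_exports. */
static import_fn fake_get_proc_address(const struct delay_dll *dll, const char *func)
{
    size_t i;
    for (i = 0; i < sizeof g_exports / sizeof g_exports[0]; i++)
        if (strcmp(g_exports[i].dll, dll->name) == 0 &&
            strcmp(g_exports[i].name, func) == 0)
            return g_exports[i].fn;
    return NULL;
}

/* Models what a delay-load stub plus the helper routine accomplish. */
static int call_delay_import(struct delay_slot *slot, int arg)
{
    if (slot->target == NULL) {                 /* first call via this slot */
        if (slot->dll->handle == 0)             /* first call into this DLL */
            slot->dll->handle = fake_load_library(slot->dll->name);
        slot->target = fake_get_proc_address(slot->dll, slot->func);
        assert(slot->target != NULL);
    }
    return slot->target(arg);                   /* later calls go direct */
}

static struct delay_dll g_alpha = { "ALPHA.DLL", 0 };
static struct delay_dll g_beta  = { "BETA.DLL", 0 };
static struct delay_slot g_slots[] = {
    { &g_alpha, "Inc",     NULL },
    { &g_alpha, "Dec",     NULL },
    { &g_beta,  "Times10", NULL },
};
```

Calling `Inc` and then `Dec` "loads" ALPHA.DLL once; BETA.DLL is never touched unless `Times10` is actually called.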

All things considered, DelayLoad adds little time or space overhead compared with importing a DLL the usual way. Calling LoadLibrary is only slightly less efficient than letting the Win32 loader load the DLL, and calling GetProcAddress once per DelayLoad-imported function is only slightly slower than having the loader locate the imported functions at startup.

However, the benefits of DelayLoad easily make up for these small speed penalties. For starters, if you never call a function in a DelayLoad-imported DLL, the DLL is never loaded. This happens more often than you might think. Suppose your program contains printing code. If the user doesn't print anything during a session, a normal import has loaded WINSPOOL.DRV for no reason. With DelayLoad, WINSPOOL.DRV is never loaded or initialized, so the program actually runs faster.

Another benefit of DelayLoad is that it lets you avoid calling APIs that aren't available on one of your target platforms. For instance, AnimateWindow exists in Windows 98 and Windows 2000, but not in Windows 95 or Windows NT 4.0. If you call AnimateWindow the usual way, your program won't even load on the earlier platforms. With DelayLoad, you can check at runtime which operating system you're on and call AnimateWindow only where it's supported, without cluttering your code with LoadLibrary and GetProcAddress calls.

DelayLoad is easy to use. Once you know which DLLs you want to DelayLoad, simply add /DELAYLOAD:DLLNAME to the linker options, where DLLNAME is the name of the DLL. You also need to add DELAYIMP.LIB to the linker's library list, and you still need the original import library (for example, SHELL32.LIB). Putting it all together, to DelayLoad SHELL32.DLL your linker line needs the following:

Shell32.lib /delayload:Shell32.dll delayimp.lib

Unfortunately, the Visual Studio 6.0 IDE offers no easy way to specify DelayLoad for a DLL. You have to add the /DELAYLOAD:XXX fragment by hand to the Project Options edit field under Project Settings | Link.

When to use DelayLoad:

When you have a small project, it's easy to come up with a list of DLLs that are good DelayLoad candidates. However, because projects grow and may involve many developers, it's just as easy to lose track of who uses which DLL. In the past I've relied on gut instinct and Depends.exe from the Platform SDK. A DLL from which only a few functions are imported is a good place to start.

However, I wanted a way to automate and simplify the process. Thus was born DelayLoadProfile. It runs your EXE and monitors the calls your EXE makes into DLLs. After your program terminates, DelayLoadProfile prints a summary of which DLLs were used and how many calls were made to each. A DLL that is imported but never called is a good DelayLoad candidate.

Let me emphasize one point: DelayLoadProfile works only against your EXE. It could be extended to recurse into all of your imported DLLs and their dependencies, but that would significantly complicate the code. DelayLoadProfile only gives you hints about which DLLs you might consider using /DELAYLOAD on; you still have to use your own judgment to make sure the change makes sense.

DelayLoadProfile: the big picture:

The concept behind DelayLoadProfile is simple: redirect the function pointers in the EXE's IAT so that they point to stubs. A stub simply notes that the imported function was called, then jumps to the address that the Win32 loader originally stored in the IAT. However, the devil is in the details.
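Portable C can't generate machine-code stubs at runtime the way DelayLoadProfile does, so this sketch substitutes one compile-time wrapper per IAT slot. All names (`g_iat`, `stub_0`, `redirect_all`) are hypothetical. The idea is the same: save the loader's pointer, point the slot at a stub, and have the stub count the call before forwarding it to the original target.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*import_fn)(int);

static int real_square(int x) { return x * x; }
static int real_inc(int x)    { return x + 1; }

/* The IAT as the loader left it. */
static import_fn g_iat[2] = { real_square, real_inc };

/* Per-slot bookkeeping: the saved original pointer and a call counter. */
static import_fn g_original[2];
static unsigned g_calls[2];

/* One wrapper per slot: note the call, then forward to the original.
   (DelayLoadProfile emits equivalent stubs as machine code at runtime.) */
#define DEFINE_STUB(n) \
    static int stub_##n(int x) { g_calls[n]++; return g_original[n](x); }
DEFINE_STUB(0)
DEFINE_STUB(1)

static const import_fn g_stubs[2] = { stub_0, stub_1 };

/* Redirect every slot: remember where it pointed, then point it at a stub. */
static void redirect_all(void)
{
    size_t i;
    for (i = 0; i < 2; i++) {
        assert(g_iat[i] != NULL);
        g_original[i] = g_iat[i];
        g_iat[i] = g_stubs[i];
    }
}
```

Callers keep calling through `g_iat` exactly as before; only the counters reveal that the stubs were in the path.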

First, you must decide where the code that locates the EXE's IAT entries and points them at the stubs will run. One option is to do the work out of process, in some sort of control program. This avoids having to get your code into the target EXE's process. The downside is that traversing all the data structures needed to locate and patch the IAT entries, and gathering the results later, becomes much more work: I'd be swimming in ReadProcessMemory calls.

The other approach is to do the hard work in the same process as the target EXE. This makes it almost trivial to walk the data structures, build the stubs, redirect the IAT entries, and summarize the results at the end. However, working in process requires that some of the DelayLoadProfile code be loaded into the target EXE's process as it runs. This is the path I took.

Having committed to running my code in process, the next problem was how to get my code into the target process. One choice would have been to ask users to link with the DelayLoadProfile code. That would require users to change their source code, project, or makefile, and many wouldn't bother, so I discarded this option. I needed DelayLoadProfile to be completely automatic.

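At this point I had boxed myself into writing a loader program that runs the target EXE and injects my DelayLoadProfileDLL into it. One DLL injection technique is to use CreateRemoteThread to start a thread in the target process that calls LoadLibrary on your DLL. I discarded this approach because CreateRemoteThread is not available on Windows 9x, which I wanted to support.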

Longtime MSJ readers may remember APISPY32, a program I wrote more than five years ago. It loads a process and injects a DLL into it to log API calls, which is similar to what DelayLoadProfile needs to do. Alas, when I ran APISPY32 on Windows 2000, it failed to load the DLL. A little digging revealed the source of the problem, and I decided it was time to revisit that code and fix it.

Into the trenches:

To review quickly, DelayLoadProfile has two parts. A loader process runs your program and, early on, injects a DLL into your program's address space. That DLL scans your EXE's IAT and redirects the imported functions to stubs that the DLL creates. When your program shuts down, the injected DLL scans the stubs it created and summarizes how many calls were made to each imported DLL. If you've used the APIMON utility from the Platform SDK, you'll recognize the similarities.

The DLL that does all the work of monitoring a program's use of imports is DelayLoadProfileDLL (see Figure 1). It uses the DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH notifications sent to its DllMain to start its two main phases of work.

When DllMain gets DLL_PROCESS_ATTACH, DelayLoadProfileDLL calls PrepareToProfile. Inside PrepareToProfile, the code locates the target EXE's IAT. For each imported DLL, it uses the IsModuleOkToHook function to check whether that DLL is safe for IAT redirection. Most of the time it is, so PrepareToProfile then calls the RedirectIAT function.

RedirectIAT is where things get dirty, and it helps a lot if you understand the import-related data structures in WINNT.H. First, the function locates the IAT and the associated Import Names Table. It then counts the IAT entries by scanning the IAT for a NULL pointer. With this count, it creates an array of DLPD_IAT_STUB stubs, one per IAT entry. Finally, the code makes another pass through the IAT. This time it takes the address in each IAT entry, stores it in the JMP instruction of the corresponding stub, and points the IAT entry at the stub. As the code advances to each subsequent IAT entry, it also advances to the next DLPD_IAT_STUB in the array. I'll describe the stubs in more detail later.
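The passes RedirectIAT makes can be sketched on an IAT of pointer-sized integers terminated by 0. This is a simplified model, not the column's code: the hypothetical `stub_model` keeps only the JMP target and a counter (the real DLPD_IAT_STUB also contains instruction bytes), and the IAT here is an ordinary array rather than one located through the import descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stub: where its JMP would go, plus a call counter. */
struct stub_model {
    uintptr_t jmp_target;     /* original IAT value */
    unsigned long count;
};

/* Count entries up to the 0 terminator, allocate one stub per entry, then
   walk the IAT and the stub array in lockstep, redirecting each entry.
   Returns the stub array (caller frees) and stores the count in *out_count. */
static struct stub_model *redirect_iat(uintptr_t *iat, size_t *out_count)
{
    size_t n = 0, i;
    struct stub_model *stubs;

    assert(iat != NULL && out_count != NULL);
    *out_count = 0;
    while (iat[n] != 0)                 /* pass 1: count the entries */
        n++;
    if (n == 0)
        return NULL;

    stubs = calloc(n, sizeof *stubs);   /* one stub per entry, counts = 0 */
    if (stubs == NULL)
        return NULL;

    for (i = 0; i < n; i++) {           /* pass 2: save target, redirect */
        stubs[i].jmp_target = iat[i];
        iat[i] = (uintptr_t)&stubs[i];
    }
    *out_count = n;
    return stubs;
}
```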

Two aspects of redirecting the IAT entries to the stubs are worth mentioning. First, the IAT is often placed in a read-only section of the EXE, and an attempt to modify it would ordinarily cause an access violation. Luckily, the VirtualProtect API lets you change the protection attributes of the target memory, in this case the IAT, to read/write. When it's finished, the code restores the original protection attributes.
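VirtualProtect is Win32-only, so this sketch swaps in the POSIX `mprotect` call to show the same make-writable, write, restore pattern on a read-only page. It is an analogy, not DelayLoadProfile's code, and `patch_readonly_slot` is a hypothetical helper. One difference: VirtualProtect returns the previous protection through an out parameter, while mprotect does not, so here the caller passes in the protection to restore.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite one pointer-sized slot that lives in read-only memory:
   make its page(s) writable, write, then restore the old protection. */
static int patch_readonly_slot(uintptr_t *slot, uintptr_t value, int old_prot)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)slot & ~(page - 1);   /* page-aligned base */
    size_t len = (size_t)((uintptr_t)(slot + 1) - start);

    assert(slot != NULL);
    if (mprotect((void *)start, len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    *slot = value;
    return mprotect((void *)start, len, old_prot);
}
```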

The other tricky part of redirecting the IAT is data imports. Although programmers don't do it often, it's relatively easy to import data as well as code; the Visual C++ runtime DLL (MSVCRT.DLL), for example, exports data. Redirecting an IAT entry that refers to data is almost certainly a recipe for problems.

So how do you tell whether an IAT entry is a code import or a data import? A commercial product would use a more accurate algorithm. I took a shortcut and used IsBadWritePtr: if the IAT entry points to writable memory, it's probably pointing to data; if it points to read-only memory, odds are it's pointing to code. Is this a perfect test? No, but it's good enough for DelayLoadProfile.
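IsBadWritePtr probes live memory, which can't be reproduced portably, so this sketch substitutes a lookup in a hypothetical section table to express the same rule: a target in writable memory is treated as data, a target in read-only memory as code. Like the original shortcut, it is a heuristic, not a proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a loaded module's sections. */
struct section { uintptr_t start, end; int writable; };

enum import_kind { IMPORT_CODE, IMPORT_DATA, IMPORT_UNKNOWN };

/* Writable target => probably data; read-only target => probably code. */
static enum import_kind classify_import(uintptr_t target,
                                        const struct section *secs, size_t nsecs)
{
    size_t i;
    assert(secs != NULL || nsecs == 0);
    for (i = 0; i < nsecs; i++)
        if (target >= secs[i].start && target < secs[i].end)
            return secs[i].writable ? IMPORT_DATA : IMPORT_CODE;
    return IMPORT_UNKNOWN;   /* not inside any known section */
}
```

A redirector built on this would skip any entry not classified as `IMPORT_CODE`.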

Now let's look at the stubs. The DLPD_IAT_STUB structure in DelayLoadProfileDLL.h defines their layout, which mixes code and data. Simplified, a stub looks like this:

CALL DelayLoadProfileDLL_UpdateCount
JMP XXXXXXXX    // original IAT address
DWORD count
DWORD pszNameOrOrdinal

When the EXE calls one of the redirected functions, control transfers to the stub. The stub's CALL instruction calls the DelayLoadProfileDLL_UpdateCount function in DelayLoadProfileDLL.cpp. When that call returns, the stub's JMP jumps to the address originally stored in the IAT. Figure 2 shows this flow.

Assembly-language gurus may wonder how DelayLoadProfileDLL_UpdateCount finds the count field of the stub that called it. If you look at the code, you'll see that it reads its return address from the stack. The return address points to the JMP XXXXXXXX instruction in the stub. Because the CALL instruction is always 5 bytes, the function can work back from the return address to the start of the stub, and from there to the count field.
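The return-address arithmetic can be checked with a byte-exact model of the simplified stub. This sketch assumes a 32-bit x86 target and the CALL/JMP/count/name layout listed above; the struct and field names are illustrative, not the actual DLPD_IAT_STUB definition, and the return address is passed as a parameter instead of being read from the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-exact model of the simplified stub (names are illustrative). */
#pragma pack(push, 1)
struct iat_stub {
    uint8_t  call_opcode;          /* 0xE8: CALL rel32 */
    int32_t  call_rel;             /* UpdateCount - (stub + 5) */
    uint8_t  jmp_opcode;           /* 0xE9: JMP rel32 */
    int32_t  jmp_rel;              /* original target - (stub + 10) */
    uint32_t count;
    uint32_t psz_name_or_ordinal;
};
#pragma pack(pop)

/* Fill in a stub that will live at 32-bit address 'at'. A rel32 operand
   is relative to the address of the instruction that follows it. */
static void build_stub(struct iat_stub *s, uint32_t at,
                       uint32_t update_count, uint32_t original)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->call_opcode = 0xE8;
    s->call_rel = (int32_t)(update_count - (at + 5));
    s->jmp_opcode = 0xE9;
    s->jmp_rel = (int32_t)(original - (at + 10));
}

/* What UpdateCount does with its return address: the CALL is 5 bytes, so
   the stub starts 5 bytes earlier, and 'count' is at a fixed offset. */
static uint32_t count_address_from_return(uint32_t return_address)
{
    return return_address - 5 + (uint32_t)offsetof(struct iat_stub, count);
}
```

With CALL and JMP each taking 5 bytes (one opcode byte plus a 32-bit relative offset), the count field sits 10 bytes into the stub, which is 5 bytes past the return address.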

One problem is worth pointing out: my original DelayLoadProfileDLL_UpdateCount didn't use the PUSHAD and POPAD instructions to save and restore the CPU registers. That worked fine for many programs, but not for all of them. Eventually I traced the failures to __CxxFrameHandler and _EH_prolog in MSVCRT.DLL, which expect the EAX register to hold a particular value. DelayLoadProfileDLL_UpdateCount was changing EAX.

Since EAX was the problem, I added PUSHAD and POPAD, but the problem persisted. After some frustration, I examined the compiler-generated assembly code. With the /GZ option, the VC6 compiler inserts code that initializes all local variables to 0xCC, and that code changed EAX before my PUSHAD ran. I had to remove the /GZ option.

Reporting the results:

When your process terminates, the system sends DLL_PROCESS_DETACH to every loaded DLL. DelayLoadProfileDLL uses this notification to collect the results: it walks all of the stubs again, gathers their counts, and outputs a summary.

When DelayLoadProfileDLL redirects the EXE's IAT, it saves a pointer to the EXE's import descriptors in a global variable, g_pfirstimportdsc. During shutdown, ReportProfileResults uses this pointer to walk the imports section. If a DLL's IAT was redirected, its first IAT entry should point to the first DLPD_IAT_STUB allocated for that DLL. The code performs basic sanity checks, and if something doesn't look right, DelayLoadProfileDLL ignores that DLL.

Normally everything checks out, and the first IAT entry points to one of my stubs. For each DLL, the code then walks all of that DLL's stubs and adds each stub's count field to the DLL's total. When it's done, ReportProfileResults formats a string containing the DLL name and the total number of calls, and emits it with OutputDebugString.
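The per-DLL summation and message formatting might look like the following. The `dll_stubs` type is a hypothetical flattening of what ReportProfileResults finds by walking the stubs, and the message format is modeled on the sample output in the "Using DelayLoadProfile" section; the real DLL hands the string to OutputDebugString rather than returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One DLL's stubs, reduced to the name and each stub's count field. */
struct dll_stubs {
    const char *name;
    const unsigned long *counts;   /* one entry per stub */
    size_t nstubs;
};

/* Sum a DLL's stub counts and format one report line into buf.
   Returns the total so a caller can also flag never-called DLLs. */
static unsigned long format_dll_report(const struct dll_stubs *d,
                                       char *buf, size_t bufsize)
{
    unsigned long total = 0;
    size_t i;

    assert(d != NULL && buf != NULL && bufsize > 0);
    for (i = 0; i < d->nstubs; i++)
        total += d->counts[i];
    snprintf(buf, bufsize, "DelayLoadProfile: %s was called %lu times",
             d->name, total);
    return total;
}
```

A total of 0 is exactly the "imported but never called" case that marks a DelayLoad candidate.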

Loading and injection:

The program that loads your EXE and causes DelayLoadProfileDLL.dll to be injected is (you guessed it) DelayLoadProfile.exe. Its source code is available from the MSJ Web site (http://www.microsoft.com/msj). The code is built mostly around the CDebugInjector class, which I'll describe briefly. The main function assembles the target EXE's command line and passes it to CDebugInjector::LoadProcess. If the process is created successfully, the function tells CDebugInjector which DLL to inject; here, that's DelayLoadProfileDLL.dll, which sits in the same directory as DelayLoadProfile.exe.

The last step before running the target program is to call CDebugInjector::SetOutputDebugStringCallback. When DelayLoadProfileDLL reports its results with OutputDebugString, CDebugInjector sees those strings and passes them to the registered callback, which simply prints them to the console with printf. Finally, the function calls CDebugInjector::Run. The target process starts running, and the DLL is injected when the time is right.

Figure 3 shows the CDebugInjector class, which is where the interesting code lives. CDebugInjector::LoadProcess creates the target process as a debuggee. The details of debugging a process are covered in many MSDN documents, so I won't discuss them at length here.

Once the debuggee is running, the debugger (here, DelayLoadProfile) enters a loop, repeatedly calling WaitForDebugEvent and ContinueDebugEvent until the debuggee terminates. Each time WaitForDebugEvent returns, something has happened in the debuggee: an exception (including a breakpoint), a DLL load, a thread creation, or some other event. The WaitForDebugEvent documentation lists all possible events. CDebugInjector::Run contains this loop.

So how does running the target as a debuggee help you inject a DLL? A debugger has considerable control over its debuggee. Every time a debug event occurs, the debuggee is suspended until the debugger calls ContinueDebugEvent. Knowing this, a debugger can write code into the debuggee's address space and temporarily change the debuggee's registers so that the added code runs.

At the appropriate moment, CDebugInjector synthesizes a small stub that calls LoadLibrary, with the DLL name parameter pointing to the name of the DLL to inject. CDebugInjector writes the stub (and the DLL name) into the debuggee's address space, then calls SetThreadContext to change the debuggee's instruction pointer so that the LoadLibrary stub runs. All of this code is in CDebugInjector::PlaceInjectionStub.
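One plausible byte layout for such a LoadLibrary stub is PUSH (address of the DLL name), CALL LoadLibraryA, INT 3, followed by the name itself. This layout is an assumption for illustration, not necessarily the bytes CDebugInjector emits, and `build_injection_stub` is a hypothetical helper. The sketch only builds the bytes; writing them into the debuggee (for example with WriteProcessMemory) and redirecting the thread with SetThreadContext are left out, and the addresses in the usage are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian store of a 32-bit value, as x86 encodes operands. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Build: PUSH <addr of name>; CALL LoadLibraryA; INT 3; "<dll name>\0".
   'base' is where the stub will sit in the debuggee and 'load_library' is
   LoadLibraryA's address there. Returns bytes written, or 0 if too small. */
static size_t build_injection_stub(uint8_t *buf, size_t bufsize, uint32_t base,
                                   uint32_t load_library, const char *dll_name)
{
    const size_t code_len = 5 + 5 + 1;
    size_t name_len;

    assert(buf != NULL && dll_name != NULL);
    name_len = strlen(dll_name) + 1;
    if (bufsize < code_len + name_len)
        return 0;
    buf[0] = 0x68;                                 /* PUSH imm32 */
    put_u32(buf + 1, base + (uint32_t)code_len);   /* -> name after the code */
    buf[5] = 0xE8;                                 /* CALL rel32 */
    put_u32(buf + 6, load_library - (base + 10));  /* relative to next instr */
    buf[10] = 0xCC;                                /* INT 3: back to debugger */
    memcpy(buf + code_len, dll_name, name_len);
    return code_len + name_len;
}
```

Because LoadLibraryA uses the stdcall convention, it pops its own argument, so the stack is balanced when the INT 3 hands control back to the debugger.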

Immediately after the LoadLibrary call, the stub contains a breakpoint (INT 3). This stops the debuggee and returns control to the debugger, which uses SetThreadContext to restore the instruction pointer and the other registers to their original values. After another call to ContinueDebugEvent, the debuggee carries on with the DLL injected, and nobody is the wiser.

Described this way, the injection process doesn't sound too difficult, but some details make it more complex. For example, when is the right time to write the stub and change the instruction pointer? You can't do it right after CreateProcess, because the imported DLLs haven't been mapped into memory yet and the Win32 loader hasn't set up the EXE's IAT. That's too early.

In the end, I decided to let the debuggee run until the first breakpoint, where I set my own breakpoint at the program's entry point. When that second breakpoint triggers, CDebugInjector knows that the target process's DLLs (including KERNEL32.DLL) have been initialized, but no code in the EXE has run yet. That's the time to inject DelayLoadProfileDLL.dll.

By the way, where does the first breakpoint come from? By definition, a Win32 process being debugged calls DebugBreak (an INT 3) before it starts running. In my earlier APISPY32 code, I did the injection at this initial DebugBreak. Unfortunately, on Windows 2000 this DebugBreak occurs before KERNEL32.DLL is initialized. That's why CDebugInjector sets its own breakpoint at the point where the EXE is about to get control, by which time KERNEL32.DLL has been initialized. The breakpoint I mentioned earlier, the one after the LoadLibrary call, is the third breakpoint CDebugInjector handles. For the details of handling the different breakpoints, see CDebugInjector::HandleException.

Another interesting problem is where to write the LoadLibrary stub. On Windows NT 4.0 and later, you can allocate memory in another process with VirtualAllocEx, and that's the method I used there. That leaves Windows 9x, which doesn't support VirtualAllocEx. For Win9x, I relied on a special property of its memory-mapped files: they are visible in every address space, at the same address. I simply created a small memory-mapped file backed by the system paging file and wrote the LoadLibrary stub into it, where the debuggee can see it. For details, see CDebugInjector::GetMemoryForLoadLibraryStub.

Using DelayLoadProfile:

DelayLoadProfile is a command-line program that writes its results to standard output. At a command prompt, run DelayLoadProfile followed by the target program and any arguments it needs, for example:

DelayLoadProfile notepad C:\autoexec.bat

Here are the results of running DelayLoadProfile on CALC.EXE from Windows 2000 Release Candidate 2:

[D:\column\col66\debug] DelayLoadProfile calc
DelayLoadProfile: SHELL32.dll was called 0 times
DelayLoadProfile: MSVCRT.dll was called 9 times
DelayLoadProfile: ADVAPI32.dll was called 0 times
DelayLoadProfile: GDI32.dll was called 60 times
DelayLoadProfile: USER32.dll was called 691 times

I simply started CALC and then shut it down immediately. Notice that SHELL32.DLL and ADVAPI32.DLL weren't called at all; these two DLLs are DelayLoad candidates for CALC. You may be surprised that CALC imports SHELL32.DLL at all. If you run DUMPBIN /IMPORTS or Depends.exe on CALC, you'll see that the only function it imports from SHELL32.DLL is ShellAboutW. In other words, CALC needs SHELL32.DLL only when you choose the Help | About Calculator menu item, yet the entire DLL is loaded into memory at startup. This is about the most obvious example of the value of /DELAYLOAD. By the way, SHELL32.DLL also unconditionally loads and initializes SHLWAPI.DLL and COMCTL32.DLL.

Just because DelayLoadProfile reports that a DLL is never called, or called only a few times, doesn't mean you should automatically DelayLoad it. You need to decide carefully which implicitly linked DLLs to use /DELAYLOAD with. For example, if the DLL would be loaded and initialized anyway because another DLL depends on it, /DELAYLOAD gains you nothing. Depends.exe from the Platform SDK is a very useful tool for seeing how a DLL is used.

Also consider how much of your program you exercise during a profiling run. If you test every feature, every imported DLL will be called. Personally, I think you should concentrate on reducing initialization time, which may mean simply starting your program and then closing it. Deferring DLL loads speeds up initialization, and users judge your program's speed largely by how quickly it starts.

I found several DLLs that can benefit from /DELAYLOAD. As described above, SHELL32.DLL is one. Another is WINSPOOL.DRV, which supports printing; since many users rarely print, it's a good candidate. OLE32.DLL and OLEAUT32.DLL are similar. Many programs use COM and OLE only in small, limited ways, so these DLLs are also candidates. For example, the Windows 2000 CDPlayer.exe links with OLE32.DLL to use the CreateStreamOnHGlobal function, yet in normal use that function is never called.

DelayLoadProfile isn't without problems. Although I've successfully tested its IAT redirection on many programs, you may still run into a program it doesn't handle correctly. Solving this completely is beyond the scope of this column. However, if you track down one of these problems, please let me know, and I'll update DelayLoadProfile in the future.

I know that some imports from MFC42.DLL and MFC42U.DLL conflict with DelayLoadProfile, so the IsModuleOkToHook function in DelayLoadProfileDLL.cpp excludes MFC42.DLL, MFC42U.DLL, and KERNEL32.DLL. (You can't use /DELAYLOAD with KERNEL32.DLL anyway, so profiling it would be pointless.) If a particular DLL causes problems, add it to IsModuleOkToHook.
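A check in the spirit of IsModuleOkToHook can be as simple as a case-insensitive exclusion list. This sketch is a hypothetical re-creation using the three DLLs named above, not the code from DelayLoadProfileDLL.cpp.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* DLLs excluded from IAT redirection, per the text. */
static const char *const g_excluded[] = { "MFC42.DLL", "MFC42U.DLL", "KERNEL32.DLL" };

/* Case-insensitive comparison; DLL names in import tables vary in case. */
static int names_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns 1 if the named DLL's IAT may be redirected, 0 if excluded. */
static int is_module_ok_to_hook(const char *dll_name)
{
    size_t i;
    assert(dll_name != NULL);
    for (i = 0; i < sizeof g_excluded / sizeof g_excluded[0]; i++)
        if (names_equal_nocase(dll_name, g_excluded[i]))
            return 0;
    return 1;
}
```

Adding another troublesome DLL is then a one-line change to `g_excluded`.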

I hope DelayLoadProfile helps you use /DELAYLOAD in your programs. When I have time, I'll add some enhancements, and I'd also like to hear your success stories. If you have suggestions, please send mail to Matt:

Matt@wheaty.net, or visit http://www.wheaty.net


Microsoft Systems Journal, February 2000

Original English text:

In the December 1998 issue of MSJ, Jeffrey Richter and I wrote dueling columns on the DelayLoad feature of the Microsoft® Visual C++® 6.0 linker. The fact that both Jeff and I jumped on this topic is testimony to how cool this feature is. Unfortunately, I still find people who do not know anything about DelayLoad or who think it's some feature that's available only in the latest version of Windows NT®. For starters, let me scream from the highest rooftop that DelayLoad is not an operating system feature. It works on any Win32®-based system. With that off my chest, I'll demonstrate this month's utility, DelayLoadProfile, which makes it almost trivial to determine whether your program can benefit from DelayLoad. As I'll show, even some of Microsoft's own programs can benefit from it.

A Quick Review

If you're wondering "What $ thing has Matt gone off the deep end over?", a quick recap of DelayLoad is in order. Here's how it works. Normally, when calling an imported function in a DLL, the linker adds information about the imported DLL and function to your executable. Collectively, the information for all the imported functions is known as the imports section. The Win32 loader scans through the imports section at load time and loads each DLL. For each DLL loaded, the loader iterates through all the imported functions and locates their addresses in the imported DLL. These addresses are written back to the imports section in a location known as the Import Address Table (IAT). A simple way to think of an IAT is as an array of function pointers. When calling an imported function, the call uses one of the function pointers from the IAT.

How does the picture change with DelayLoad? When you specify DelayLoad for a DLL, the linker does not emit the usual data it would put in the imports section. Instead, it generates a small stub for each DelayLoad imported function. This stub points to the imported DLL and function name. Upon calling an imported function for the first time, the stub calls LoadLibrary to load the DLL. Next, it calls GetProcAddress to get the address of the called function. Finally, the stub overwrites part of itself so that subsequent calls to the function go directly to the target code.

What I've just described is a slight simplification. In reality, the stub is a small bit of code that calls a routine statically linked into your executable. This routine resides in DELAYIMP.LIB, which must be included in the list of libraries that the linker uses. Also, the stubs and DELAYIMP.LIB code are smart enough to call LoadLibrary only the first time a function in the DLL is used. Subsequent calls to other functions in the same DelayLoad imported DLL do not call LoadLibrary.

All things considered, DelayLoad does not add much time or space overhead compared to importing the DLL the usual way. Calling LoadLibrary is only slightly less efficient than letting the Win32 loader load the DLL. Likewise, calling GetProcAddress once for each DelayLoad imported function is only slightly slower than having the Win32 loader locate the imported functions at startup. However, the benefits of DelayLoad can easily make up for small speed penalties. For starters, if you never call a function in a DelayLoad imported DLL, the DLL is not loaded in the first place. This comes in handy more often than you may think. Consider the situation in which you have printing code in your program. If the user does not print something during a program session, you've loaded WINSPOOL.DRV for no reason. In this case, using DelayLoad is actually faster since you never loaded and initialized WINSPOOL.DRV.

Another benefit of using DelayLoad is that you avoid calling APIs that are not available on one of your target platforms. For instance, say you want to call AnimateWindow, which is supported in Windows® 98 and Windows 2000, but not Windows 95 or Windows NT 4.0. If you were to call AnimateWindow the usual way, your code wouldn't load on the earlier platforms. However, with DelayLoad you can make a runtime check of which operating system you're on and only call AnimateWindow if it's supported. There's no need for you to muck up your code with calls to LoadLibrary and GetProcAddress.

Using DelayLoad is incredibly easy. Once you know which DLLs you want to use DelayLoad with, simply add /DELAYLOAD:DLLNAME, where DLLNAME is the name of the DLL. You'll also need to add DELAYIMP.LIB to the linker's library list, and you'll still need the original import library, for example, SHELL32.LIB. Putting everything together, to DelayLoad against SHELL32.DLL your linker line would need the following:

SHELL32.LIB /DELAYLOAD:SHELL32.DLL DELAYIMP.LIB

Unfortunately, the Visual Studio® 6.0 IDE does not have an easy way for you to specify DelayLoading for DLLs. In Visual Studio 6.0, you'll have to add the /DELAYLOAD:XXX command-line fragment manually to the Project Settings | Link | Project Options edit field.

When to Use DelayLoad

When you have a small project, it's easy to come up with a list of DLLs that are good DelayLoad candidates. However, because projects may grow and can involve many developers, it's just as easy to lose track of who uses which DLL. In the past, I've relied on gut instinct and Depends.EXE from the Platform SDK. A DLL from which only a few functions are imported is a good place to start.

However, I wanted a way to automate and simplify the process. Thus was born the DelayLoadProfile program. DelayLoadProfile is a tool that runs your EXE and monitors the DLLs and functions that your EXE calls. After your program terminates, DelayLoadProfile spits out a summary of which DLLs were used and how many calls were made to each DLL. A DLL that's imported, but which had no calls made to it, is a good candidate for DelayLoad importing.

Let me emphasize one point before continuing: DelayLoadProfile works only against your EXE. While it could be extended to recurse into all of your imported DLLs and their dependencies, that would significantly complicate its code. As I'll explain later, DelayLoadProfile just gives you hints about which DLLs you might consider using /DELAYLOAD on. You still have to use that neuron-based processing unit between your ears to make sure it makes sense.

DelayLoadProfile: The Big Picture

The concept behind DelayLoadProfile is simple. Redirecting the function pointers in the EXE's IAT to point to a stub is all that's needed. The stub simply notes that the imported function has been called, then jumps to the address that the Win32 loader originally stored in the IAT. However, the devil is in the details.

First, you must decide where the code will run that locates and modifies the EXE's IAT entries to point to the stubs. Doing the work out-of-process in some sort of control program is one option. This avoids the work involved in getting your code into the target EXE's process. The downside is that it's more work to traverse all the data structures necessary to locate and patch the IAT entries, as well as gather the results later. I'd be swimming in ReadProcessMemory calls.

The other approach is to do the hard work in the same process space as the target EXE. This makes it almost trivial to march through the data structures, build stubs, redirect the IAT entries, and summarize the results at the end. However, doing the work in-process requires that some of the DelayLoadProfile code be loaded into the target EXE's process as it runs. This is the path I took.

Having committed to running my code in-process with the target, the next problem was figuring out how to get my code into the target process. One choice would have been to ask the user to link with the DelayLoadProfile code. Knowing it would require some effort by the target audience, I discarded this option. If a DelayLoadProfile user needed to modify their source, project, or makefile, many would pass. I needed to make DelayLoadProfile a complete no-brainer.

At this point, I had boxed myself into some sort of loader program that would run the target EXE and inject my DelayLoadProfile DLL into it. One technique for DLL injection is to use CreateRemoteThread to start a thread in the target process that calls LoadLibrary on your DLL. I discarded this approach because CreateRemoteThread is not available on Windows 9x, which I wanted to support.

Longtime MSJ readers may remember a program I wrote more than five years ago called APISPY32. It loads a process and injects a DLL into it for the purposes of logging API calls. That sounds similar to what I needed DelayLoadProfile to do. Alas, when I ran APISPY32 on Windows 2000, it failed to load the DLL. A little digging revealed the source of the problem, and I decided it was time to revamp this code for a whole new generation of programmers.

Into the Trenches

To review quickly, DelayLoadProfile is a two-part system. A loader process runs your program. Early on in your program, the loader process injects a DLL into your program's address space. This DLL scans through your EXE's IAT and redirects the imported functions to point to stubs that the DLL creates. When your program shuts down, the injected DLL scans through the stubs it has created and summarizes how many calls were made to each imported DLL. If you've ever used the APIMON utility from the Platform SDK, you'll recognize the similarities.

The DLL that does all the work of monitoring a program's use of imports is called DelayLoadProfileDLL (see Figure 1). DelayLoadProfileDLL uses the DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH notifications sent to its DllMain procedure to initiate the two primary phases of the DLL's work.

When its DllMain gets the DLL_PROCESS_ATTACH notification, DelayLoadProfileDLL calls PrepareToProfile. Inside PrepareToProfile, the code locates the target EXE's IAT. For each imported DLL it finds, the code determines if it's a DLL that's safe for IAT redirection. The IsModuleOkToHook function makes this determination. Most of the time, it's OK to redirect the IAT, so PrepareToProfile invokes the RedirectIAT function.

RedirectIAT is where things get dirty, and it really helps if you understand the import-related data structures in WINNT.H. First, the function locates the IAT and the associated Import Names Table. The code then counts how many IAT entries there are by scanning through the IAT, looking for a NULL pointer. With this count, an array of DLPD_IAT_STUB stubs is created, with one stub for each entry. Finally, it's time for meatball surgery. The code makes yet another pass through the IAT. This time it grabs the address in each IAT entry, stuffs it into a JMP instruction in the stub, and redirects the IAT entry to point to the stub. As the code advances through each subsequent IAT entry, it also advances to the next DLPD_IAT_STUB stub in the allocated array. I'll explain DLPD_IAT_STUB stubs a little later in this column.

Two aspects of redirecting the IAT entries to the allocated stubs are worth mentioning. First, the IAT is often placed in a read-only section of the EXE. Ordinarily, an attempt to modify such an IAT pointer would result in an access violation. Luckily, the VirtualProtect API comes to the rescue and enables you to modify the attributes of a target address, in this case, the IAT. Read-write is the attribute you're looking to modify. When it's finished, the code restores the original memory protection attributes.

The other tricky part of redirecting the IAT occurs when you encounter a data import. Although programmers do not frequently do so, it's relatively easy to import data in addition to code. The Visual C runtime library DLL (MSVCRT.DLL) has data exports. Redirecting an IAT entry that refers to data in an imported DLL is almost certainly a recipe for problems.

So how do you determine whether an import is a normal code import or a data import? A commercial product could implement a sophisticated algorithm to determine the import type of an IAT entry. However, I took a shortcut and used IsBadWritePtr. If the IAT points to memory that's writeable, it's probably pointing to data. Likewise, if it points to read-only memory, odds are that it's pointing to code. Is this a perfect test? No, but it's good enough for DelayLoadProfile's needs.

Now let's take a look at the stubs. The DLPD_IAT_STUB structure in DelayLoadProfileDLL.H contains the layout, which is a mixture of code and data. Simplifying this structure, a DLPD_IAT_STUB stub looks like this:

CALL DelayLoadProfileDLL_UpdateCount

JMP XXXXXXXX        // original IAT address

DWORD count

DWORD pszNameOrOrdinal
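To make the listing above concrete, here is a portable C++ sketch of one possible byte-exact stub layout. It is not the actual DLPD_IAT_STUB definition: it assumes 32-bit x86 encodings (E8 rel32 for a near CALL, E9 rel32 for a near JMP), and the names IatStub, Rel32, InitStub, and StubFromReturnAddress are hypothetical. It also shows the pointer arithmetic the counting routine relies on: the 5-byte CALL pushes the address of the JMP that follows it, so stepping back 5 bytes finds the start of the stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DLPD_IAT_STUB (NOT the real definition).
// Assumes 32-bit x86: E8 rel32 = near CALL, E9 rel32 = near JMP.
#pragma pack(push, 1)
struct IatStub {
    std::uint8_t  callOpcode;        // 0xE8
    std::int32_t  callDisplacement;  // counter routine minus address after the CALL
    std::uint8_t  jmpOpcode;         // 0xE9
    std::int32_t  jmpDisplacement;   // original IAT target minus address after the JMP
    std::uint32_t count;             // number of calls made through this stub
    std::uint32_t nameOrOrdinal;     // pointer to the import's name, or its ordinal
};
#pragma pack(pop)

static_assert(sizeof(IatStub) == 18, "stub must be tightly packed");
static_assert(offsetof(IatStub, jmpOpcode) == 5, "the CALL occupies 5 bytes");

// rel32 operand for a 5-byte CALL or JMP located at instrAddr that must reach
// target. The CPU adds the displacement to the address of the next instruction.
inline std::int32_t Rel32(std::uint32_t instrAddr, std::uint32_t target) {
    return static_cast<std::int32_t>(target - (instrAddr + 5u));
}

// Fill a stub that will live at stubAddr (a 32-bit address): it calls
// counterAddr, then jumps to originalTarget, the value taken from the IAT entry.
inline void InitStub(IatStub& s, std::uint32_t stubAddr, std::uint32_t counterAddr,
                     std::uint32_t originalTarget, std::uint32_t nameOrOrdinal) {
    s.callOpcode = 0xE8;
    s.callDisplacement = Rel32(stubAddr, counterAddr);
    s.jmpOpcode = 0xE9;
    s.jmpDisplacement = Rel32(stubAddr + 5u, originalTarget);
    s.count = 0;
    s.nameOrOrdinal = nameOrOrdinal;
}

// The CALL pushes the address of the JMP that follows it. Stepping back over
// the 5-byte CALL recovers the start of the stub, and with it the count field.
inline IatStub* StubFromReturnAddress(std::uint8_t* returnAddress) {
    return reinterpret_cast<IatStub*>(returnAddress - offsetof(IatStub, jmpOpcode));
}
```

Because addresses are plain 32-bit integers here, the sketch compiles and can be exercised on any platform; in a real hook, the stub array must live in memory that the target process can execute.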

When the EXE calls one of the redirected functions, control goes to the CALL instruction in the stub. The DelayLoadProfileDLL_UpdateCount routine in DelayLoadProfileDLL.CPP simply increments the stub's count field. After that CALL returns, the JMP instruction transfers control to the original address that was stored in the IAT before I bashed it. Figure 2 shows the big picture after the IAT has been redirected to the stubs.

Assembler junkies might be wondering how the DelayLoadProfileDLL_UpdateCount function knows where the stub's count field is in memory. A quick look at the code shows that DelayLoadProfileDLL_UpdateCount finds the return address pushed on the stack by the CALL instruction. The return address points to the JMP XXXXXXXX instruction following the CALL. Since the CALL instruction is always five bytes, some pointer arithmetic yields the stub's starting address and easy access to the stub's count field.

I had one problem with the DelayLoadProfileDLL_UpdateCount code that's worth mentioning. Originally, the function did not have the PUSHAD and POPAD instructions to save and restore all of the regular CPU registers. The code worked fine on many programs, but just blew up on others. Finally, I narrowed it down to programs that imported __CxxFrameHandler and _EH_prolog from MSVCRT.DLL. Both of these APIs expect the EAX register to be set to a given value, and DelayLoadProfileDLL_UpdateCount was trashing EAX.

Since the trashed EAX was the problem, I added PUSHAD and POPAD. Alas, the problem persisted. I examined the compiler-generated code, and then smacked my forehead. Normally, when generating code for a debug build, the Visual C++ 6.0 compiler inserts code in the function prolog to set all local variables to the value 0xCC. This code was trashing EAX before my PUSHAD got a chance to execute. To get around this, I had to remove the /GZ option from the debug build settings for DelayLoadProfileDLL.

Reporting Results

As your process shuts down, the system sends the DLL_PROCESS_DETACH notification to all loaded DLLs. DelayLoadProfileDLL uses this opportunity to harvest the information collected during the run. In a nutshell, this means scanning through all the stub arrays, counting the number of calls that were made through the stubs, and reporting what it finds.

During the setup phase, when DelayLoadProfileDLL was redirecting the IATs, it stashed away the address of the EXE's IAT in a global variable (g_pFirstImportDesc). At shutdown time, ReportProfileResults uses this pointer to walk through the imports section again. For each imported DLL, it retrieves the address of the DLL's first IAT entry. If this is an IAT that I've redirected, the first pointer in the IAT should point to the first of the DLPD_IAT_STUB stubs allocated for that DLL. Of course, the code does some sanity checking to ensure that this is the case. If something does not look right, DelayLoadProfileDLL ignores that particular imported DLL.

Generally, though, everything looks fine, and the first IAT entry points to my stubs. The code then iterates through all the stubs for the DLL. At each stub, the value of the stub's count field is added to a running total for the DLL. When the iteration completes, ReportProfileResults formats a string with the name of the DLL and how many calls were made through the stubs. The code uses OutputDebugString to broadcast its findings.

Loading and Injection

The program that loads your EXE and injects DelayLoadProfileDLL.DLL is called, you guessed it, DelayLoadProfile.EXE (the source code is available from the MSJ Web site at http://www.microsoft.com/msj). This code mainly drives the CDebugInjector class, which I'll describe shortly. Function main obtains the target EXE's command line and passes it to CDebugInjector::LoadProcess. If the process is created successfully, main tells CDebugInjector which DLL it wants injected. In this case, it's DelayLoadProfileDLL.DLL, which should be located in the same directory as DelayLoadProfile.EXE.

The last step before letting the target run wild is to call CDebugInjector::SetOutputDebugStringCallback. When DelayLoadProfileDLL reports its results via OutputDebugString, CDebugInjector sees them and passes them to the callback you registered. This callback just printfs the strings to the console. Finally, main calls CDebugInjector::Run. This call lets the target process begin and, when the time is right, injects the DLL into it.

Figure 3 shows the CDebugInjector class. This is where all the good stuff happens. CDebugInjector::LoadProcess creates the specified process as a debuggee. The ramifications of running as a debuggee are discussed in the MSDN documentation, so I won't go into the details here. For the purposes of this column, it's sufficient to say that the debugger process (in this case, DelayLoadProfile) has to enter a loop that calls WaitForDebugEvent and ContinueDebugEvent until the debuggee terminates. Every time WaitForDebugEvent returns, something has happened in the debuggee. This might be an exception (including breakpoints), a DLL load, a thread creation, or some other event. The WaitForDebugEvent documentation covers all the events that might occur. The CDebugInjector::Run method contains the code for this loop.

So how does running the target process as a debuggee help you inject a DLL? A debugger process has excellent control over the debuggee's execution. Every time a significant event occurs in the debuggee, the debuggee is suspended until the debugger calls ContinueDebugEvent. Knowing this, a debugger process can add code to the debuggee's address space and temporarily change the debuggee's registers so that the added code executes.

Specifically, CDebugInjector synthesizes a small code stub that calls LoadLibrary. The DLL name parameter to LoadLibrary points to the name of the DLL to inject. CDebugInjector writes the stub (and the associated DLL name) to the debuggee's address space. It then calls SetThreadContext to change the debuggee's instruction pointer (EIP) so that it executes the LoadLibrary stub. All of this dirty work occurs within the CDebugInjector::PlaceInjectionStub method.

Immediately following the LoadLibrary call in the stub is a breakpoint instruction (INT 3). This stops the debuggee and gives control back to the debugger process. The debugger then uses SetThreadContext again to restore the instruction pointer and other registers to their original values. One more call to ContinueDebugEvent and the debuggee is on its way with the DLL injected, none the wiser that anything has happened.

If you don't think too hard about it, this injection process doesn't sound too messy. Nonetheless, a few interesting problems crop up that complicate things. For example, when is the proper time to create the stub code and redirect control to it? You can't do this immediately after the CreateProcess call because, among other reasons, the imported DLLs haven't been mapped into memory at that point and the Win32 loader hasn't yet fixed up the EXE's imports. In other words, it's too early.

The solution I ultimately decided on was to let the debuggee run until it encounters its first breakpoint. Then I set a breakpoint of my own at the entry point of the EXE. When this second breakpoint triggers, CDebugInjector knows that the DLLs in the target process (including KERNEL32.DLL) have initialized, but no code in the EXE has run. This is the perfect time to inject DelayLoadProfileDLL.DLL.

Incidentally, where does the first breakpoint come from? By definition, a Win32 process that's being debugged calls DebugBreak (also known as INT 3) very early in its execution. In my ancient APISPY32 code, I used the initial DebugBreak as the occasion to do the injection. Unfortunately, in Windows 2000 this DebugBreak occurs before KERNEL32.DLL is initialized. Therefore, CDebugInjector sets its own breakpoint to go off when the EXE is about to get control; by then, KERNEL32.DLL is known to be initialized.

Earlier, I mentioned a breakpoint that occurs after the LoadLibrary call returns. This is a third breakpoint for CDebugInjector to handle. All of the mechanics for handling the different breakpoints can be seen in CDebugInjector::HandleException.

Another interesting problem to address with DLL injection is where to write the LoadLibrary stub. Under Windows NT 4.0 and later, you can allocate space in another process with VirtualAllocEx, so I took that route. That leaves out Windows 9x, which doesn't support VirtualAllocEx. For this scenario, I took advantage of a unique property of Windows 9x memory-mapped files: they are visible in all address spaces, and at the same address. I simply create a small memory-mapped file using the system page file as backing, and blast the LoadLibrary stub into it. The stub is then implicitly accessible in the debuggee process. For the details, see the code listing for CDebugInjector::GetMemoryForLoadLibraryStub at the link at the top of this article.

Using DelayLoadProfile

DelayLoadProfile is a command-line program that writes its results to standard output. From a command prompt, run DelayLoadProfile, specifying the target program and any arguments it needs, such as:

DelayLoadProfile notepad C:\autoexec.bat
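Before LoadProcess can start the target, main needs just the part of the command line that names the target and its arguments. The article doesn't show how DelayLoadProfile extracts it. A minimal sketch, assuming the loader starts from its own complete command line (the form GetCommandLine returns) and that the first token is its own program name, either quoted or ending at the first space or tab, could look like this; TargetCommandLine is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given the loader's complete command line, drop the
// loader's own program name and return the rest (target EXE plus arguments).
// Assumes the first token is either "quoted" or ends at a space or tab.
std::string TargetCommandLine(const std::string& full) {
    std::size_t i = 0;
    if (i < full.size() && full[i] == '"') {
        // Quoted program name: skip past the closing quote.
        std::size_t close = full.find('"', 1);
        i = (close == std::string::npos) ? full.size() : close + 1;
    } else {
        // Unquoted program name: skip to the first space or tab.
        while (i < full.size() && full[i] != ' ' && full[i] != '\t') ++i;
    }
    // Skip the whitespace separating the loader's name from the target.
    while (i < full.size() && (full[i] == ' ' || full[i] == '\t')) ++i;
    return full.substr(i);
}
```

With the example command above, TargetCommandLine returns "notepad C:\autoexec.bat", which is the kind of string CreateProcess accepts as its command-line argument.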

Here are the results of running DelayLoadProfile against CALC.EXE from Windows 2000 Release Candidate 2:

[D:\Column\Col66\Debug] DelayLoadProfile calc

DelayLoadProfile: SHELL32.DLL was called 0 times

DelayLoadProfile: MSVCRT.DLL was called 9 times

DelayLoadProfile: ADVAPI32.DLL was called 0 times

DelayLoadProfile: GDI32.DLL was called 60 times

DelayLoadProfile: USER32.DLL was called 691 times
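Each line above is one imported DLL's total: ReportProfileResults adds up the count fields of that DLL's stubs and skips any DLL that fails its sanity check. The portable sketch below mirrors that summing and formatting step. HookedModule and FormatProfileResults are hypothetical names, the "redirected" flag stands in for the real check that the first IAT entry points at the stub array, and the format string simply reproduces the shape of the output shown here rather than the tool's actual source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one imported DLL: its name, whether its IAT
// was actually redirected to our stubs, and the count field of each stub.
struct HookedModule {
    std::string name;
    bool redirected;
    std::vector<std::uint32_t> stubCounts;
};

// Sum each module's stub counts and format one line per module. The real tool
// sends its lines out through OutputDebugString; this sketch returns them.
std::vector<std::string> FormatProfileResults(const std::vector<HookedModule>& modules) {
    std::vector<std::string> lines;
    for (const HookedModule& m : modules) {
        if (!m.redirected) continue;  // sanity check failed: ignore this DLL
        unsigned long total = 0;
        for (std::uint32_t c : m.stubCounts) total += c;
        char buf[256];
        std::snprintf(buf, sizeof buf, "DelayLoadProfile: %s was called %lu times",
                      m.name.c_str(), total);
        lines.emplace_back(buf);
    }
    return lines;
}
```

Called once at DLL_PROCESS_DETACH time, the returned lines would then be handed to OutputDebugString one at a time.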

I simply started CALC and immediately shut it down. Note that SHELL32.DLL and ADVAPI32.DLL both received no calls. These two DLLs are prime candidates for CALC to DelayLoad.

You may be wondering why CALC loads SHELL32.DLL, yet doesn't call it. It would be easy enough to run DumpBin /IMPORTS or Depends.EXE against CALC. In doing so, you'd see that the only function CALC imports from SHELL32.DLL is ShellAboutW. Simply put, unless you select the Help | About Calculator menu item in CALC, it's a complete waste of time and memory to load SHELL32.DLL. This is a fabulous example of where /DELAYLOAD can really show its worth. Incidentally, SHELL32.DLL implicitly links against SHLWAPI.DLL and COMCTL32.DLL, two additional DLLs that are brought into memory and initialized for no reason.

Just because DelayLoadProfile reports that a DLL is receiving few or no calls does not mean you should automatically DelayLoad it. Be sure to consider whether one of your implicitly linked DLLs also links against the DLL you're considering using DelayLoad with. If this is the case, it's not worth using /DELAYLOAD in your EXE, since the DLL is still going to be loaded and initialized because of some other dependency. Depends.EXE from the Platform SDK is a great tool for quickly determining the scope of a DLL's usage.

Another thing to consider when using DelayLoadProfile is how much of your app you'll exercise during your test. Obviously, if you exercise all aspects of your app, all the DLLs you import in the EXE will be invoked. Personally, I think minimal load time is a good target to shoot for. This might mean just starting your program and then closing it down. By spreading the work of loading and initializing your DLLs throughout your application's run, you can speed up the initial load sequence. Users often subjectively judge the speed of your application by its startup time.

I've found a few DLLs that will benefit from using /DELAYLOAD. As you saw earlier, SHELL32.DLL is one of them. Another is WINSPOOL.DRV, which is used for printing support. Since most users do not print frequently, it's a good candidate, as are OLE32.DLL and OLEAUT32.DLL. In addition, a variety of programs use COM and OLE in some minimal capacity, making those DLLs possible candidates, too. For example, the Windows 2000 CDPLAYER.EXE links against OLE32.DLL and imports the CreateStreamOnHGlobal API. Yet in ordinary usage, I did not observe this function being called.

DelayLoadProfile is not without its faults (literally). While I've tested it successfully with a large number of applications, you may still run into the occasional program that does not work so well when DelayLoadProfileDLL interferes with its IAT. Tracking down all these odd scenarios is beyond the scope of this column. However, if you locate and fix one of these problems, please let me know. I may update DelayLoadProfile at some future date.

I know that programs that import MFC42.DLL and MFC42U.DLL can crash with DelayLoadProfile. For that reason, I've provided an escape hatch: the IsModuleOKToHook function in DelayLoadProfileDLL.CPP. I've placed MFC42.DLL, MFC42U.DLL, and KERNEL32.DLL in it. (You can't use /DELAYLOAD with KERNEL32.DLL anyhow, so it's no loss.) If a particular DLL seems to be giving you problems, first try adding it to IsModuleOKToHook.

I hope DelayLoadProfile's ease of use will inspire you to tune your applications to make use of /DELAYLOAD. I certainly had a good time updating some classic code, and I'd enjoy hearing your success stories, too.

Have a suggestion for Under the Hood? Send it to Matt at matt@wheaty.net or http://www.wheaty.net.

Please credit the original URL when reposting: https://www.9cbs.com/read-2854.html
