Analysis of Format String Vulnerabilities
Created: 2001-03-03
Article type: reprint
Source: author isno (isno@sina.com)
Submitted by:
xundi (xundi_at_xfocus.org)
----------------- Table of Contents -------------------
1. Introduction
2. Background Knowledge
3. How Format String Vulnerabilities Work
(1) A variable argument count lets printf() read past the supplied arguments
(2) Writing the jump address with the %n directive
(3) Controlling the written value with width and precision
(4) Summary
4. Analysis of the WU-FTPD 2.6.0 Format String Vulnerability
(1) Where is the problem?
(2) Exploiting the WU-FTPD vulnerability
5. Postscript
----------------------------------------
1. Introduction
Recently, format string vulnerabilities have been found in many programs, the best known being those in WU-FTPD 2.6.0 and
rpc.statd. Because both are installed by default on a large number of sites, exploits for these two bugs circulate widely online,
and many sites have been compromised through them. Format string vulnerabilities therefore deserve careful study. However,
very few Chinese-language articles on the subject are available online; I know of only two, one written by warning3 and one
translated by xuzq. After also consulting several English articles, I spent the better part of a day, with quite a headache, before
I understood the mechanism.
Since the existing articles are hard going for ordinary beginners like me, I decided to write up my own understanding, so that
other newcomers can be spared some of the headache. This article is also a memo for myself, to reread when I forget :-)
My knowledge is limited, so mistakes are inevitable; corrections are welcome.
2. Background Knowledge
Before looking at format string vulnerabilities it is worth reviewing some basics about the stack. Many articles on buffer
overflows available online cover the stack in detail, and readers can consult them; here I give only a brief overview.
A program stores its dynamic data in a region of memory called the stack. The stack sits at the high end of memory and is
last-in, first-out. When a program calls a function, it first pushes the function's arguments onto the stack; the call then pushes
the contents of the instruction pointer register (EIP) as the return address (RET); next the base pointer (EBP) is pushed, and
the current stack pointer (ESP) is copied into EBP to serve as the new frame base. Finally, ESP is decreased by some amount to
reserve space for the function's local variables.
An ordinary buffer overflow takes advantage of the fact that the stack grows toward lower addresses while data is written into a
buffer toward higher addresses. By storing more data than a buffer can hold, the attacker overwrites data pushed earlier on the
stack, usually the return address, and so changes the program's control flow: when the function returns, it jumps to an address
chosen by the attacker, who can then make the program do anything they like.
Format string vulnerabilities resemble ordinary buffer overflows in some respects and differ in others; both exploit programmer
oversights to change the program's normal flow of execution. The following sections describe how format string vulnerabilities
work in detail, and then analyze the format string vulnerability in WU-FTPD 2.6.0.
3. How Format String Vulnerabilities Work
A format string is the argument that tells the *printf() family of functions how to lay out their output. The output can go to
standard output (printf()), to a file stream, to a string buffer, and so on; the corresponding functions include fprintf, sprintf,
snprintf, vprintf, vfprintf, vsprintf and vsnprintf. It is exactly this *printf() family that attackers exploit. You may ask: these
functions only output data, so how can they create a security hole?
Under normal circumstances they cause no problems at all, but the *printf() functions have three special properties, and these
properties become a vulnerability when an attacker can control the format string.
(Note: the tests below were run on Red Hat Linux 6.0.)
# The three properties of the *printf() family that attackers can exploit:
(1) A variable argument count lets printf() read past the supplied arguments
The first exploitable property is that the *printf() functions do not take a fixed number of arguments. Taking printf() as an
example, to print one string and three integers in turn we could write the following program:
#include <stdio.h>

int main(void)
{
    int i = 1, j = 2, k = 3;
    char buf[] = "test";

    printf("%s %d %d %d\n", buf, i, j, k);
    return 0;
}
This is ordinary, correct usage, and the program prints:
test 1 2 3
This printf() call has 5 arguments: the first is the format string "%s %d %d %d\n", the second is the address of the string buf,
and so on. %s corresponds to buf and the three %d directives correspond to i, j and k, which are printed in turn. But what happens
if we pass printf() one argument fewer, like this:
printf("%s %d %d %d\n", buf, i, j);
There are still four conversion directives, but only three corresponding data arguments (buf, i, j). (Strictly speaking, the C
standard calls this undefined behavior; what follows is what actually happens on i386 Linux.)
Compiling and running it, the program prints:
test 1 2 1953719668
Although the last %d has no corresponding argument, it still prints a 10-digit integer, 1953719668. What is this large number?
Let's change the output statement to:
printf("%s %d %d %x\n", buf, i, j);
so that the last value is printed in hexadecimal. The output is now:
test 1 2 74736574
So when a *printf() function is given fewer arguments than its format string asks for, it does not report an error; it simply
prints whatever 4 bytes of memory occupy the next argument slot, in this case 74736574.
So what is 74736574? Readers familiar with ASCII will recognize it: strings are stored in memory as ASCII codes, and these bytes
correspond as follows:
Hexadecimal     Decimal     Character
74 ----------> 116 ---------> t
73 ----------> 115 ---------> s
65 ----------> 101 ---------> e
Read left to right, 74736574 spells "tset". Because x86 stores integers little-endian (the least significant byte at the lowest
address), the bytes actually sitting in memory are 74 65 73 74, which spell "test". Look familiar? Going back to the program, it
is indeed the contents of the string buf[] we defined. This is no accident. Recalling the stack mechanics described earlier, we can
picture what happens on the stack in this program:
i) when main() is called, its return address is pushed;
ii) EBP is pushed and ESP is copied into EBP;
iii) ESP is decreased to enlarge the stack, reserving space for the variables i, j, k and buf;
iv) printf() is called: its 4 arguments, j, i, buf and the format string "%s %d %d %x\n", are pushed in that order;
v) the return address of printf() is pushed;
vi) EBP is pushed;
vii) printf() starts executing.
At this point the stack looks like this:
Stack top (lower addresses) -------------------------------------------------------------> higher addresses
+-----+-----+--------+-------------+---+---+-------------+----+---+---+---+-----+-----+
| EBP | EIP | format | buf address | i | j | buf content | \0 | k | j | i | EBP | EIP |
+-----+-----+--------+-------------+---+---+-------------+----+---+---+---+-----+-----+
With the actual stack contents in front of us, it is easy to see why 74736574 ("test") was printed. printf() first finds its first
argument, the format string "%s %d %d %x\n", and then prints the values that follow it on the stack, one per directive: %s takes
the buf address and prints the contents of buf[]; the first %d takes i and the second takes j; %x should take k, but because we
did not pass k to printf(), the next slot happens to hold the contents of buf, so those 4 bytes are printed as a hexadecimal
number, giving the 74736574 we saw. It is easy to predict that if we add a few more %x directives to the format string, printf()
will go on printing what lies further up the stack: the word holding buf's terminating "\0", then k, j, i, EBP, EIP and so on.
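To check the little-endian reading described above without relying on undefined behavior, here is a minimal sketch that
assembles four bytes into a 32-bit word the way an i386 CPU reads memory (lowest address becomes the least significant byte) and
formats it the way %x would. The helper names word_from_bytes_le and word_as_hex are our own, chosen for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Combine 4 bytes the way a little-endian CPU (e.g. i386) loads a 32-bit word:
 * the byte at the lowest address becomes the least significant byte. */
static uint32_t word_from_bytes_le(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Print the first 4 bytes of s as %x would show that stack word. */
static void word_as_hex(const char *s, char out[9])
{
    unsigned char b[4];
    assert(strlen(s) >= 4);
    memcpy(b, s, 4);
    snprintf(out, 9, "%08" PRIx32, word_from_bytes_le(b));
}
```

Calling word_as_hex("test", out) yields "74736574", the same digits the program printed, and the word's decimal value is
1953719668, matching the %d output.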
This reveals the root cause of format string vulnerabilities: because the *printf() functions take a variable number of
arguments, if the first argument, the format string, is supplied by the user, then the user can read anything on the stack above
the format string.
In other words, the vulnerability exists because the programmer lets the user supply the first argument of printf(). By supplying a
suitable number of %x directives (or %d, %f, whatever you like), the user can read the stack contents at a specific position.
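The standard way to avoid this is to keep the format string constant and pass user text only as data for %s. A minimal sketch,
with render_user_text as a hypothetical helper name; the vulnerable form appears only as a comment for contrast.

```c
#include <assert.h>
#include <stdio.h>

/* VULNERABLE pattern, shown only for contrast and never called:
 *     snprintf(out, size, user_text);
 * Here any %x or %n inside user_text would be interpreted as a directive. */

/* Safe pattern: the format is a constant literal and the user's text is only
 * an argument to %s, so its '%' characters are copied literally.
 * Returns the length snprintf reports for the full text. */
static int render_user_text(char *out, size_t size, const char *user_text)
{
    assert(out != NULL && user_text != NULL && size > 0);
    return snprintf(out, size, "%s", user_text);
}
```

With this helper, an attacker-style input such as "%x %x %n" is simply echoed back as those 8 characters; nothing is read from
or written to the stack.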
Some readers will say: "Come on! All that effort just to print what is on the stack?" Of course we are not just looking at the
stack. We want to change its contents, changing a return address so that the program jumps to our code, and that requires the
second special property of the *printf() functions.
(2) Writing the jump address with the %n directive
So far we have only displayed memory without changing it, but with one special *printf() directive, %n, we can write to
memory.
%n is rarely used in ordinary programming. It writes the number of characters printed so far into the memory location whose
address is passed as its argument. To see exactly how it behaves, look at the following example:
#include <stdio.h>

int main(void)
{
    int num;
    int i = 1, j = 2, k = 3;

    printf("%d%d%d%n\n", i, j, k, &num);   /* %n stores 3: "123" was printed */
    printf("%d\n", num);
    return 0;
}
When run it displays:
123
3
As you can see, %n stores the number of characters printed so far into the corresponding memory location, here num.
Note that its argument must be a memory address: %n writes the character count into the memory at that address. If the statement
above is changed to
printf("%d%d%d%n\n", i, j, k, num);
the program crashes with a segmentation fault, because whatever value num holds is treated as an address. This point is important:
when exploiting the vulnerability, we do not write the jump address directly into the memory unit holding the function's return
address; instead we supply the address of that memory unit, the location where the function's return address is stored, commonly
called RETLOC. RETLOC is usually placed at the front of the string we submit. This may sound convoluted; put another way, we do not
overwrite the return address directly, we rewrite it through its address. It is easy to get confused here; if it is still unclear,
think about how pointers work in C, which is very similar.
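The pointer analogy can be made concrete with a small sketch: %n is handed a pointer (playing the role of RETLOC), and the count
lands in the location that pointer designates (playing the role of the saved return address). write_count_through is a
hypothetical name, and snprintf into a scratch buffer stands in for printf.

```c
#include <assert.h>
#include <stdio.h>

/* `where` plays the role of RETLOC: not the value to change, but the address
 * of the slot to change. %n writes the count into *where. */
static void write_count_through(int *where)
{
    char sink[32];
    assert(where != NULL);
    /* "abcd" is 4 characters, so %n stores 4 into *where. */
    snprintf(sink, sizeof sink, "abcd%n", where);
}
```

After int slot = -1; write_count_through(&slot); the variable slot holds 4, even though the function never names slot
directly; it only received its address.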
So far we know that a crafted format string lets us read the stack above the format string in order, and that %n writes a
value into the memory location at a given address. Since we can reach the string we submitted ourselves, we can place in it the
address where some function's return address is stored, and then use %n to overwrite that return address.
However, %n cannot write an arbitrary value: it writes only the number of characters printed before it, whereas what we need to
write is the address of our shellcode, just as in an ordinary overflow. This is a real nuisance. Someone might think: why not put
enough %d directives in front of %n that the number of characters printed equals the jump address? In theory this is feasible,
but in practice it does not work. Because the stack lies at the high end of memory, addresses on the stack are large numbers.
If each %d consumes 4 bytes, then first, far too many directives would be needed; and second, each 4-byte word printed as an
integer has an unpredictable length: one might print a single character such as '1', another five characters such as '45367'.
We cannot predict this.
This is where the third "handy" property of the *printf() functions comes in.
(3) Controlling the written value with width and precision
The *printf() functions let the programmer specify the width of the printed field, as anyone who has studied C knows: put an
integer in the middle of a conversion directive and *printf() uses it as the output width. If the actual output is wider than the
specified width, it is printed at its actual width; if it is narrower, it is padded to the specified width. For example, the
following statement prints the integer i in a field 100 characters wide:
printf("%100d", i);
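Similarly, printf("%.100f", x) prints the floating-point value x with 100 digits after the decimal point. Note that
printf("%.f", x) does not print one decimal place: an empty precision means zero, so no decimal places are printed at all (1.0
prints as "1"), whereas plain "%f" uses the default precision of 6 and prints 1.0 as "1.000000", 8 characters. What makes "%.f"
useful in real attacks is that each such directive consumes 8 bytes of the argument area (the size of a double on i386), twice as
much as %d or %x, so it is often used to walk quickly up the stack toward the location of the return address.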
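A quick way to confirm how many characters these two directives produce is to ask snprintf for the length without writing
anything (C99 guarantees snprintf(NULL, 0, ...) returns the would-be length). The helper names are our own; the 8-byte
argument consumption is a property of double on i386 and is only checked here via sizeof(double).

```c
#include <assert.h>
#include <stdio.h>

/* Characters produced by "%f" (default precision 6). */
static int len_f(double x)
{
    return snprintf(NULL, 0, "%f", x);
}

/* Characters produced by "%.f": an empty precision means 0, so there is no
 * decimal point and no fraction digits. */
static int len_dot_f(double x)
{
    int n = snprintf(NULL, 0, "%.f", x);
    assert(n > 0);
    return n;
}
```

For 1.0, len_f returns 8 ("1.000000") and len_dot_f returns 1 ("1"); either directive pulls one double from the argument list.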
We can use this property to make %n write a large value of our choosing with only a few directives. All that is needed is a
little arithmetic: convert the jump address into an integer and use it as the width or precision of the directive just before
%n. For example, to store 200 in num we can use:
printf("%.200d%n", i, &num);
After this statement runs, num is 200. If 200 were the value of the jump address, %n would have written the jump address into
num.
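The article's example can be reproduced safely by formatting into a buffer instead of stdout. num_after_dot200 is a hypothetical
helper; it assumes a non-negative value, since a minus sign would add one more character.

```c
#include <assert.h>
#include <stdio.h>

/* Same as printf("%.200d%n", i, &num), but written into a local buffer.
 * "%.200d" pads any non-negative int to exactly 200 digits. */
static int num_after_dot200(int i)
{
    char buf[256];   /* 200 digits plus the terminator fit comfortably */
    int num = 0;
    assert(i >= 0);
    snprintf(buf, sizeof buf, "%.200d%n", i, &num);
    return num;
}
```

Whatever non-negative i is passed, num comes back as 200: the precision, not the value, decides the count that %n stores.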
(4) Summary
That covers the theoretical basis of format string vulnerabilities. To recap:
First, if a program lets the user supply the format string argument of a *printf() call, we can submit a string of %d (or %f,
%u, etc.) directives to read any memory unit on the stack above the format string.
By appending a %n directive to the format string we submit, we can write the number of characters printed so far into a memory
location whose address lies on the stack above. In a real attack, the address where some function's return address is stored is
placed at the front of the submitted format string, and %n is made to correspond to that value, so the count is written into the
location holding the function's return address.
We control the value written into the return address with width/precision specifiers, usually on the last directive before %n.
This takes some calculation: in general, that precision is the shellcode address minus the number of characters printed by
everything before the last directive. The total number of characters printed then equals the shellcode address.
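The calculation in the last paragraph can be sketched and checked with small numbers. precision_for and written_value are
hypothetical helpers; the 4-character prefix "ABCD" stands in for whatever is printed before the final directive.

```c
#include <assert.h>
#include <stdio.h>

/* Precision for the last directive before %n: desired value minus the
 * characters already printed. Must be at least 1 so "%.*d" prints exactly
 * `precision` digits for a single-digit value. */
static int precision_for(int target, int printed_before)
{
    assert(target > printed_before);
    return target - printed_before;
}

/* Print a 4-character prefix, then a padded number, then %n, and report
 * the value %n stored. */
static int written_value(int target)
{
    char buf[512];
    int num = 0;
    int prec = precision_for(target, 4);       /* "ABCD" is 4 characters */
    assert(target < (int)sizeof buf);
    snprintf(buf, sizeof buf, "ABCD%.*d%n", prec, 7, &num);
    return num;
}
```

For a target of 100, the precision comes out as 96 and %n stores 4 + 96 = 100; a real jump address is just a much larger target.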
That is only the ideal case; attacking a real program raises more problems. The first is making %n correspond exactly to the
address at which the function's return address is stored; otherwise the return address cannot be changed. For programs that
can be started repeatedly and that echo back the formatted string, this is fairly easy to solve. Instead of the real address, we
put a few distinctive characters such as "ABCD" at the front of the submitted format string and display them with %x directives,
adding more %x directives to the format string until the value the program prints back is exactly the ASCII codes of the
characters we submitted. At that point we know where that slot lies; we then replace the marker with the real address at which
the function's return address is stored and end the format string with %n, which writes the jump address into the correct place.
Of course, this method requires that the target can be started repeatedly and echoes back what it prints for the submitted format
string. WU-FTPD 2.6.0 is such a program, so format string attacks against it succeed fairly easily. Some programs are not. cfengine,
for example, has long been known to contain a format string vulnerability, yet no working exploit for it has appeared, mainly
because once a format string has been sent to cfengine its state changes, so the attacker cannot guess repeatedly. An attack on
such a program must guess both the location of the stored return address and the jump address in a single attempt, which is
rarely easy in practice.
Another hard problem is knowing the exact address at which the function's return address is stored, which must be placed in the
format string so that %n writes the jump address to the correct location. This address is usually determined by testing while
writing the exploit, which then works on the test machine; on other machines, however, it differs with environment variables and
compilation options. For example, WU-FTPD installed from an RPM and WU-FTPD compiled from source usually store the return address
at different locations, and the value must be adjusted for the host being attacked. This can only be done from experience or by
brute-force guessing, and actual testing shows that the success rate is not very high.
There is also the jump address itself, i.e. the address of the shellcode. Since we rewrite the function's return address so that
it jumps to the shellcode, we must know the shellcode's address. This problem is comparatively easy: as in an ordinary overflow,
we put a run of NOPs in front of the shellcode, so we only need to know the approximate address range; any jump that lands in the
NOP run slides into the shellcode. That is why the jump address in a typical exploit rarely needs adjusting: it only has to fall
within an approximate range.
That is enough theory. Let's look at the WU-FTPD 2.6.0 format string vulnerability and how it is exploited.
4. Analysis of the WU-FTPD 2.6.0 Format String Vulnerability
(1) Where is the problem?
WU-FTPD (Washington University FTP Server) is a very popular FTP server for UNIX/Linux systems, and version 2.6.0 contains a
format string vulnerability. Because it is installed by default on most Linux systems, a large number of sites are affected, and
attacks against it are very common.
Let's look at the WU-FTPD source code to see where an attacker can get in.
The SITE EXEC command submitted by the user is handled by a function called void site_exec(char *cmd), where cmd is the command
the user submitted. This function contains the following statement:
-------------- ftpcmd.y, line 1929 ----------------------
lreply(200, cmd);
----------------------- cut here --------------------------
site_exec() passes the user's command straight to lreply(). Here is the definition of lreply():
-------------- ftpd.c, line 5343 ------------------------
void lreply(int n, char *fmt,...)
{
    VA_LOCAL_DECL

    if (!dolreplies)
        return;
    VA_START(fmt);

    /* send the reply */
    vreply(USE_REPLY_LONG, n, fmt, ap);

    VA_END;
}
----------------------- cut here --------------------------
Clearly the second parameter of lreply(), char *fmt, is meant to be a format string, yet in the call above it is supplied by the
user's command. That is where the problem comes from. lreply() passes fmt on to vreply(); here is the definition of vreply():
-------------- ftpd.c, line 5275 ------------------------
void vreply(long flags, int n, char *fmt, va_list ap)
{
    char buf[BUFSIZ];

    flags &= USE_REPLY_NOTFMT | USE_REPLY_LONG;

    if (n)
        sprintf(buf, "%03d%c", n, flags & USE_REPLY_LONG ? '-' : ' ');

    /* This is somewhat of a kludge for autospout.  I personally think that
     * autospout should be done differently, but that's not my department. -Kev
     */
    if (flags & USE_REPLY_NOTFMT)
        snprintf(buf + (n ? 4 : 0), n ? sizeof(buf) - 4 : sizeof(buf), "%s", fmt);
    else
        vsnprintf(buf + (n ? 4 : 0), n ? sizeof(buf) - 4 : sizeof(buf), fmt, ap);
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        !!! Note this line !!!

    if (debug)
        syslog(LOG_DEBUG, "<--- %s", buf);

    /* Yes, you want the debugging output before the client output; wrapping
     * stuff goes here, you see, and you want to log the cleartext and send
     * the wrapped text to the client.
     */

    printf("%s\r\n", buf);    /* and send it to the client */
#ifdef TRANSFER_COUNT
    byte_count_total += strlen(buf);
    byte_count_out += strlen(buf);
#endif
    fflush(stdout);
}
----------------------- cut here ---------------------------------------------------
Because the first argument passed to vreply() (flags) is USE_REPLY_LONG, flags is still USE_REPLY_LONG after the &= operation, so
(flags & USE_REPLY_NOTFMT) evaluates to 0 and the if statement takes the else branch:
    if (flags & USE_REPLY_NOTFMT)
        snprintf(buf + (n ? 4 : 0), n ? sizeof(buf) - 4 : sizeof(buf), "%s", fmt);
    else
        vsnprintf(buf + (n ? 4 : 0), n ? sizeof(buf) - 4 : sizeof(buf), fmt, ap);
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        !!! Note this line !!!
Note the vsnprintf() call that is executed: it places fmt in the format string argument position, and this fmt is the command cmd
submitted by the user. Recall the principle described earlier: if we supply the format string argument of a *printf() function,
we can use directives to read the stack above the format string. If we can reach the content we submitted, we can place in it
the address where some function's return address is stored, and then use %n to overwrite that return address so that the
function jumps to our shellcode.
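The usual cure for this class of bug is at the call site: keep the variadic, format-driven reply helper, but never let user text
become the format. Below is a minimal sketch using simplified stand-ins for vreply() and lreply(); the names vformat_reply,
format_reply and reply_with_command are our own, not wu-ftpd's, and the stand-ins always emit the "NNN-" long-reply prefix.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for vreply(): writes "NNN-" and then the formatted text. */
static int vformat_reply(char *buf, size_t size, int code, const char *fmt, va_list ap)
{
    int head;
    assert(size > 4 && code >= 0 && code <= 999);
    head = snprintf(buf, size, "%03d-", code);          /* e.g. "200-" */
    return head + vsnprintf(buf + head, size - (size_t)head, fmt, ap);
}

/* Simplified stand-in for lreply(): a variadic front end, as in the original. */
static int format_reply(char *buf, size_t size, int code, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vformat_reply(buf, size, code, fmt, ap);
    va_end(ap);
    return n;
}

/* The fixed call site: the user's command is data for %s, never the format. */
static int reply_with_command(char *buf, size_t size, const char *cmd)
{
    return format_reply(buf, size, 200, "%s", cmd);
}
```

Passing a hostile command such as "exec %x%x%n" now just produces the reply text "200-exec %x%x%n".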
(2) Exploiting the WU-FTPD vulnerability
To exploit this vulnerability we first need to log in to the FTP server (an exploit program does this itself, so no manual login is
needed). An anonymous login (user ftp or anonymous) is enough. Then submit a SITE EXEC command of the following form:
SITE EXEC aa|RETLOC|%.f...%.f|%.(RET)d|%n
Note: the '|' characters are only separators to make the structure easier to read; the actual string contains no '|'.
The format string follows the SITE EXEC command, and the "aa" is there to align RETLOC on a 4-byte boundary, which is what we
usually call alignment.
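The amount of filler can be computed rather than guessed. A minimal sketch, assuming (as the article's later count of 16 implies)
that the 10 characters of "SITE EXEC " are part of the formatted string and that alignment is measured from its start; on a real
target the right padding also depends on where the string lands on the stack, which is why exploits often tune it.
pad_to_align is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Filler bytes needed so that data placed after a `prefix_len`-byte prefix
 * starts on an `align`-byte boundary, measured from the start of the string. */
static size_t pad_to_align(size_t prefix_len, size_t align)
{
    assert(align > 0);
    return (align - prefix_len % align) % align;
}
```

For the 10-character prefix this gives 2, matching the two "a" characters, after which RETLOC starts at offset 12.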
RETLOC comes next: it is the address at which the return address we want to overwrite is stored, and %n must correspond to it so
that the jump address is written to that location. The function to rewrite is generally the most recently called one; here we can
rewrite the return address of vreply(), so that it jumps to our shellcode when it returns.
After RETLOC comes a run of %.f directives. Each %.f consumes 8 bytes of the stack, skipping over the local variables pushed onto
the stack after our command string, so that %n ends up corresponding exactly to RETLOC.
Next, %.(RET)d pads the number of characters printed so that it equals the shellcode address; %n then writes that count, which is
exactly the shellcode address, into vreply()'s return address. Note that RET here is not the shellcode address itself but
[shellcode address - (number of %.f * 8) - 16], where 16 is the length of "SITE EXEC aa" (12 characters) plus the 4-byte RETLOC.
The "* 8" term assumes each %.f prints 8 characters; since what a %.f actually prints depends on the stack bytes it consumes (for a
value near zero it often prints just "0"), in general this term should be the total number of characters all the %.f directives
print. Either way, the goal is that the total number of characters printed before %n equals the shellcode address.
The final %n, of course, writes the jump address (the shellcode address) into vreply()'s return address, so that when vreply()
returns it jumps to the shellcode. As explained in the section on the principle, %n does not modify the return address directly;
it uses the address at which the return address is stored and modifies it indirectly. So although vreply()'s return address lies
below the format string on the stack, out of reach of directives that walk upward, we can still rewrite it.
One more important point: when is the shellcode submitted? We can send it to wu-ftpd as the user's password. Since an anonymous
user's password can be anything, this does not prevent us from logging in anonymously. The shellcode is then stored in a global
variable in the program's heap/BSS segment, not on the stack. Debugging on a local machine shows that its approximate address is
0x80756xx; that value is for Red Hat 6.0 and will differ on other systems. Since we can put a run of NOPs before the shellcode, we
do not need its exact location; it is enough to make the program jump into the NOP range.
Note that the shellcode here must break out of chroot(). An anonymous user can only access the directory tree protected by
chroot(), i.e. the anonymous login directory, which contains no /bin/sh. The shellcode must therefore first chroot() back to the
real root directory. Many good chroot-breaking shellcodes are available online and can be used directly.
That is how the WU-FTPD 2.6.0 format string vulnerability is attacked. Many exploits for it are now available. I wanted to pick
one and walk through it, but after a few lines I found it far too damn tedious. It isn't really necessary anyway, since I have
already explained the steps for exploiting this vulnerability.
Here I will just list a few of the better exploits I have used:
(1) Effective against WU-FTPD 2.6.0 installed from RPM on Red Hat 6.0, 6.1 and 6.2:
http://go6.163.com/~antiroot/exploit/wu-lnx.c
(2) Effective against WU-FTPD 2.6.0 on FreeBSD and SuSE 6.3, 6.4:
http://go6.163.com/~antiroot/exploit/wuftpd-god.c
(3) Effective against WU-FTPD 2.6.0 on Solaris 2.x:
http://go6.163.com/~antiroot/exploit/ftpd.c
In my experience, Red Hat 6.0, 6.1 and 6.2 with WU-FTPD installed from RPM are the easiest to attack, probably because most
exploits were written in that environment. WU-FTPD compiled from source is harder to attack successfully: the attacker needs to
adjust some of the exploit's parameters, mainly RETLOC (the address where the function's return address is stored), and sometimes
must adjust this value repeatedly before the attack succeeds.
5. Postscript
The purpose of this article is to explain format string vulnerabilities as plainly as possible, so when explaining the principle I
did not rely on output from debugging tools such as GDB and tried to discuss only the principle itself; readers unfamiliar with
debuggers may find it easier to follow that way. Some may still find the article hard going, because format string
vulnerabilities are inherently fairly complex; readers need at least some basic knowledge of C and stack overflows for this
article to be helpful.
Some will ask: "Why bother working out the technical details of this vulnerability? I can attack successfully with an exploit
without understanding them." True, in a real attack we usually don't think about the details. But, but, but: is your goal to be a
hacker or a script kiddie? If the former, you undoubtedly need to understand the technical details. If the latter, you still need
to understand them!! The better you know the details, the higher your success rate.
Thanks to warning3 and others; their articles taught me a great deal.
References:
<<Analysis of *printf() Format String Security Vulnerabilities>> --- warning3
<<Format String Attacks>> --- Tim Newsham (translated by xuzq)
<<Format Bugs: What Are They, Where Did They Come From, ... How To Exploit Them>> --- lamagra
Visit my homepage:
http://isno.yeah.net
(* Please keep this article intact *)