Technical core:
UTF-8 lets the same code point be written in more than one way, and this defect in the encoding can cause an IDS and the OS to interpret the same bytes differently.
Recently there has been much discussion of what Unicode means for IDS. This article tries to explain what Unicode is, how Unicode can be used to evade an IDS, and the corresponding remedies. The discussion centers on UTF-8. The defects in Unicode both threaten IDS and make its job more complicated, and that is what I want to explain to readers.
IDS evasion
One measure of a burglar's skill is whether he can render the alarm system useless, and the same goes for evading an IDS. Once intruders know which behavior triggers which IDS response, they will take appropriate countermeasures. Attacks can be mounted at every layer of the OSI model; even where NIDS (network-based IDS) has made progress defending against low-level attacks, fragmentation attacks can still cause it trouble.
For NIDS, attacks at the application layer are a complicated problem. Because a NIDS must exactly imitate the application's interpretation of the application-layer protocol, an attacker can exploit any difference between how the application and the IDS interpret it. A signature-based IDS turns out to be quite unable to deal with complex interactions. As protocols that support features such as Unicode grow more complex, the difficulty facing IDS at the application layer keeps increasing.
What is Unicode?
Unicode makes it easier for machines to handle the characters of any language. It is maintained by the Unicode Consortium (UC), which approves technical changes to it. Standards and technologies such as Java, LDAP, and XML require Unicode support. Each Unicode character is assigned a code point, written as U+xxxx, where each x is a hexadecimal digit.
What is UTF-8?
As defined in section D36 of the Unicode 3.0.1 errata released by the UC, UTF-8 is a Unicode transformation format. In this format a Unicode code point is serialized as a sequence of one to four bytes. UTF-8 makes it possible to encode Unicode while remaining compatible with ASCII, the most common representation of text on the Internet.
To achieve this compatibility, UTF-8 encodes the standard 7-bit ASCII range (U+0000 to U+007F) as one byte, U+0080 to U+07FF as two bytes, U+0800 to U+FFFF as three bytes, and everything above that as four bytes. The algorithm is designed so that code points can be converted directly, without a table lookup.
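The byte-length ranges above can be sketched in a few lines of Python. This is only an illustration of the shortest-form encoding rule; the function name utf8_encode is made up for this example, and surrogate code points are deliberately not handled.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 in its shortest form (illustrative sketch)."""
    if cp < 0x80:                      # U+0000..U+007F: one byte, identical to ASCII
        return bytes([cp])
    if cp < 0x800:                     # U+0080..U+07FF: two bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # U+0800..U+FFFF: three bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # above U+FFFF: four bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


# Cross-check against Python's built-in encoder for a few sample code points.
for cp in (0x41, 0x5C, 0x100, 0x8721, 0x10000):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex().upper()}")
```

Every step is a shift and a mask, which is what "no table lookup" means in practice.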
Both Microsoft's IE and Office 2000 support UTF-8 in URLs. IIS processes UTF-8 sequences of up to three bytes in its default configuration, and Apache can support UTF-8 once configured.
UTF-8 and Unicode security issues
Bruce Schneier was the first to discuss this security issue, on July 15, 2000. He pointed out that under Unicode's encoding scheme the same character may have multiple representations, and that a later code point may modify the one before it.
In a security context the meaning of every character must be unambiguous, but the complexity of Unicode increases the likelihood of multiple representations, and security defects follow.
The UC has acknowledged the multiple-representation problem and has revised the Unicode standard to eliminate it; the UTF-8 errata contain the relevant notes. Because these revisions are very recent, applications have not yet had time to catch up, and IIS is a typical example.
Under the older UTF-8 rules, each longer form also covers the code points of the shorter forms: the two-byte form can express everything the one-byte form can, and the three-byte form can express everything the one-byte and two-byte forms can. A typical example is "\" (U+005C). Under the old rules, \ can be written as 5C, C1 9C, or E0 81 9C. All of these are the same Unicode code point: run through the decoding algorithm, they yield the same value. Older applications that support UTF-8 accept all three values and treat each of them as \.
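To make the multiple-representation problem concrete, here is a minimal sketch of a decoder that, like the older rules, does not reject longer-than-necessary forms. The name lenient_decode is hypothetical; it is not taken from any real product. Python's own strict decoder is shown for contrast.

```python
def lenient_decode(seq: bytes) -> int:
    """Decode ONE UTF-8 sequence without rejecting overlong forms (old-style behavior)."""
    first = seq[0]
    if first < 0x80:
        need, cp = 0, first
    elif first >> 5 == 0b110:
        need, cp = 1, first & 0x1F
    elif first >> 4 == 0b1110:
        need, cp = 2, first & 0x0F
    elif first >> 3 == 0b11110:
        need, cp = 3, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != need + 1:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)    # append the low six bits of each trailing byte
    return cp


for hexform in ("5C", "C19C", "E0819C"):
    raw = bytes.fromhex(hexform)
    print(hexform, "-> lenient: U+%04X" % lenient_decode(raw))
    try:
        raw.decode("utf-8")            # a strict, shortest-form-only decoder
        print("   strict: accepted")
    except UnicodeDecodeError:
        print("   strict: rejected")
```

All three inputs decode leniently to U+005C, while the strict decoder accepts only the one-byte form.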
Applications bring a bigger problem
Unicode brings another big problem: the application and the OS may interpret different code points as having the same meaning.
After testing IIS on Windows 2000 Advanced Server (English version), I found that IIS illustrates this problem well. For example, the letter A can be expressed by the Unicode code points U+0100, U+0102, U+0104, U+01DE, and U+8721, and each of these code points itself has multiple encodings. Because of how IIS behaves, this results in nearly 30 representations of the character A; E has 34, I has 36, and U has 58. The string "AEIOU" may have 83,060,640 different representations!
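The total can be checked with a little arithmetic. The text gives counts for A, E, I, and U but not for O. Assuming the total is simply the product of the per-letter counts (an assumption, not something the text states), the implied count for O falls out by division:

```python
from math import prod

# Per-letter counts as given in the text; O is not stated there.
counts = {"A": 30, "E": 34, "I": 36, "U": 58}
total = 83_060_640

partial = prod(counts.values())             # product of the four known counts
implied_o, remainder = divmod(total, partial)

print("product of A, E, I, U counts:", partial)
print("implied count for O:", implied_o, "remainder:", remainder)
```

The division comes out exact, which is consistent with the total being a plain product of independent choices per letter.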
Possible consequences
A good example of this problem is Microsoft's "Extended Unicode Directory Traversal" vulnerability. IIS checks for directory traversal before it decodes UTF-8, so it misses a traversal that is encoded in UTF-8, and decoding only happens afterward, when the request reaches the OS.
Exploiting this vulnerability is easy. If an intruder enters a URL like
http://victim/../../winnt/system32/cmd.exe, IIS will return an error because of the "../..". If the intruder instead uses UTF-8 to encode the separator, writing "..%c1%9c..", and sends that to IIS, the directory traversal check does not fire, and if the default permissions have not been changed, the intruder can go on to run other files.
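The check-then-decode ordering can be illustrated with a toy model. This is not how IIS is actually implemented; naive_server and lenient_utf8 are hypothetical names, and the lenient decoder only mimics the old tolerance for overlong forms.

```python
from urllib.parse import unquote_to_bytes


def lenient_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, accepting overlong 2- and 3-byte forms (toy model)."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif b >> 5 == 0b110:
            n, cp = 1, b & 0x1F
        elif b >> 4 == 0b1110:
            n, cp = 2, b & 0x0F
        else:
            n, cp = 0, 0xFFFD              # unsupported lead byte: replacement char
        tail = data[i + 1 : i + 1 + n]
        if len(tail) == n and all(t >> 6 == 0b10 for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            i += 1 + n
        else:
            cp = 0xFFFD                    # truncated or malformed sequence
            i += 1
        out.append(chr(cp))
    return "".join(out)


def naive_server(path: str) -> str:
    # Step 1: traversal check on the still-encoded path (the flawed ordering).
    if "../" in path or "..\\" in path:
        return "rejected"
    # Step 2: decoding happens afterward, so the check never sees its result.
    return lenient_utf8(unquote_to_bytes(path))


print(naive_server("/../../winnt/system32/cmd.exe"))
print(naive_server("/..%c1%9c..%c1%9cwinnt/system32/cmd.exe"))
```

The first request is rejected; the second passes the check and only afterward turns into a path containing "..\". Running the check on the decoded result, or rejecting overlong forms during decoding, closes the gap in this model.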
The impact on IDS
Generally, once a vulnerability is disclosed, IDS vendors rush out countermeasures (some of which, admittedly, are only rudimentary). Only Network ICE really handled this one correctly.
Now let's look at the latest Snort rules:
alert tcp !$HOME_NET any -> $HOME_NET 80 (msg: "IDS434 - WEB IIS - Unicode traversal backslash"; flags: AP; content: "..|25|c1|25|9c"; nocase;)
alert tcp !$HOME_NET any -> $HOME_NET 80 (msg: "IDS433 - WEB-IIS - Unicode traversal optyx"; flags: AP; content: "..|25|c0|25|af"; nocase;)
alert tcp !$HOME_NET any -> $HOME_NET 80 (msg: "IDS432 - WEB IIS - Unicode traversal"; flags: AP; content: "..|25|c1|25|1c"; nocase;)
ISS RealSecure is slightly stronger, but it also relies on string matching to detect the attack (http://xforce.iss.net/alerts/advise68.php).
But the Snort and RealSecure signatures alike cover only the variants that have been publicly disclosed. There turn out to be many others. For example, an intruder can use %C0%AE in place of the dot; IIS is just as vulnerable in that case, but the IDS will stand idly by.
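A rough sketch shows why fixed signatures fall short. The three byte strings below are the content patterns from the Snort rules above, with |25| written out as "%"; the matcher itself is a simplification made for this example and not Snort's engine.

```python
# Content patterns from the three rules, lowercased; |25| is the "%" byte.
SIGNATURES = [b"..%c1%9c", b"..%c0%af", b"..%c1%1c"]


def signature_hit(request: bytes) -> bool:
    """Case-insensitive substring match, roughly what content + nocase does."""
    lowered = request.lower()
    return any(sig in lowered for sig in SIGNATURES)


known = b"GET /..%C1%9C../winnt/system32/cmd.exe HTTP/1.0"
# Same request, but each second dot is sent as the overlong form %c0%ae.
variant = b"GET /.%c0%ae%c1%9c.%c0%ae/winnt/system32/cmd.exe HTTP/1.0"

print("known attack flagged:", signature_hit(known))
print("variant flagged:     ", signature_hit(variant))
```

The known request matches; the variant does not, because the literal ".." that every signature starts with is no longer in the byte stream.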
A lesser-known fact is that IIS does not reject UTF-8 characters that are sent to it directly, without escaping. A web browser cannot send these characters directly, but anyone can open a connection to the port and send them. For example, if someone sends the bytes 0xC0AE to IIS in a URL, IIS will treat them as UTF-8. Most IDSs will see nothing wrong, and the system behind them will naturally suffer.
Although very few tools currently use UTF-8, it is only a matter of time before tools like Whisker and fragrouter gain UTF-8 attack modes, because UTF-8 can evade almost every URL signature that IDSs have for IIS.
For NIDS, perhaps the only workable approach is to implement UTF-8 processing just as IIS does. This is not a new idea for NIDS vendors: some already model the Telnet protocol in their products, and some already model HTTP in order to handle single-byte encodings. Call it protocol analysis or call it pattern matching on the data; whatever the name, these products still have serious problems.
A smart solution
The only vendor that has set out to tackle UTF-8 is Network ICE. Judging from the focus-ids list, they began working on it as soon as Schneier disclosed the problem, and the result of their work is an extremely sophisticated solution. I was fortunate enough to take part in their testing and found that it has many of the features that are needed.
Network ICE can successfully detect intrusions that use standard UTF-8 encoding. If the intruder uses one or more alternative UTF-8 encodings of a single code point, BlackICE will identify the intrusion, whether it is a UTF-8 attack or some other web attack.
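One way such detection could work in principle is to decode each multi-byte sequence and flag any whose value would have fit in a shorter form. The sketch below is an assumption made for illustration only; it does not describe how BlackICE is actually built, and find_overlong is a made-up name.

```python
from typing import List, Tuple

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def find_overlong(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, code point) for every overlong UTF-8 sequence in data."""
    hits, i = [], 0
    while i < len(data):
        b = data[i]
        if b >> 5 == 0b110:
            n, cp = 2, b & 0x1F
        elif b >> 4 == 0b1110:
            n, cp = 3, b & 0x0F
        elif b >> 3 == 0b11110:
            n, cp = 4, b & 0x07
        else:
            i += 1                         # ASCII or stray byte: nothing to check
            continue
        tail = data[i + 1 : i + n]
        if len(tail) == n - 1 and all(t >> 6 == 0b10 for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            if cp < MIN_FOR_LENGTH[n]:     # value would have fit in fewer bytes
                hits.append((i, cp))
            i += n
        else:
            i += 1                         # malformed sequence: skip the lead byte
    return hits


print(find_overlong(b"..\xc1\x9c.."))
print(find_overlong("\u0100".encode("utf-8")))
```

The first call reports an overlong U+005C at offset 2; the second reports nothing, since U+0100 genuinely needs two bytes. A check like this catches alternative encodings of one code point, but not the separate case of different code points that an application treats as equivalent.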
Because UTF-8 is an algorithmic transformation of Unicode, Network ICE was able to build a UTF-8 engine. That engine does not interpret Unicode the same way IIS does, however, so a UTF-8 attack that uses non-standard characters can still evade BlackICE.
What makes it worse?
Under Apache on Solaris, UTF-8 may well be interpreted in yet another way; there really is no consistency. A NIDS may be forced to understand each application individually, and also to track which application is running at which IP address in order to be sure its interpretation is correct. Some vendors are working hard in this direction, but I am afraid it will only get more cumbersome.
What can be done?
The biggest problem at the network application layer is whether the IDS's analysis can stay in step with the application's. A host-based IDS can log the request exactly as the application handled it, so its records reflect both the request and how it was executed. A host-based IDS may therefore be better suited to the application layer.
In fact, there are some measures we can take against UTF-8 intrusions. A NIDS can perform protocol analysis and extract the URI from port 80 traffic; if a site does not use UTF-8, any attempt to use UTF-8 will then be detected and raise an alarm. Of course, sites that use foreign languages have to use UTF-8, so for them this measure is of little use, but everyone else can benefit a good deal.
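For a site that serves only ASCII URLs, the measure described above might look roughly like this. It is a minimal sketch assuming the HTTP request line has already been reassembled from the port 80 stream; the function names are hypothetical, and double-encoded input such as %25c1 is not handled.

```python
from urllib.parse import unquote_to_bytes


def uri_from_request_line(line: bytes) -> bytes:
    """Pull the URI out of a request line such as b'GET /path HTTP/1.0'."""
    parts = line.split(b" ")
    if len(parts) < 2:
        raise ValueError("not an HTTP request line")
    return parts[1]


def uses_non_ascii(uri: bytes) -> bool:
    """True if the URI carries any byte >= 0x80, escaped (%XX) or raw."""
    return any(b >= 0x80 for b in unquote_to_bytes(uri))


for line in (b"GET /index.html HTTP/1.0",
             b"GET /..%c1%9c../winnt/system32/cmd.exe HTTP/1.0",
             b"GET /\xc0\xae\xc0\xae/ HTTP/1.0"):
    alarm = uses_non_ascii(uri_from_request_line(line))
    print(line, "-> ALARM" if alarm else "-> ok")
```

Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, this flags any UTF-8 attempt, standard or not, which is exactly why it is unusable on sites whose URLs legitimately contain non-ASCII text.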
If IIS allowed multi-byte UTF-8 processing to be turned off, things would be much better. Disabling Unicode is a security measure worth considering, since many sites do not need it. Of course, the current IIS/Windows system remains flawed because it has not yet been updated to the revised standard; will Microsoft fix this bug any time soon? Supporting Unicode at the application layer is worthwhile work, but until a comprehensive security standard for it is widely adopted, applications should probably make UTF-8 support optional.
Conclusion
Internationalization of the Internet brings significant security risks. Almost all existing IDSs may be powerless against UTF-8-based attacks. NIDS has a hard time at the application layer and, barring a miracle, will continue to, which is the exact opposite of what most vendors advertise.