Author: Daniel J. Roelker

Summary: This article describes two general classes of IDS evasion techniques based on the HTTP protocol: older HTTP evasion techniques and newer ones. Although the techniques differ, they all target the request-URI portion of the HTTP request and use standard HTTP/1.0 and HTTP/1.1. Evasions in the request-URI are usually related to URL encoding. Apache and IIS accept a variety of legal URL encodings; each encoding is explained and illustrated with a concrete example. This article also uses properties of the HTTP protocol itself to demonstrate HTTP IDS evasions. After reading it, the reader should understand the principles behind HTTP IDS evasion and be able to apply these general principles and examples to their own HTTP IDS.

Index terms: computer security, Hypertext Transfer Protocol, intrusion detection, network scanning

I. Introduction

HTTP IDS evasion techniques have steadily appeared since Rain Forest Puppy (RFP) first publicly released the network scanner Whisker [1]. The original HTTP IDS evasions came from the first version of Whisker and ranged from the simple, such as using multiple "/" characters, to the more complex, such as inserting "HTTP/1.0" into the URL to evade IDS algorithms that search for the URL. Besides the evasions in Whisker, there are other kinds of HTTP obfuscation. One way to obfuscate a URL is to use absolute and relative URIs [2]. Although these methods are interesting, they are less prominent than the techniques used in Whisker. The next popular evasion method, also published by RFP, exploited the UTF-8 Unicode decoding vulnerability in Microsoft Internet Information Server (IIS) [3]. Besides being a severe IIS vulnerability, it revealed a URL encoding that IDSs had not implemented.
So far, most IDSs still address only Whisker's earlier ASCII-encoding and directory-traversal evasions and offer no corresponding protection against UTF-8 Unicode encoding. Eric Hacker has written a thorough paper on this class of HTTP IDS evasion [4]. This article also analyzes and explains some of Hacker's points; we build on his ideas to understand what these encodings mean and how even stranger encodings can be constructed. The other HTTP IDS evasions described here use properties of the HTTP protocol: one is request pipelining, and another uses the request's content encoding to place the HTTP request parameters in the request body (payload).

II. IDS HTTP Protocol Analysis

To identify URL attacks, an IDS must inspect the HTTP URL field for malicious content. The two most popular IDS detection methods, pattern matching and protocol analysis, both have to determine whether the URL contains malicious content. The difference between them lies in scope: protocol analysis searches only the URL field of the HTTP stream, while pattern matching searches the entire packet. For an unencoded malicious URL, the two methods behave similarly. To handle encoded URLs, however, protocol analysis only needs to apply a suitable decoding algorithm to the URL field, since it already has a built-in HTTP protocol decoder. A pattern matcher does not know which part of the packet needs to be normalized, so it must be combined with some form of protocol analysis to locate the URL field before the decoding algorithm can be applied. Once some form of HTTP protocol analysis is added to pattern matching, the two approaches become similar.
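A minimal Python sketch of that difference, assuming a hypothetical signature "/cgi-bin/phf" and a simplified request-line parser (no request bodies, no header folding). The naive matcher searches the raw bytes; the protocol-analysis version first locates every request-URI, without assuming one request per packet, so pipelined requests are also found, and URL-decodes each URI once before matching. The function names are illustrative and not taken from any particular IDS.

```python
from urllib.parse import unquote

SIGNATURE = "/cgi-bin/phf"  # hypothetical rule content


def request_uris(stream: bytes) -> list[str]:
    """Return the request-URI of every request line ("METHOD URI HTTP/x.y")."""
    uris = []
    for line in stream.split(b"\r\n"):
        parts = line.split(b" ")
        if len(parts) == 3 and parts[2].startswith(b"HTTP/"):
            uris.append(parts[1].decode("latin-1"))
    return uris


def pattern_match(stream: bytes) -> bool:
    # Searches the whole buffer, but never decodes anything.
    return SIGNATURE.encode() in stream


def protocol_analysis(stream: bytes) -> bool:
    # Finds each URL field, applies one RFC hex-decoding pass, then matches.
    return any(SIGNATURE in unquote(uri) for uri in request_uris(stream))


# Two pipelined requests in one buffer; the second hides "b" as %62.
stream = (b"GET /index.html HTTP/1.1\r\nHost: example\r\n\r\n"
          b"GET /cgi-%62in/phf HTTP/1.1\r\nHost: example\r\n\r\n")
print(pattern_match(stream))      # False
print(protocol_analysis(stream))  # True
```

A single unquote pass covers only RFC hex encoding; the server-specific encodings discussed later would need additional decoding steps.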
Because these IDS methods are so similar, the HTTP IDS evasions discussed here apply to all kinds of IDS. The first general IDS evasion is invalid protocol parsing. For example, if the IDS does not locate the HTTP URL correctly, it cannot check a malicious URL: it never found the URL, so it cannot decode it. If the URL is located correctly, the IDS must also know the correct decoding algorithm; otherwise it cannot recover the URL that the server will see. This is the second IDS evasion: invalid protocol field decoding.

A. Invalid Protocol Parsing

RFP's Whisker [1] and Bob Graham's SideStep [5] contain many examples of evasion through invalid protocol parsing. The difference between the two programs is that Whisker exploits flawed IDS protocol parsing to avoid inspection, while SideStep uses legitimate network-layer protocol behavior to evade IDS protocol decoders. In HTTP, invalid protocol parsing evasions are effective against two fields: the URL and the URL parameters. For example, if the IDS's HTTP decoder assumes each request packet contains only one URL, then when a packet contains two URLs the IDS cannot correctly parse the second one. This technique comes up again in the discussion of request pipelining.

B. Invalid Protocol Field Decoding

Invalid protocol field decoding tests whether an IDS can handle the various encodings of a particular protocol field. For HTTP, the main target is the URL field. An IDS must be tested both for its compliance with the HTTP RFC encoding standards and for its support of encodings specific to a particular web server (for example, IIS). If the IDS cannot decode some URL encoding, an attacker can use that encoding to slip a malicious URL past detection. Another HTTP invalid protocol field decoding technique is directory obfuscation, which manipulates the properties of directory paths.
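Directory obfuscation only defeats an IDS that compares paths as written. A minimal sketch of the normalization step an IDS could apply first, using Python's posixpath.normpath as a stand-in for the server's path resolution. This is an assumption: real servers differ (IIS, for instance, also treats "\" as a separator, which this ignores). The sketch uses the /cgi-bin/phf path from the example that follows.

```python
import posixpath


def normalize_path(path: str) -> str:
    """Resolve repeated '/', '.' and '..' segments into a canonical path."""
    normalized = posixpath.normpath(path)
    # POSIX lets exactly two leading slashes survive normpath; a web server
    # treats them like one, so collapse them here.
    if normalized.startswith("//"):
        normalized = "/" + normalized.lstrip("/")
    return normalized


obfuscated = "/cgi-bin//./foo/../phf"
print("/cgi-bin/phf" in obfuscated)                  # False: directory+file signature evaded
print("phf" in obfuscated)                           # True: file-only signature unaffected
print("/cgi-bin/phf" in normalize_path(obfuscated))  # True after normalization
```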
For example, in /cgi-bin/phf, multiple "/" characters can replace a single "/" to change the "appearance" of the directory, or directory traversal can be used within the path. Note that directory obfuscation hides a malicious URL only when the IDS looks for both the directory and the file. For "/cgi-bin/phf", if the IDS looks for the file "phf" in the "cgi-bin" directory, our attack works; if the IDS looks only for the file "phf", directory obfuscation does not help.

III. Invalid Protocol Field Decoding

The subject of URL obfuscation is the variety of encodings that HTTP servers accept. In fact, most of these encodings are specific to IIS. For completeness, each encoding type was tested against each HTTP server. Using URL encoding to disguise web attacks works because most IDSs do not analyze the different encodings that different web servers accept, a problem that affects both pattern-matching and protocol-analysis IDS. For request-URI encoding there are only two RFC-standard methods: hexadecimal encoding and UTF-8 Unicode encoding. Both use "%" to mark an encoded value, and Apache supports both. Most of the other encodings we studied are server-specific and not RFC-compliant; Microsoft's IIS web server falls into this category. This section also covers related URL obfuscation.

A. Hex Encoding

Hex encoding is the RFC-compliant way to encode a URL and the simplest URL encoding. It only requires prefixing "%" to the hexadecimal byte value of each encoded character.
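A short sketch of hex encoding every character of a path and decoding it again with Python's urllib.parse.unquote; hex_encode is a hypothetical helper and, like the text, assumes ASCII input.

```python
from urllib.parse import unquote


def hex_encode(text: str) -> str:
    """Hex-encode every character as %XX (ASCII input assumed)."""
    if not text.isascii():
        raise ValueError("sketch only handles ASCII")
    return "".join(f"%{ord(ch):02X}" for ch in text)


print(hex_encode("A"))                      # %41
print(unquote(hex_encode("/cgi-bin/phf")))  # /cgi-bin/phf
```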
To hex-encode uppercase A (ASCII hex value 0x41), the result is:
• %41 = 'A'

B. Double Percent Hex Encoding

Double percent hex encoding builds on normal hex encoding: the percent sign itself is hex-encoded, followed by the hex value of the character. Encoding uppercase A gives:
• %2541 = 'A'
The percent sign is encoded as %25 (which decodes to "%"), so the first decode yields %41, and the second decode yields "A". Microsoft IIS supports this encoding.

C. Double Nibble Hex Encoding

Double nibble hex encoding is also based on standard hex encoding: each nibble (four-bit hex digit) is itself encoded with standard hex encoding. For example, uppercase A is encoded as:
• %%34%31 = 'A'
The normal hex encoding of A is %41. Double nibble hex encoding encodes each nibble separately: the first nibble, 4, is encoded as %34 (the ASCII value of the digit 4), and the second nibble, 1, as %31 (the ASCII value of the digit 1). After the first URL decode, the nibbles become the digits 4 and 1. Because they are preceded by a %, the second decode turns %41 into uppercase A.

D. First Nibble Hex Encoding

First nibble hex encoding is similar to double nibble hex encoding, except that only the first nibble is encoded. So whereas the double nibble encoding of uppercase A is %%34%31, the first nibble encoding is:
• %%341 = 'A'
As before, the first URL decode turns %34 into the digit 4, so the second decode operates on %41, and the final result is still uppercase A.
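The double percent and nibble encodings all rely on the server URL-decoding twice. A sketch assuming that behavior, with Python's unquote standing in for one decoding pass: it leaves malformed escapes such as a lone "%" or "%4" untouched, which is what lets the leftover "%" combine with the decoded digits on the second pass. The second nibble form from the next subsection is included for comparison.

```python
from urllib.parse import unquote

# Encodings of uppercase 'A' from the hex-encoding subsections.
VARIANTS = {
    "hex": "%41",
    "double percent": "%2541",
    "double nibble": "%%34%31",
    "first nibble": "%%341",
    "second nibble": "%4%31",
}


def decode_once(url: str) -> str:
    """One RFC-style decoding pass; malformed escapes are left untouched."""
    return unquote(url)


def decode_twice(url: str) -> str:
    """Models a server that decodes the URL a second time."""
    return unquote(unquote(url))


for name, encoded in VARIANTS.items():
    print(f"{name:15} {encoded:9} once={decode_once(encoded)!r} "
          f"twice={decode_twice(encoded)!r}")
```

An IDS that decodes only once sees "%41" for every non-trivial variant, which will not match a signature containing "A".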
E. Second Nibble Hex Encoding

Second nibble hex encoding is the same as first nibble hex encoding, except that only the second (last) nibble is encoded. The encoding of uppercase A is therefore:
• %4%31 = 'A'
On the first decode, %31 becomes the digit 1, so the second decode operates on %41, and the final result is "A".

F. UTF-8 Encoding

1) UTF-8 Description

UTF-8 encoding allows values larger than a single byte (0-255) to be represented as a stream of bytes. HTTP servers use UTF-8 to represent Unicode code points beyond the ASCII range (0-127). In UTF-8, the high-order bits of each byte have special meaning. Two-byte and three-byte UTF-8 sequences are laid out as follows:
110xxxxx 10xxxxxx (two-byte sequence)
1110xxxx 10xxxxxx 10xxxxxx (three-byte sequence)
The first byte of a UTF-8 sequence is the most important: the number of 1 bits before its first 0 bit gives the number of bytes in the sequence. In the two-byte example, there are two 1 bits before the 0. The bits after that first 0 in the lead byte contribute to the final value. The following bytes all share the same format: the highest bit is 1 and the next bit is 0, which marks them as UTF-8 continuation bytes, and the remaining 6 bits contribute to the final value. To UTF-8-encode a URL, each UTF-8 byte is percent-encoded.
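The bit layout above can be turned into a small decoder. This sketch is deliberately lenient: it does not reject overlong forms, modeling the server behavior that the rest of this section relies on; decode_utf8_sequence is a hypothetical name. Python's own strict UTF-8 codec, by contrast, refuses such sequences.

```python
from urllib.parse import unquote_to_bytes


def decode_utf8_sequence(data: bytes) -> int:
    """Decode one UTF-8 sequence to a code point, accepting overlong forms."""
    lead = data[0]
    # The number of 1 bits before the first 0 bit gives the sequence length.
    length = 0
    while length < 8 and lead & (0x80 >> length):
        length += 1
    if length == 0:
        return lead                            # 0xxxxxxx: plain ASCII byte
    if length == 1 or length > 4 or len(data) < length:
        raise ValueError("invalid lead byte or truncated sequence")
    value = lead & (0xFF >> (length + 1))      # bits after the lead byte's first 0
    for byte in data[1:length]:
        if (byte & 0xC0) != 0x80:              # continuation bytes must be 10xxxxxx
            raise ValueError("bad continuation byte")
        value = (value << 6) | (byte & 0x3F)   # 6 payload bits per continuation byte
    return value


raw = unquote_to_bytes("%C0%AF")               # percent-decoding yields b"\xc0\xaf"
print(chr(decode_utf8_sequence(raw)))          # /
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("Python's strict UTF-8 codec rejects this overlong form")
```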
For example: %C0%AF = '/'.

2) Unicode Code Points

UTF-8 can encode Unicode code point values. Code points usually range from 0 to 65535, and any code point greater than 127 in an HTTP URL is UTF-8 encoded. Code points 0-127 map one-to-one onto ASCII values. That leaves 65408 values for characters in other languages (such as Hungarian or Japanese). Typically, these languages have their own Unicode code page, from which Unicode code point values can be obtained. Each code page has its own unique values, so if the code page changes, the character represented by a given code point value may differ. This concept is important for URL encoding in the next subsection.

3) Combining These for Evasion

UTF-8-encoded Unicode code points are useful for evasion for three main reasons. First, UTF-8 can represent a code point or ASCII value in more than one way. This has been corrected in recent Unicode standards, but servers that still accept such forms are common (including Apache). For example, uppercase A can be encoded as a two-byte UTF-8 sequence:
• %C1%81 (11000001 10000001 = 1000001 = 'A')
Uppercase A can also be encoded as a three-byte UTF-8 sequence:
• %E0%81%81 (11100000 10000001 10000001 = 1000001 = 'A')
Therefore, encoding an ASCII character with UTF-8 can produce many different results.
Second, some non-ASCII Unicode code points can also map to ASCII characters. For example, Unicode code point 12001 can map to uppercase A. To know which code points map to ASCII characters, you must either read the entire Unicode mapping or test every Unicode code point against the server. Currently, the only web server that does this is Microsoft IIS.
The third reason is related to the second.
If the Unicode mapping changes or is unknown, the decoded Unicode code point may be invalid. This matters because IIS web servers in China, Japan, Poland, and other countries use different code pages, so if the IDS does not know which code page the web server uses, its UTF-8 decoding of the URL is likely to be wrong. An IDS that cannot be configured with the Unicode code page used by each monitored server therefore leaves any server whose code page it does not know unprotected.

G. UTF-8 Bare Byte Encoding

UTF-8 bare byte encoding is similar to UTF-8 encoding, except that no percent signs are used: the actual bytes are transmitted. Encoding A gives:
• 0xC1 0x81 = 'A'
Only Microsoft IIS supports this encoding.

H. Microsoft %U Encoding

Microsoft %U encoding represents Unicode code point values up to 65535 (two bytes) in its own way. The format is simple: %U followed by the four nibbles of the Unicode code point value:
• %UXXXX
For example, uppercase A can be encoded as:
• %U0041 = 'A'
Microsoft IIS supports this encoding.

I. Mismatch Encoding

Mismatch encoding combines different encoding methods to represent a single ASCII character; it is not a separate encoding in its own right.
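A sketch that generates the IIS-style variants of one ASCII character described in this section and decodes them with a hypothetical iis_like_decode function. The decoder is an assumption-laden model, not IIS's actual algorithm: it handles %XX, %UXXXX (ASCII results only), and overlong two- and three-byte UTF-8 whether percent-encoded or bare, ignores code pages entirely, and is applied twice to model double decoding. The "mismatch" entry (%U for the percent sign, then double nibble encoding for the digits) is one illustrative combination assumed here, not one taken from the text.

```python
import re

# Overlong 2- and 3-byte UTF-8 sequences whose value is ASCII (0-127).
OVERLONG = re.compile(rb"[\xC0\xC1][\x80-\xBF]|\xE0[\x80\x81][\x80-\xBF]")
ESCAPE = re.compile(rb"%[uU][0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2}")


def overlong_utf8(code_point: int, length: int) -> bytes:
    """Encode an ASCII code point as an overlong 2- or 3-byte UTF-8 sequence."""
    if not 0 <= code_point < 0x80:
        raise ValueError("sketch only handles ASCII")
    low = 0x80 | (code_point & 0x3F)  # continuation byte carrying the low 6 bits
    if length == 2:
        return bytes([0xC0 | (code_point >> 6), low])
    if length == 3:
        return bytes([0xE0, 0x80 | (code_point >> 6), low])
    raise ValueError("only 2- and 3-byte forms are sketched")


def percent(data: bytes) -> bytes:
    """Percent-encode every byte as %XX."""
    return b"".join(b"%%%02X" % byte for byte in data)


def variants(ch: str) -> dict[str, bytes]:
    """Wire forms of one ASCII character under the encodings above."""
    cp = ord(ch)
    return {
        "utf-8 two-byte": percent(overlong_utf8(cp, 2)),
        "utf-8 three-byte": percent(overlong_utf8(cp, 3)),
        "bare byte": overlong_utf8(cp, 2),               # raw bytes, no '%'
        "%U": b"%%U%04X" % cp,
        "mismatch": b"%U0025" + percent(b"%02X" % cp),   # assumed combination
    }


def _expand(match: re.Match) -> bytes:
    token = match.group(0)
    if token[1:2] in (b"u", b"U"):
        cp = int(token[2:], 16)
        return bytes([cp]) if cp < 0x80 else token       # non-ASCII %U left as-is
    return bytes([int(token[1:], 16)])


def _collapse(match: re.Match) -> bytes:
    seq = match.group(0)
    value = seq[0] & (0x1F if len(seq) == 2 else 0x0F)   # payload bits of lead byte
    for byte in seq[1:]:
        value = (value << 6) | (byte & 0x3F)
    return bytes([value])


def iis_like_decode(raw: bytes) -> bytes:
    """One decoding pass: expand %XX / %UXXXX, then collapse overlong UTF-8."""
    return OVERLONG.sub(_collapse, ESCAPE.sub(_expand, raw))


for name, wire in variants("A").items():
    once = iis_like_decode(wire)
    print(f"{name:17} {wire!r:22} once={once!r} twice={iis_like_decode(once)!r}")
```

Every variant ends up as b"A", but the mismatch form needs the second pass; an IDS that models only one of these decoding steps would miss some of them.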