http://www.blogcn.com September 14, 2004 20:28
Author: redhatlinux
Want to know who is visiting your website? Take a look at Apache's access log. The access log is Apache's standard log; this article explains its contents and the configuration options related to it.

I. Format of the access log

Apache has a built-in facility for recording server activity: its logging function. This "Apache Log" series introduces Apache's access log and error log, how to analyze log data, how to customize the Apache logs, and how to generate statistics from log data.

With a default Apache installation, the server generates two log files: access_log (access.log on Windows) and error_log (error.log on Windows). With a default installation these files are found under /usr/local/apache/logs; on Windows they are saved in the logs subdirectory of the Apache installation directory. Different package managers put the log files in different places, so you may need to look elsewhere, or check the configuration to see where the log files are written.

As its name indicates, the access log access_log records every access to the web server. Here is a typical record from the access log:

    216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

This record consists of seven fields. Two of them are empty in the example above, but the record is still divided into seven fields.

The first field is the address of the remote host, which indicates who is accessing the website. In the example above, the host accessing the site is 216.35.116.91. As it happens, this address belongs to a machine called si3001.inktomi.com (you can find this out by querying DNS with the nslookup tool); Inktomi is a company that makes web search software.
As you can see, the first field of a log record alone can tell us a lot about a visitor. By default, the first field is just the IP address of the remote host, but we can ask Apache to look up every host name and write host names to the log file in place of IP addresses. This is usually not recommended, because it greatly slows down logging and therefore reduces the efficiency of the whole website. In addition, many tools can convert the IP addresses in a log file to host names afterwards, so having Apache log host names instead of IP addresses is not worth the cost. If Apache really must look up the name of the remote host, use the following directive:

    HostnameLookups on

If HostnameLookups is set to double instead of on, the logging code performs a reverse lookup on the host name it found, to verify that the host name really does point back to the original IP address. By default, HostnameLookups is off.

The second field in the log record above is empty and is replaced by a "-" placeholder. In practice this is almost always the case. This field records the visitor's identity: not just the visitor's login name, but an email address or some other unique identifier. The information is returned by identd or directly by the browser.
Long ago, when Netscape 0.9 still dominated the market, this field often recorded the visitor's email address. However, because people used it to harvest email addresses and send spam, the practice did not last long, and almost no browser on the market has supplied this information for a long time. Today, the chance of seeing an email address in the second field of a log record is very slim.

The third field of the log record is also empty. It records the name the visitor supplied when authenticating. If part of the website requires users to authenticate, this field will not be empty for those requests. For most websites, however, it is empty in most records of the log file.

The fourth field is the time of the request. It is enclosed in square brackets and uses the so-called "common log format" or "standard English format". The log record above therefore shows that the request was made on Saturday, August 19, 2000, at 14:47:37. The "-0400" at the end of the time indicates that the server's time zone is 4 hours behind UTC.

The fifth field is probably the most useful in the entire record: it tells us what request the server received. Its typical format is "METHOD RESOURCE PROTOCOL".

In the example above, METHOD is GET. Other methods that appear often are POST and HEAD. There are many other legal methods, but these three are the main ones.

RESOURCE is the document, or URL, that the visitor requested from the server. In this example the visitor requested "/", the home page or root of the website. In most cases "/" points to the index.html document in the DocumentRoot directory, but it may point to another file depending on the server configuration.

PROTOCOL is usually HTTP followed by a version number. The version number is either 1.0 or 1.1, and most of the time it is 1.0.
HTTP is the foundation of the Web; HTTP/1.0 is an earlier version of the protocol, and 1.1 is the latest version. At the time of writing, most web clients still use HTTP/1.0.

The sixth field of the log record is the status code. It tells us whether the request succeeded or what kind of error was encountered. Most of the time this value is 200, meaning that the server responded successfully to the browser's request and everything is normal. This article does not give a complete list of status codes and their meanings; please consult a reference on the subject. In general, though, a status code starting with 2 indicates success, one starting with 3 indicates that the request was redirected to another location, one starting with 4 indicates an error on the client side, and one starting with 5 indicates that the server encountered an error.

The seventh field indicates the total number of bytes sent to the client. It tells us whether the transfer was interrupted (that is, whether the value equals the size of the file). Adding up these values across log records shows how much data the server sent in a day, a week, or a month.

II. Configuring the access log

The location of the access log file is actually a configuration option. If we look in the httpd.conf configuration file, we can see a line like this:

    CustomLog /usr/local/apache/logs/access_log common

Note that this line may be slightly different on earlier versions of the Apache server. It may use the TransferLog directive rather than CustomLog. If your server is of that kind, it is recommended that you upgrade it as soon as possible.
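The seven fields of an access log record described above can be split apart with a short script. The following is a minimal Python sketch, not something shipped with Apache: the names `CLF_PATTERN`, `parse_clf`, and `bytes_per_day` are made up for illustration, and it assumes that the log uses the common format and that month abbreviations such as "Aug" are parsed under an English locale.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common log format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Split one access log record into its seven fields, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Example timestamp: 19/Aug/2000:14:47:37 -0400
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the last field means that no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def bytes_per_day(lines):
    """Sum the seventh field (bytes sent) per calendar day; skip bad lines."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_clf(line)
        if record is not None:
            totals[record["time"].date().isoformat()] += record["size"]
    return dict(totals)


sample = '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
record = parse_clf(sample)
print(record["host"], record["request"], record["status"], record["size"])
print(bytes_per_day([sample]))
```

To total a real log, pass an open file object to `bytes_per_day`. Lines that do not match the pattern are skipped rather than raising an error, so a log written in a customized format would simply produce an empty result.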
The CustomLog directive specifies where the log file is saved and the format of the log. How to customize the format and content of the log file is discussed later in this "Apache Log" series. The line above specifies the common log format, which has been the standard format ever since web servers first appeared. This also explains why the access log still keeps the second field even though almost no client program supplies user identification to the server any more.

The path in the CustomLog directive is the path to the log file. Because the log file is opened by the user that starts the server (normally root, before Apache switches to the account named in the User directive), make sure this path is secure so that the file cannot be overwritten by others.

The following articles in this "Apache Log" series cover the Apache error log, customizing the format and content of logs, writing log records to a program instead of a file, extracting useful statistics from log files, and more.

Error Log

This article describes how to set the options related to the error log, document errors and CGI errors, how to view log contents conveniently, and so on.

I. Location and content

The previous article discussed Apache's access log, including its contents, its format, and how to configure it. This article discusses Apache's other standard log, the error log. The error log differs from the access log in both format and content. Like the access log, however, it provides a wealth of information that we can use to analyze how the server is running and where the problems are.

The file name of the error log is error_log, or error.log on Windows. The location of the error log is set with the ErrorLog directive:

    ErrorLog logs/error.log

Unless the file location begins with "/", it is relative to the ServerRoot directory.
With a default Apache installation, the error log is located under /usr/local/apache/logs. If Apache was installed with a package manager, the error log is likely to be somewhere else.

As its name indicates, the error log records the errors encountered while the server is running, along with some ordinary diagnostic information, such as when the server was started and when it was shut down.

We can control the amount and type of information written to the log file by setting the log level. This is done with the LogLevel directive; the default level is error, which means that errors are recorded. For a complete list of the values this directive accepts, see the Apache documentation at http://www.apache.org/docs/mod/core.html#loglevel.

In most cases, what we see in the log file falls into two categories: document errors and CGI errors. Configuration errors also appear in the error log occasionally, as do the server startup and shutdown messages mentioned earlier.

II. Document errors

Document errors correspond to the 400-series codes in server responses. The most common is the 404 error, Document Not Found. Besides 404 errors, user authentication errors are also common.

A 404 error occurs when the resource (that is, the URL) requested by the user does not exist, either because the user typed the URL incorrectly or because a document that used to exist on the server has been deleted or moved.
By the way, according to Jakob Nielsen, we should never move or delete any resource of a website without providing a redirect or some other remedy. For more of Nielsen's articles, see http://www.zdnet.com/devhead/alertbox/.

When a user cannot open a document on the server, the record that appears in the error log looks like this:

    [Fri Aug 18 22:36:26 2000] [error] [client 192.168.1.6] File does not exist: /usr/local/apache/bugletdocs/Img/south-korea.gif

As you can see, just like a record in the access_log file, an error record is divided into several fields.

The error record begins with a date/time stamp. Note that its format differs from the date/time format in access_log. The format used in access_log is called "standard English format", which may be somebody's idea of a joke, but it is too late to change it now.

The second field of the error record is the level of the record, which indicates the severity of the problem. It may be any of the levels listed in the documentation of the LogLevel directive (see the LogLevel link above); the error level lies between warn and crit. A 404 belongs to the error level, which indicates that there is a problem but the server can keep running.

The third field of the error record is the IP address from which the user issued the request.

The last field of the record is the actual error message. For a 404 error, it also gives the full path of the file the server tried to access. This is very useful when a file that we expect to be at the target location nevertheless produces a 404 error. The cause is often a server configuration error, the file actually belonging to a different virtual host, or some other unexpected situation.

An error record caused by a user authentication problem looks like this:

    [Tue Apr 11 22:13:21 2000] [error] [client 192.168.1.3] user rbowen@rcbowen.com: authentication failure for "/cgi-bin/hirecareers/company.cgi": password mismatch

Note that because document errors are the direct result of user requests, each of them also has a corresponding record in the access log.

III. CGI errors

The most important use of the error log is probably diagnosing CGI programs that misbehave. To allow further analysis and processing, everything a CGI program writes to STDERR (standard error) goes directly into the error log. This means that if any well-written CGI program runs into a problem, the error log will tell us about it.

Sending CGI error output to the error log also has a drawback. The error log ends up containing a lot of content that follows no standard format, which makes it quite difficult for automatic error log analysis programs to extract useful information from it.
Here is an example: an error record that appeared in the error log while debugging Perl CGI code:

    [Wed Jun 14 16:16:37 2000] [error] [client 192.168.1.3] Premature end of script headers: /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi
    Global symbol "$rv" requires explicit package name at /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi line 81.
    Global symbol "%details" requires explicit package name at /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi line 84.
    Global symbol "$Config" requires explicit package name at /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi line 133.
    Execution of /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi aborted due to compilation errors.

As you can see, the CGI error has the same format as the earlier 404 error: date/time, error level, client address, and error message. But the error message of this CGI error spans several lines, which often confuses error log analysis software.

With this error message, even someone who is not very familiar with Perl can learn a good deal about the error, at the very least which lines of the program have a problem. Perl is quite thorough in reporting program errors. Of course, what gets written to the error log varies from one programming language to another.

Because of the peculiar environment in which CGI programs run, problems in most CGI programs would be very hard to solve without the error messages in the error log.

Many people complain on mailing lists or newsgroups that they have a CGI program and the server returns an error such as "Internal Server Error" when the page is opened. We can be sure that these people have not looked at the server's error log, or do not know that the error log exists at all. In most cases, the error log points out exactly where the CGI error is and how to fix it.
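One way an analysis script can cope with the multi-line CGI output described above is to treat every line that does not begin with a bracketed date and level as a continuation of the previous record. The following is a minimal Python sketch of that heuristic, assuming the error log layout shown in this article; the names `ERROR_PATTERN` and `parse_error_log` are made up for illustration, and a CGI program that happens to print lines in the same bracketed layout would defeat it.

```python
import re

# [date] [level] [client address] message  (the client field is optional,
# because startup and shutdown messages have no client)
ERROR_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>[^\]]+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)'
)


def parse_error_log(lines):
    """Group error log lines into records; unmatched lines become 'extra'."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ERROR_PATTERN.match(line)
        if match:
            entry = match.groupdict()
            entry["extra"] = []  # continuation lines, e.g. CGI STDERR output
            entries.append(entry)
        elif entries and line.strip():
            # Not a new record: attach it to the most recent one.
            entries[-1]["extra"].append(line)
    return entries


sample_lines = [
    '[Fri Aug 18 22:36:26 2000] [error] [client 192.168.1.6] '
    'File does not exist: /usr/local/apache/bugletdocs/Img/south-korea.gif',
    '[Wed Jun 14 16:16:37 2000] [error] [client 192.168.1.3] '
    'Premature end of script headers: '
    '/usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi',
    'Global symbol "$rv" requires explicit package name at '
    '/usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi line 81.',
    'Execution of /usr/local/apache/cgi-bin/HyperCalPro/announcement.cgi '
    'aborted due to compilation errors.',
]

entries = parse_error_log(sample_lines)
for entry in entries:
    print(entry["level"], entry["client"], len(entry["extra"]), entry["message"])
```

Keeping the continuation lines in a separate list means the four standard fields stay easy to count and filter, while the free-form program output is still available for anyone debugging the script.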
IV. Viewing the log files

I often tell people that I watch the server's logs constantly while developing, so that I know immediately when a problem occurs. The response I get is usually silence. At first I thought this silence meant "of course you do"; later I discovered that it really meant "I don't know what other people do, but I don't do that myself." Even so, let us look at how to view the server's log files conveniently.

Connect to the server with Telnet, then enter the following command:

    tail -f /usr/local/apache/logs/error_log

This command displays the last few lines of the log file, and whenever new content is added to the log file, it displays the new content as it arrives.

Windows users can use this method too, for example with one of the many Unix tool packages available for Windows.
I personally like a tool called AINTX, which can be found at http://maxx.mc.net/~jlh/nttools/index.htm.

Another alternative is the following Perl code, which uses a module called File::Tail:

    use File::Tail;
    $file = File::Tail->new("/some/log/file");
    while (defined($line = $file->read)) {
        print "$line";
    }

Whichever method you use, it is a good habit to keep several terminal windows open: for example, the error log in one window and the access log in another. This way we know what is happening on the website and can fix problems immediately.

In the rest of this "Apache Log" series, we will discuss customizing the server logs: how to record all the information we want in the log file and leave out the information we do not want. After that, we will discuss processing log files, that is, generating statistical reports from them. In the last articles, we will also discuss how to redirect log records to a program instead of saving them to a log file, so that newly generated log data can be processed in real time, for example by saving it to a database or by emailing the system administrator when certain critical errors occur.

Custom Logs

Sometimes we need to customize the format and content of Apache's default logs, for example to record more or less information, or to change the format of the default log file. This article describes all the information that can be recorded and how to set up Apache to record it.

1. Defining the log format (April 3)

A long time ago, log files had only one format, the "common" format, and many people became accustomed to it. Then the custom log format appeared and became more popular; even the common log format itself is now defined through the custom log format mechanism.
This article explains how to customize the format of the log file as you please and how to make the log file record the information you want.

Customizing the log file format involves two directives, LogFormat and CustomLog. The default httpd.conf file provides several examples of both. The LogFormat directive defines a format and gives it a name, which we can then refer to directly. The CustomLog directive sets the log file and specifies the format the log file uses (usually by format name).

For example, in the default httpd.conf file we can find the following line:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common

This directive creates a log format named "common". The format of the log is given by the content enclosed in double quotes. Each variable in the format string stands for a specific piece of information, and these are written to the log file in the order in which they appear in the format string.
The Apache documentation lists all the variables available for the format string and their meanings. The following is a translation:

----------------------------------------------------------------
%...a: Remote IP address
%...A: Local IP address
%...B: Number of bytes sent, not including the HTTP headers
%...b: Number of bytes sent, not including the HTTP headers, in CLF format: when no data is sent, this is '-' rather than 0
%...{FOOBAR}e: Contents of the environment variable FOOBAR
%...f: File name
%...h: Remote host
%...H: Request protocol
%...{Foobar}i: Contents of the Foobar header line in the request sent to the server
%...l: Remote login name (from identd, if supplied)
%...m: Request method
%...{Foobar}n: Contents of the note "Foobar" from another module
%...{Foobar}o: Contents of the Foobar header line in the reply
%...p: Port of the server serving the request
%...P: Process ID of the child process that serviced the request
%...q: Query string (if a query string exists, the part following and including "?"; otherwise an empty string)
%...r: First line of the request
%...s: Status. For requests that were redirected internally, this is the status of the *original* request; use %...>s for the status of the final request
%...t: Time, in common log format (the "standard English format")
%...{format}t: Time, in the form given by format
%...T: Time taken to serve the request, in seconds
%...u: Remote user (from auth; may be bogus if the returned status (%s) is 401)
%...U: URL path requested by the user
%...v: Canonical ServerName of the server serving the request
%...V: Server name according to the UseCanonicalName setting
----------------------------------------------------------------

In all the variables listed above, "..." stands for an optional condition. It may be omitted entirely; if a condition is given and is not met, the value of the variable is replaced by "-".
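A status-code condition placed between "%" and the variable letter decides whether the value itself or a "-" placeholder is written to the log. The following is a minimal Python sketch of that rule only, not of Apache's actual implementation; the function names `item_is_logged` and `render_item` and the example.com URL are made up for illustration.

```python
def item_is_logged(condition, status):
    """Decide whether a format item with this status condition is logged.

    `condition` is the text between "%" and the variable, for example
    "" (no condition), "404", "400,501", or "!200".
    """
    if not condition:
        return True  # no condition: the item is always logged
    negate = condition.startswith("!")
    codes = {int(code) for code in condition.lstrip("!").split(",")}
    return (status not in codes) if negate else (status in codes)


def render_item(condition, value, status):
    """Return the value to write to the log, or "-" if the condition fails."""
    return value if item_is_logged(condition, status) else "-"


# A referrer logged only for 404 responses
print(render_item("404", "http://example.com/links.html", 404))
print(render_item("404", "http://example.com/links.html", 200))

# A URL path logged only when the status is not 200
print(render_item("!200", "/missing.html", 404))
print(render_item("!200", "/index.html", 200))
```

The second and fourth calls return the "-" placeholder, because a 200 response does not satisfy either condition.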
Looking again at the LogFormat example from the default httpd.conf file, we can see that it creates a log format named "common" consisting of: remote host, remote login name, remote user, request time, first line of the request, request status, and number of bytes sent.

Sometimes we want to record certain information in the log only under defined conditions, and that is what "..." is for. If one or more HTTP status codes are placed between "%" and the variable, the content represented by the variable is recorded only when the status code returned for the request is one of the specified status codes.
For example, if we want to record all the broken links on the website, we can use:

----------------------------------------------------------------
LogFormat %404{Referer}i BrokenLinks
----------------------------------------------------------------

On the other hand, if we want to record the requests whose status code is not 200, we simply add a "!" symbol:

----------------------------------------------------------------
LogFormat %!200U SomethingWrong
----------------------------------------------------------------

Log Analysis

Although log files contain a great deal of useful information, that information becomes truly valuable only after some deep digging. This article first discusses what information can and cannot be obtained from log files, then introduces several excellent log analysis tools and how to process log files programmatically.

1. What information can be obtained (April 4)

In the previous articles of this "Apache Log" series, we discussed Apache's standard log files, the access log and the error log, and how to customize log files. This article discusses how to analyze log files to obtain valuable statistics.

The problem we face is that although the log file contains a lot of information, that information is of little direct help to us. To manage and plan a website, we need to know how many people visited the site, what they looked at, how long they stayed, where they learned about the site, and so on. All of this is hidden (or possibly hidden) in the log files.

The people who run the website would also like to know each visitor's name, address, shoe size, and even credit card number, but this information cannot be obtained from the log file. As technicians, we must know how to explain to them that this information is simply not available from the log file, and that the only way to get it is to ask the visitors themselves and be prepared to be refused.
A lot of information can be recorded in a log file, including:

Remote machine address: "the address of the remote machine" is almost, but not quite, the same thing as "who is browsing the website". Specifically, the address of the remote machine tells us where the visitor comes from; for example, it may be buglet.rcbowen.com or proxy01.aol.com.

Time of visit: when did the visitor start accessing the website? We can learn a lot from this question. If most visits to the website take place between 9:00 am and 4:00 pm, we can assume that most visitors are accessing it from work; if most visit records fall between 7:00 pm and midnight, we can be fairly sure that the visitors are generally at home. Of course, the information that can be obtained from a single access record is very limited, but if we start from thousands of access records we can obtain very useful and important statistics.

Resources accessed: which parts of the website are the most popular? These are the parts we should continue to develop. Which parts of the website are always ignored? These neglected parts may be hidden too deeply, or they may simply not be interesting, and we have to find a way to improve them. Of course, some content, such as legal notices, is rarely visited but should not be changed lightly.

Invalid links: the log file can also tell us what is not working the way we expect.
Is there a broken link on the website? Is there a wrong URL? Is there a CGI program that does not run properly? Is there a search engine crawler issuing thousands of requests per second and affecting the normal service of the website? The answers to these questions can be found in the log file.

Advanced Technology

This is the last article of the "Apache Log" series. Besides supplementing the previous articles, it discusses three topics: how to write log records to a program instead of a log file, how to rotate logs so that the disk does not run out of space, and how to manage log files in an environment with multiple virtual hosts.

I. Writing log records to a program

In the previous article of this "Apache Log" series, we discussed several log file analysis tools. It should be added that the discussion did not list every analysis tool. A simple Google search for "Apache Log Reporting" or similar keywords returns hundreds of pages on the subject, and many vendors sell their own solution to this relatively simple problem.

Log records do not have to be written to a file; they can also be written to a process. This is very useful when we want to write log information to a database, or to a program that displays website traffic statistics in real time. How is this done?

With the TransferLog or CustomLog directive, we can specify "|" followed by the name of the program that receives the log information. For example:

    CustomLog |/usr/bin/apachelog.pl common

Here /usr/bin/apachelog.pl is a program that knows how to handle Apache log records. Such a program can be very simple: for example, a Perl program that processes the log records in some way, or a program that writes the log records to a database.

Security is the primary concern when handling log data this way. The log file is opened with the permissions of the user who starts the server, usually root.
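A program on the receiving end of a piped log only has to read log records, one per line, from its standard input. The article does not show the contents of apachelog.pl, so the following is a minimal Python sketch of a comparable receiver rather than a reconstruction of it: it tallies records by status class, assumes the common log format, and uses the made-up function name `tally_status_classes`. The second and third sample records are invented for the demonstration.

```python
import io
import sys
from collections import Counter


def tally_status_classes(stream):
    """Count log records per status class (2xx, 3xx, 4xx, 5xx)."""
    counts = Counter()
    for line in stream:
        fields = line.split()
        # In the common log format the status code is the next-to-last field.
        if len(fields) >= 2 and fields[-2].isdigit():
            counts[fields[-2][0] + "xx"] += 1
    return counts


# Demonstration with an in-memory stream. When the script is started by
# Apache through a piped log, pass sys.stdin instead of this sample stream.
sample_stream = io.StringIO(
    '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654\n'
    '192.168.1.6 - - [19/Aug/2000:14:48:02 -0400] "GET /missing.html HTTP/1.0" 404 -\n'
    '192.168.1.3 - - [19/Aug/2000:14:49:10 -0400] "GET /index.html HTTP/1.0" 200 1024\n'
)
counts = tally_status_classes(sample_stream)
for status_class, count in sorted(counts.items()):
    print(status_class, count, file=sys.stdout)
```

Because the function returns only when its input ends, this version reports totals when the pipe is closed; a receiver meant to show traffic in real time would instead print or store a result after each record.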
A program that receives log records, including one that writes them to a database, runs with the privileges of the user who started the server, so make sure that it is adequately secured. If log data is handed to an insecure program (one that could be broken into and modified by non-root users), the system is in danger of having the logging program replaced by a malicious one. For example, if /usr/bin/apachelog.pl is world-writable, any user can edit the file so that it shuts down the web server, mails the password file to some mailbox, or deletes important files, because the root user has permission to do all of these things.

If you want to write log records to a program, it is recommended that you look for an existing module that already provides this function. Visit http://modules.apache.org/, a site that collects many modules that let Apache perform all sorts of practical tasks.

II. Rotating logs

Log files keep growing. If you carelessly put the log files in /var, they may fill up the partition and force the server to stop running. This sort of thing really does happen.

The way to prevent this problem is to move the log file somewhere with enough space before it becomes too large. This can be achieved in several ways. Some UNIX variants provide a logrotate script that helps with this task. For example, Red Hat comes with it preconfigured, and it rotates the log files every few days depending on their size or age.