Author: Cha Dong Email: chedongATbigfoot.com/chedongATchedong.com
Written in: 2003/04 Last Update: 09/02/2004 16:00:20
Copyright notice: You may reprint this article freely, provided that you credit the original source and author and include this notice with a hyperlink: http://www.chedong.com/tech/awstats.html
Keywords: AWStats, web log analysis, Apache, IIS, log analysis, open source
Summary: An introduction to using AWStats, with some improvements. Attachment: a patch download for AWStats 6.0 containing the Google UTF-8 query patch and definitions for the major Chinese portals (it can be unpacked directly over the original program directory).
Log statistics play an important role in analyzing user behavior on a site. Statistics on the keywords and visits that come from search engines are an especially effective data source for that analysis. After years of Internet development, web log statistics tools have matured and gained more features. Many of them are open source, and AWStats is a very good one.
AWStats: Advanced Web Statistics
AWStats is a Perl-based web log analysis tool that is developing quickly on SourceForge. Compared with Webalizer, another excellent open source log analysis tool, the advantages of AWStats are:
Friendly interface: the report language is selected automatically from the browser settings (a Simplified Chinese version is available). Sample output: http://awstats.sourceforge.net/cgi-bin/awstats.pl The usual report address is: awstats.pl?config=sitename
Based on Perl: this solves the cross-platform problem well. The system runs on GNU/Linux or on Windows (after installing ActivePerl), and it analyzes logs in the Apache combined format directly and in the IIS format after some modification. Webalizer also has a Windows version, but it has lacked maintenance. With AWStats, one system covers different web servers: GNU/Linux with Apache and Windows with IIS.
Relatively efficient: the statistics AWStats produces are much richer than Webalizer's, yet its speed still reaches about 1/3 of Webalizer's. For day-to-day traffic this speed is sufficient.
Configuration and customization: the configuration rules are flexible, and the defaults are reasonable. You can start running after changing no more than 3 or 4 items, and there are many plugins for modification and extension.
Designed around precise "human visits": many search engine robots are filtered out, so the numbers may be lower than those from other log statistics tools. Visits from inside the company can also be filtered out by IP address.
Many extended parameters: the ExtraXXXX series of settings generates parameter analysis for specific applications, which is very useful for product analysis.
For a comparison with other tools such as Webalizer and Analog, see: http://awstats.sourceforge.net/#comparison
AWStats installation notes
AWStats works in two steps:
Log analysis: each statistics run analyzes the log and archives the results into an AWStats database (plain text).
Output: the results are read from that database in one of two ways. Either a CGI program renders the statistics on request, or a background job exports the output into static files.
The following two examples each cover the log of a single site: one uses CGI output on GNU/Linux, the other static pages on Windows 2000.
Download and install
After downloading the installation package from http://sourceforge.net/projects/awstats/:
GNU/Linux:
    tar zxf awstats-version.tgz
    # deploy the AWStats CGI programs to /path/to/apache/cgi-bin/awstats
    mv awstats-version/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
    # put the icon directory in the web publishing directory: /path/to/apache/htdocs/icon/
Windows 2000: unpack the archive and move it to the D:/awstats directory, then copy the icon directory to the IIS publishing directory: inetpub/icon
Log format and log rotation rules
For Apache: set the log format to combined. Log rotation takes a little more work: install the cronolog tool and have it split the log by day:
    CustomLog "|/usr/local/sbin/cronolog /path/to/apache/logs/access_log.%Y%m%d" combined
This produces files such as logs/access_log.20030326
For IIS: the default logging setup is in better shape, but the IIS log format is not suitable for AWStats statistics, so it is best to clear all log fields and then select exactly the fields in the following list:
Date: date
Time: time
Client IP address: c-ip
User name: cs-username
Method: cs-method
URI stem: cs-uri-stem
Protocol status: sc-status
Bytes sent: sc-bytes
Protocol version: cs-version
User agent: cs(User-Agent)
Referrer: cs(Referer)
Compared with the IIS default settings:
Removed: server IP address, server port, URI query
Added: bytes sent, protocol version, referrer
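To make the field order concrete, here is a minimal Python sketch that splits one log line into the fields listed above. The function name parse_iis_line and the sample line are illustrative assumptions, not part of AWStats. The sketch also assumes that IIS writes spaces inside a field (such as the user agent) as "+", so a plain whitespace split is enough.

```python
# Field order as selected in the list above.
FIELDS = [
    "date", "time", "c-ip", "cs-username", "cs-method", "cs-uri-stem",
    "sc-status", "sc-bytes", "cs-version", "cs(User-Agent)", "cs(Referer)",
]


def parse_iis_line(line):
    """Split one space-separated log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


# Hypothetical sample line in the field order above.
sample = (
    "2003-03-26 15:43:58 123.123.123.123 - GET /index.html 200 192 "
    "HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) "
    "http://www.google.com/search?q=chedong"
)
record = parse_iis_line(sample)
print(record["c-ip"], record["sc-status"], record["cs(Referer)"])
```

A line with a different number of fields is rejected instead of being silently misaligned, which is the kind of mismatch that arises when the IIS field selection differs from the list.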
Naming rule for configuration files: awstats.sitename.conf
The main program awstats.pl automatically loads the configuration file that matches the site name, awstats.sitename.conf. For example, running ./awstats.pl -config=chedong loads the awstats.chedong.conf configuration file in the same directory. If -config is not specified, AWStats looks for awstats.conf in the current directory, or /etc/awstats.conf, as the default configuration file. It is therefore best to rename the default awstats.model.conf to awstats.yoursite.conf, for example awstats.chedong.conf.
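The lookup rule described above can be sketched in Python as follows. This models only the locations mentioned in the text and is not the actual awstats.pl logic, which may search more directories. The helper names candidate_configs and find_config are hypothetical.

```python
import os


def candidate_configs(site=None, script_dir="."):
    """Return the config paths to try, in the order described in the text."""
    if site:
        # -config=site maps to awstats.site.conf next to the script
        return [os.path.join(script_dir, "awstats.%s.conf" % site)]
    # no -config: fall back to the default file names
    return [os.path.join(script_dir, "awstats.conf"), "/etc/awstats.conf"]


def find_config(site=None, script_dir=".", exists=os.path.isfile):
    """Return the first candidate that exists, or None if there is none."""
    for path in candidate_configs(site, script_dir):
        if exists(path):
            return path
    return None


print(candidate_configs("chedong"))
print(candidate_configs())
```

The exists parameter is only there so the lookup can be tried without touching the real file system.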
For multiple sites, the include feature of the AWStats configuration file is very useful. Put the general settings in one file, include it (supported since version 5.4) at the head of each site-specific configuration file, and then let the settings that follow override the corresponding properties of the general configuration, for example:
    Include "common.conf"
    LogFile="/path/to/bbs/access_log"
    SiteName="bbs.chedong.com"
Minimal configuration changes: LogFile, SiteDomain, LogFormat
For statistics on GNU/Linux, you only need to change two options: LogFile and SiteDomain.
GNU/Linux:
    LogFile="/path/to/apache/logs/access_log.%YYYY-24%MM-24%DD-24"
Windows 2000:
    LogFile="D:/IIS_LOGS/W3SV3/ex%YY-24%MM-24%DD-24.log"
This setting builds the log file name from the year, month, and day as they were 24 hours ago.
    SiteDomain="www.chedong.com"
This is the site name. The default is empty, and AWStats refuses to run while it is empty.
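As an illustration of how such a template resolves, the Python sketch below expands the date tags with their hour offset. It is an approximation written for this article, not the AWStats implementation; the function name expand_logfile and the fixed "now" value are assumptions, and only the %YYYY, %YY, %MM, %DD and %HH tags are handled.

```python
import re
from datetime import datetime, timedelta

# Tag name -> strftime format used to render it.
_FORMATS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d", "HH": "%H"}


def expand_logfile(template, now):
    """Replace tags such as %YYYY-24 with the date part N hours before now."""
    def render(match):
        tag, hours = match.group(1), int(match.group(2))
        return (now - timedelta(hours=hours)).strftime(_FORMATS[tag])

    # YYYY is listed before YY so the longer tag wins.
    return re.sub(r"%(YYYY|YY|MM|DD|HH)-(\d+)", render, template)


now = datetime(2003, 3, 27, 8, 10)  # e.g. the daily 8:10 run
print(expand_logfile("/path/to/apache/logs/access_log.%YYYY-24%MM-24%DD-24", now))
print(expand_logfile("D:/IIS_LOGS/W3SV3/ex%YY-24%MM-24%DD-24.log", now))
```

Run on the morning of 2003-03-27, both templates resolve to the previous day's file, which is the complete log that the daily update should analyze.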
By default, AWStats counts .swf requests as page views, so if the SWF files on the site are mainly advertisements, it is best to filter them out.
Log analysis
    ./awstats.pl -update -config=sitename -lang=cn
For example:
    ./awstats.pl -update -config=chedong
This automatically loads the configuration file awstats.chedong.conf.
Statistics output
GNU/Linux: http://localhost/cgi-bin/awstats/awstats.pl?config=chedong
Windows 2000: http://localhost/awstats/awstats.chedong.html
Running log statistics automatically
GNU/Linux: use crontab -e to run the update at 8:10 every day:
    # update awstats
    10 8 * * * (cd /path/to/apache/cgi-bin/awstats; ./awstats.pl -update -config=chedong)
Windows 2000: schedule a task that runs at 8:10 every day:
    D:/perl/bin/perl.exe D:/awstats/tools/awstats_buildstaticpages.pl -update -config=chedong -lang=cn -dir=C:/inetpub/awstats/ -awstatsprog=D:/awstats/wwwroot/cgi-bin/awstats.pl
Multi-site log statistics
AWStats ships with a batch tool, tools/awstats_updateall.pl, which scans for configuration files in bulk and runs the statistics for each of them. The remaining work is therefore mainly a matter of synchronizing the logs.
For multiple sites, many configuration options are repeated, and maintaining them separately in every configuration file is tedious. Since version 5.4, an AWStats configuration file can include another configuration file, so we can keep a generic configuration, such as common.conf, and have each site's configuration include it and then override the defaults with the options that follow.
awstats.bbs.chedong.conf:
    Include "chedong.common.conf"
    LogFile="/path/to/bbs_log"
    SiteName="bbs.chedong.com"
awstats.www.chedong.conf:
    Include "chedong.common.conf"
    LogFile="/path/to/www_log"
    SiteName="www.chedong.com"
    HostAliases="chedong.com"
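The include-then-override behaviour can be modelled with a small Python sketch. This is a simplified reading of the mechanism described above, not AWStats code: the load_config helper is hypothetical, files are passed in as a dict so the example is self-contained, and the contents of chedong.common.conf are invented for the demonstration.

```python
import re


def load_config(name, files):
    """Resolve Include lines first, then let later settings override them."""
    settings = {}
    for raw in files[name].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        include = re.match(r'(?i)include\s+"([^"]+)"$', line)
        if include:
            # merge the included file at this point
            settings.update(load_config(include.group(1), files))
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


files = {
    # hypothetical shared defaults
    "chedong.common.conf": 'LogFormat=1\nLogFile="/path/to/default_log"\n',
    "awstats.bbs.chedong.conf": (
        'Include "chedong.common.conf"\n'
        'LogFile="/path/to/bbs_log"\n'
        'SiteName="bbs.chedong.com"\n'
    ),
}
print(load_config("awstats.bbs.chedong.conf", files))
```

Because the Include line comes first, every setting after it wins over the shared default, which is why the include belongs at the head of each site file.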
Statistical indicators
Visitors: counted by distinct visitor IP addresses; one IP address represents one visitor.
Visits: a visitor may come more than once a day (for example, once in the morning and once in the afternoon), so distinct IP addresses are counted within a time window (for example, 1 hour) to obtain the number of visits.
Pages: page requests, not including requests for images, CSS, or JavaScript files. If a page uses multiple frames, each frame counts as one page request.
Files: the total number of file requests from browser clients, including images, CSS, JavaScript, and so on. When a user requests a page that contains images, the browser issues multiple file requests, so the number of files is generally far greater than the number of pages.
Bytes: the total amount of data sent to clients.
Referer data: the Referer field in the log records the address visited before the requested page. If a user reaches the site by clicking a search engine result, the log contains the address of that search on the search engine, and the keywords the user searched for can be extracted by parsing it. For example:
    2003-03-26 15:43:58 123.123.123.123 - GET /index.html 200 192 HTTP/1.1 Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) http://www.google.com/search?q=chedongawstats
The search engine key phrase and keyword statistics in AWStats are quite complete: more than 300 robots can be identified, as well as most of the mainstream international search engines and the local-language search engines of many regions.
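The indicators above can be illustrated with a rough Python sketch. It follows the definitions in this section literally and is not how AWStats computes them internally. The summarize function, the list of non-page extensions, and the sample records are assumptions made for the example.

```python
from datetime import datetime, timedelta

NOT_PAGE_EXT = (".gif", ".jpg", ".png", ".css", ".js")  # assumed non-page types
VISIT_WINDOW = timedelta(hours=1)                       # the 1-hour example window


def summarize(records):
    """records: (timestamp, ip, path, bytes_sent) tuples sorted by time."""
    last_seen = {}
    stats = {"visitors": 0, "visits": 0, "pages": 0, "files": 0, "bytes": 0}
    for ts, ip, path, size in records:
        stats["files"] += 1
        stats["bytes"] += size
        if not path.lower().endswith(NOT_PAGE_EXT):
            stats["pages"] += 1
        previous = last_seen.get(ip)
        if previous is None:
            stats["visitors"] += 1      # first time this IP appears
        if previous is None or ts - previous > VISIT_WINDOW:
            stats["visits"] += 1        # new visit after a quiet period
        last_seen[ip] = ts
    return stats


t0 = datetime(2003, 3, 26, 9, 0)
records = [
    (t0, "192.0.2.1", "/index.html", 1000),
    (t0 + timedelta(seconds=1), "192.0.2.1", "/logo.gif", 500),
    (t0 + timedelta(minutes=5), "192.0.2.2", "/index.html", 1000),
    (t0 + timedelta(hours=6), "192.0.2.1", "/news.html", 800),
]
print(summarize(records))
```

In this sample, two IP addresses produce three visits because the first address returns six hours later, and four file requests yield only three pages because the image is not counted as a page.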
Hacking AWStats
Patch for IIS logs recorded in GMT: awstats.pl
IIS records log times in Greenwich Mean Time (GMT), and China's local time is 8 hours ahead of GMT. Converting from GMT with the timezone plugin reduces performance, so here is a patch that shifts the time axis to local time instead:
7696d7695
7698,7702c7697
< $ix_local = $ix_local - 24;
< }
---
> print "$ix\n"; # width=19 instead of 18 to avoid a MacOS browser bug.
7708,7712c7703
< $ix_local = $ix_local - 24;
< }
---
> my $hr = ($ix+1); if ($hr>12) { $hr=$hr-12; }
Definitions for the main Chinese search engines were added in AWStats 5.5. Here is a more complete supplementary list (covering the main portal searches and search portals):
62c60
< "baidu\.com","search\.sina\.com","search\.sohu\.com",
---
> "baidu\.com","sina\.com","163\.com","tom\.com","sohu\.com",
153c144
< "baidu\.com","word=","search\.sina\.com","word=","search\.sohu\.com","word=",
---
> "baidu\.com","word=","sina\.com","word=","3721\.com","name=","163\.com","q=","tom\.com","word=","sohu\.com","word=",
250c234
< "baidu\.com","Baidu","search\.sina\.com","Sina","search\.sohu\.com","Sohu",
---
> "baidu\.com","Baidu","sina\.com","Sina","3721\.com","3721","163\.com","Netease","tom\.com","Tom","sohu\.com","Sohu",
Google's Unicode queries still need an extra patch. Queries sent to Google from IE on Windows 2000 or later are encoded in UTF-8, while most other search engines use the local system encoding, GB2312. The query in the URI therefore has to be decoded and, if it is UTF-8, transcoded to GB2312. Otherwise the same word ends up in the statistics as two records, one in UTF-8 and one in GB2312.
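To show how the parameter table and the encoding issue fit together, here is a Python sketch that pulls the key phrase out of a referrer and normalizes both encodings to the same text. It is an illustration, not the AWStats code: extract_keyphrase is a hypothetical helper, the Google entry and the Baidu URL path are assumptions added for the demonstration, and hosts are matched by plain substring rather than by regular expression.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit

# Host fragment -> query parameter, following the patched table above.
SEARCH_PARAMS = {
    "baidu.com": "word",
    "sina.com": "word",
    "3721.com": "name",
    "163.com": "q",
    "tom.com": "word",
    "sohu.com": "word",
    "google.": "q",  # assumption: not part of the patch, added for the demo
}


def extract_keyphrase(referer):
    """Return the decoded search phrase from a referrer URL, or None."""
    parts = urlsplit(referer)
    host = parts.netloc.lower()
    param = next((p for frag, p in SEARCH_PARAMS.items() if frag in host), None)
    if param is None:
        return None
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        if name == param:
            raw = unquote_to_bytes(value.replace("+", " "))
            try:
                text = raw.decode("utf-8")      # Google-style UTF-8 query
            except UnicodeDecodeError:
                text = raw.decode("gb2312", errors="replace")  # local encoding
            return " ".join(text.split())       # trim and collapse whitespace
    return None


gb_query = "%C4%BE%D7%D3%C3%C0"  # GB2312 bytes, percent-encoded
word = unquote_to_bytes(gb_query).decode("gb2312")
utf8_query = quote(word.encode("utf-8"))
from_baidu = extract_keyphrase("http://www.baidu.com/s?word=" + gb_query)
from_google = extract_keyphrase("http://www.google.com/search?q=" + utf8_query)
print(from_baidu == from_google)
```

Trying UTF-8 first and falling back to GB2312 is a heuristic: a short GB2312 byte sequence can happen to be valid UTF-8, in which case it would be decoded incorrectly.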
I added the following function to decode Google's UTF-8 characters, as well as queries of the form "\xc4\xbe\xd7\xd3\xc3\xc0":
    use URI::Escape;   # provides uri_unescape
    use Encode;        # provides decode and encode
    sub utf8_to_ascii {
        my $string = shift;
        my $encoding = shift;
        # change \xc4\xbe\xd7\xd3\xc3\xc0 into %C4%BE%D7%D3%C3%C0
        $string =~ s/\\x(\w{2})/%$1/gi;
        # URI unescape
        $string = uri_unescape($string);
        # if the bytes form valid UTF-8, transcode them to the target encoding
        if ($string =~ m/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$/) {
            $string = decode("utf-8", $string);
            $string = encode($encoding, $string);
        }
        # trim leading and trailing space
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        # turn "+" and ";" into spaces, then collapse runs of whitespace
        $string =~ s/[+;]/ /g;
        $string =~ s/\s+/ /g;
        #print $string."\n";
        return $string;
    }
More information about the Google UTF-8 query patch is available separately.
Installing the geographic plugins: GeoIP and Geo::IPfree (AWStats 5.5)
GeoIP and Geo::IPfree are free country/IP mapping tables. They are more accurate than reverse DNS lookups of domain names, and faster. The GeoIP API is free and so is its default database; what is charged for is the data update service. With Geo::IPfree, both the code and the database are open, so you can customize it yourself. I have considered building an IP mapping table for China.
GeoIP installation: first download the C library (GeoIP C). After unpacking:
    % ./configure; make
    # make install
Then download the Perl library (GeoIP Perl). After unpacking:
    % perl Makefile.PL; make
    # make install
Geo::IPfree installation: download Geo::IPfree. After unpacking:
    % perl Makefile.PL
    % make
    # make install
Configuration: enable the GeoIP or Geo::IPfree plugin in the configuration file.
References:
AWStats: http://awstats.sourceforge.net/
Webalizer: http://www.webalizer.org/
Log analysis tools: http://directory.google.com/top/computers/software/internet/site_management/log_analysis/
Commercial log statistics and analysis tools: http://directory.google.com/top/computers/software/internet/site_management/log_analysis/commercial/
Multi-site log merging and statistics: http://www.chedong.com/tech/rotate_merge_log.html
Log statistics are very important for analyzing the impact of search engines on a site: http://www.chedong.com/tech/google.html
AWStats itself also includes many plugins, covering summary statistics across multiple sites, IIS log time conversion, URL-to-title mapping, and more: http://awstats.sourceforge.net/awstats_contrib.html