Log statistics play an important role in analyzing user behavior on a site. Search engine keyword statistics and visit statistics in particular are a very effective data source for this analysis. After many years of Internet development, web log statistics tools have matured and keep gaining features. Many of them are open source, and AWStats is a very good one.
AWStats (Advanced Web Statistics) is a recently developed Perl-based web log analysis tool. Compared with Webalizer, another excellent open source log analysis tool, AWStats has the following advantages:
Friendly interface: the interface language is chosen automatically according to the browser's language setting (a Simplified Chinese version is included). Sample output: http://awstats.sourceforge.net/cgi-bin/awstats.pl
Based on Perl: this solves the cross-platform problem well. The system itself runs on GNU/Linux or on Windows (after installing ActivePerl), and it directly supports logs in the Apache format (combined) and in the IIS format (after some adjustment). Webalizer also has a Windows version, but it lacks maintenance. With AWStats, a site that runs both GNU/Linux/Apache and Windows/IIS servers can produce uniform statistics with a single system.
Relatively efficient: AWStats reports are much richer than Webalizer's, yet it still reaches about 1/3 of Webalizer's speed, which is sufficient for processing a site's daily traffic.
Easy configuration and customization: the system provides configuration rules that are flexible but come with sensible defaults. No more than 3 or 4 items of the default configuration need to be changed, and there are many plug-ins for modification and extension.
Designed to count "human visits" accurately: accesses by many search engine robots are filtered out, so the figures may be lower than those of other log statistics tools. Accesses from inside the company can also be excluded with an IP filter setting.
Many extension parameters: the ExtraXXXX series of settings generates parameter analysis for specific applications, which is very useful for product analysis.
For more comparisons with other tools (Webalizer, Analog), see: http://awstats.sourceforge.net/#comparison
AWStats installation notes
AWStats works in two steps:
Log analysis: each run analyzes the log and archives the results into an AWStats database (plain text).
Output: the statistics can be read in two ways. One is a CGI program that reads the statistics database and outputs the report. The other is a background script that exports the report to static files.
Below are two examples for the logs of a single site: one uses CGI output on GNU/Linux, the other uses static pages on Windows 2000.
Download / install
Download the installation package from http://sourceforge.net/projects/awstats/ and then:
GNU/Linux:
    tar zxf awstats-5.4.tgz
    # deploy the AWStats CGI program to /path/to/apache/cgi-bin/awstats
    mv awstats-5.4/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
Then copy the icon directory to the web publishing directory: /path/to/apache/htdocs/icon/
Windows 2000: unpack the package, move it to the D:/awstats directory, and copy the icon directory to the IIS publishing directory: inetpub/icon
Data source: log format and daily rotation rules
For Apache: the log format is easy to set, just use the combined format. To rotate the log by day, install the cronolog tool and set:
    CustomLog "|/usr/local/sbin/cronolog /path/to/apache/logs/access_%y%m%d.log" combined
This produces files such as logs/access_030327.log and logs/access_030326.log.
For IIS: logging is well set up by default, but the default IIS log format is not suitable for AWStats. It is best to clear all selected log fields and then select fields strictly according to the following list:
    Date: date
    Time: time
    Client IP address: c-ip
    User name: cs-username
    Method: cs-method
    URI stem: cs-uri-stem
    Protocol status: sc-status
    Bytes sent: sc-bytes
    Protocol version: cs-version
    User agent: cs(User-Agent)
    Referrer: cs(Referer)
Compared with the IIS default settings, this removes the server IP address, server port and URI query fields, and adds the bytes sent, protocol version and referrer fields.
Configuration file naming rule: awstats.sitename.conf
The AWStats main program awstats.pl automatically loads the configuration file that matches the site name. For example, running ./awstats.pl -config=chedong loads awstats.chedong.conf from the current directory. If -config is not specified, it looks for awstats.conf in the current directory or /etc/awstats.conf as the default configuration file. It is therefore best to rename the default awstats.model.conf, for example to awstats.chedong.conf.
For statistics across multiple sites, including one configuration file in another is useful. Put the generic settings in one file, use the Include directive supported since 5.4 to include it at the head of each site-specific configuration file, and then override the relevant settings, for example:
    Include "chedong.common.conf"
    LogFile="/path/to/bbs/access_log"
    SiteDomain="bbs.chedong.com"
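The include-then-override behaviour described above can be sketched in Python. This is not AWStats's own parser: `load_config` is a hypothetical helper, and the simplified handling of `Include "file"` and `Key="value"` lines is an assumption made for illustration. Included files are read first, and later assignments replace earlier ones.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of AWStats.
# Accepts both 'Include "file"' and 'Include = "file"' spellings.
_INCLUDE = re.compile(r'^include\s*=?\s*"([^"]+)"\s*$', re.IGNORECASE)
_ASSIGN = re.compile(r'^(\w+)\s*=\s*"?([^"]*)"?\s*$')


def load_config(path):
    """Return a dict of settings, resolving Include lines relative to `path`."""
    path = Path(path)
    settings = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _INCLUDE.match(line)
        if match:
            # Settings from the included file come first ...
            settings.update(load_config(path.parent / match.group(1)))
            continue
        match = _ASSIGN.match(line)
        if match:
            # ... and any later assignment overrides them.
            settings[match.group(1)] = match.group(2)
    return settings
```

With a common file that sets defaults and a site file that starts with the Include line, the returned dictionary holds the site's own values wherever both files set the same key.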
Minimum configuration changes: LogFile, SiteDomain, LogFormat
For statistics on GNU/Linux, only two of these three options need to be modified: LogFile and SiteDomain.
GNU/Linux:
    LogFile="/path/to/apache/logs/access_%YY-24%MM-24%DD-24.log"
Windows 2000:
    LogFile="D:/IIS_LOGS/W3SVC3/ex%YY-24%MM-24%DD-24.log"
This setting means the log file name is built from the year, month and day as they were 24 hours ago.
SiteDomain is the name of the site. It is empty by default, and AWStats refuses to run if it is left empty.
For IIS logs one more setting must be changed: LogFormat=2. The default is 1 (Apache log), and 2 is the IIS log format.
Note: by default AWStats does not filter out SWF files and counts .swf requests as page views. If the SWF files on the site are mainly advertisements, it is best to filter them out in the configuration.
Log analysis
    ./awstats.pl -update -config=sitename -lang=cn
For example, ./awstats.pl -update -config=chedong automatically loads the awstats.chedong.conf configuration file.
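To make the meaning of tags such as %YY-24 concrete, here is a small Python sketch of the substitution. It only illustrates the rule described above (the number after the dash is how many hours to subtract from the current time) and is not AWStats's code. The function name `expand_logfile` and the set of supported tags are assumptions.

```python
import re
from datetime import datetime, timedelta

# Tag name -> strftime format. Only these tags are handled in this sketch.
_FORMATS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d", "HH": "%H"}
_TAG = re.compile(r"%(YYYY|YY|MM|DD|HH)-(\d+)")


def expand_logfile(template, now):
    """Replace each %TAG-n with that date field as it was n hours before `now`."""
    def repl(match):
        tag, hours = match.group(1), int(match.group(2))
        return (now - timedelta(hours=hours)).strftime(_FORMATS[tag])
    return _TAG.sub(repl, template)


# A run at 08:10 on 27 March 2003 picks up the previous day's log file.
name = expand_logfile("/path/to/apache/logs/access_%YY-24%MM-24%DD-24.log",
                      datetime(2003, 3, 27, 8, 10))
print(name)
```

Because every tag is shifted by the same 24 hours, the year and month roll back correctly on the first day of a month or year.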
Statistics output
GNU/Linux: http://localhost/cgi-bin/awstats/awstats.pl?config=chedong
Windows 2000: http://localhost/awstats/awstats.chedong.html
Running the log statistics automatically
GNU/Linux: use crontab -e to run the update at 8:10 every morning:
    #update awstats
    10 8 * * * (cd /path/to/apache/cgi-bin/awstats/; ./awstats.pl -update -config=chedong)
Windows 2000: set up a scheduled task that runs at 8:10 every day:
    D:/perl/bin/perl.exe D:/awstats/tools/awstats_buildstaticpages.pl -update -config=chedong -lang=cn -dir=C:/inetpub/awstats/ -awstatsprog=D:/awstats/wwwroot/cgi-bin/awstats.pl
Multi-site log statistics
AWStats comes with a batch tool, tools/awstats_updateall.pl, which finds all configuration files in a directory and runs the statistics for each of them. The remaining work is therefore mainly a matter of synchronizing the logs.
For multiple sites, many configuration options are repeated, and maintaining them separately in every configuration file is troublesome. Starting with 5.4, AWStats supports including one configuration file in another, so we can set up a common configuration file such as chedong.common.conf.
The configuration files of the individual sites can then include it and override the default settings with options like the following.
awstats.bbs.chedong.conf
    Include "chedong.common.conf"
    LogFile="/path/to/bbs_log"
    SiteDomain="bbs.chedong.com"
awstats.www.chedong.conf
    Include "chedong.common.conf"
    LogFile="/path/to/www_log"
    SiteDomain="www.chedong.com"
    HostAliases="chedong.com"
Statistical indicators
Visitors: counted by distinct IP address, so one IP represents one visitor.
Visits: a visitor may visit more than once a day (for example once in the morning and once in the afternoon), so distinct IPs are counted per time window (for example 1 hour) to obtain the number of visits.
Pages: page requests only, not counting images, CSS and JavaScript files. If a page uses multiple frames, each frame counts as one page request.
Files: file requests from the browser, including images, CSS, JavaScript and so on. When a user requests a page that contains images, several file requests are issued, so the number of files is generally much larger than the number of pages.
Bytes: the total traffic sent to clients.
Referrer data: the referrer field in the log records the address visited before the page was requested. If the user reached the site by clicking a search engine result, the log contains the user's query address at that search engine, and the keywords used in the query can be extracted by parsing this address. For example:
    2003-03-26 15:43:58 123.123.123.123 - GET /index.html 200 192 HTTP/1.1 Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) http://www.google.com/search?q=chedong
The key phrase and keyword statistics for search engines in AWStats are quite complete: it can identify more than 300 robots from around the world, most of the mainstream international search engines, and local-language search engines in many regions.
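The referrer parsing described above can be sketched in Python as follows. This is only an illustration of the idea, not how AWStats implements it. The `SEARCH_ENGINES` table is a two-entry assumption (AWStats ships a far larger list), and `extract_keywords` is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative subset: search engine domain -> name of its query parameter.
SEARCH_ENGINES = {"google.com": "q", "baidu.com": "word"}


def extract_keywords(referrer):
    """Return (engine_domain, query) if the referrer is a known search, else None."""
    parts = urlsplit(referrer)
    host = (parts.hostname or "").lower()
    for domain, param in SEARCH_ENGINES.items():
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parts.query).get(param)
            if values:
                return domain, values[0]
    return None


print(extract_keywords("http://www.google.com/search?q=chedong"))
```

This sketch only handles ASCII queries. Queries in Chinese additionally need the character encoding to be detected before they can be counted consistently.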
Hacking AWStats
IIS GMT time patch for awstats.pl
IIS log times are in Greenwich Mean Time (GMT), and China's local time is 8 hours ahead of GMT. Using the timezone plug-in to convert directly from GMT causes a performance drop of about 40%, so there is a patch that shifts the timestamps to local time instead:
7696d7695
153c144
< "baidu\.com","word=", "search\.sina\.com","word=", "search\.sohu\.com","word=",
---
> "baidu\.com","word=", "sina\.com","word=", "3721\.com","name=", "163\.com","q=", "tom\.com","word=", "sohu\.com","word=",
250c234
< "baidu\.com","baidu", "search\.sina\.com","sina", "search\.sohu\.com","sohu",
---
> "baidu\.com","Baidu", "sina\.com","Sina", "3721\.com","3721", "163\.com","Netease", "tom\.com","Tom", "sohu\.com","sohu",
Google's Unicode queries still require a query patch. Queries sent to Google from IE on Windows 2000 or later are encoded in UTF-8, while other search engines mostly use the system's local encoding, GB2312. The query therefore has to be URI-decoded and, if it is UTF-8, transcoded to GB2312. Otherwise the same word is recorded twice in the statistics, once in UTF-8 and once in GB2312.
I added the following function to decode UTF-8 characters and queries of the form "\xC4\xBE\xD7\xD3\xC3\xC0":
    use URI::Escape;   # provides uri_unescape
    use Encode;        # provides decode / encode

    sub utf8_to_ascii {
        my $string   = shift;
        my $encoding = shift;

        # change \xC4\xBE\xD7\xD3\xC3\xC0 into %C4%BE%D7%D3%C3%C0
        $string =~ s/\\x(\w{2})/%$1/gi;

        # uri unescape
        $string = uri_unescape($string);

        # if the bytes form valid UTF-8, transcode them to the target encoding
        if ($string =~ m/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$/) {
            $string = decode("utf-8", $string);
            $string = encode($encoding, $string);
        }

        # trim space
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;

        # reverse "+", ";" to space
        $string =~ s/[+;]/ /g;
        $string =~ s/\s+/ /g;

        #print $string."\n";
        return $string;
    }
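For comparison, the same normalization idea can be sketched in Python. This is an illustrative variant and not the patch itself: instead of transcoding UTF-8 to GB2312, it decodes both encodings into one Unicode string, which equally avoids counting the same word twice. The name `normalize_query` is an assumption, and so is the fallback heuristic (try UTF-8 first, then the local encoding). A GB2312 byte sequence that also happens to be valid UTF-8 would be misread.

```python
import re
from urllib.parse import unquote_to_bytes


def normalize_query(raw, local_encoding="gb2312"):
    """Decode a search query that may be UTF-8 or local-encoding bytes."""
    # Turn \xC4\xBE style escapes into %C4%BE so one unescape step handles both.
    raw = re.sub(r"\\x([0-9A-Fa-f]{2})", r"%\1", raw)
    # "+" stands for a space in query strings; then undo the percent-escapes.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the site's local encoding.
        text = data.decode(local_encoding, errors="replace")
    # Treat ";" as a separator and collapse runs of whitespace.
    return " ".join(text.replace(";", " ").split())
```

With this helper, a query percent-encoded from GB2312 bytes and the same query percent-encoded from UTF-8 bytes both map to the same string, so they fall under a single keyword entry.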
Here is more about Google UTF-8 query patches.
Installing the geographic information plug-ins:
GeoIP and Geo::IPfree (AWStats 5.5+)
GeoIP and Geo::IPfree are free country-to-IP mapping tables. They are more accurate, and faster, than deriving the country from the domain name obtained by reverse DNS. GeoIP's API is free and its default database is free; what it charges for is the data update service. With Geo::IPfree both the code and the database are open, so you can customize it yourself, for example to build a mapping from Chinese cities to IP addresses.
Installation:
Download GeoIP and unpack it, then:
    ./configure; make
    # make install
Download Geo::IPfree, then:
    perl Makefile.PL
    make
    # make install
Configuration: enable the GeoIP or Geo::IPfree plug-in in the configuration file.
References:
AWStats: http://awstats.sourceforge.net/
Webalizer: http://www.webalizer.org/
Log analysis tools: http://directory.google.com/top/computers/software/internet/site_management/log_analysis/
Commercial log statistics / analysis tools: http://directory.google.com/top/computers/software/internet/site_management/log_analysis/commercial/
Merging logs of multiple sites for statistics: http://www.chedong.com/tech/rotate_merge_log.html
Log statistics are very important for analyzing the impact of search engines on a site: http://www.chedong.com/tech/google.html
AWStats also has many contributed plug-ins, including ones for summarizing the statistics of multiple sites, IIS log time conversion, URL-to-title mapping and so on: http://awstats.sourceforge.net/awstats_contrib.html