First, Introduction
China Internet Network Information Center (CNNIC) is a non-profit management and service organization established on June 3, 1997, that performs the duties of the national Internet information center. Its mission is to serve China's Internet users and to promote the healthy and orderly development of the Internet in China. With the rapid growth of the Internet, most websites urgently need to understand their traffic, so they use domestic or foreign services to measure visits to their sites. However, these services face an important challenge: there are no authoritative definitions or metrics for access statistics indicators, and neither official standards nor de facto standards exist. Each service provider computes its indicators on a different statistical basis, and for business reasons providers often do not disclose their metrics. Because websites use different statistics services, their access statistics reports cannot be compared with those of other websites. Such reports hold little appeal for advertisers, which on the one hand limits websites' room for profit and on the other hand restricts the development of the Internet. Advertisers are also confused: they must judge on which website their ads will be most effective, their advertising investment should be proportional to website traffic, and comparable website access statistics reports are the basis for that investment.
With the "CNNIC Recommended Website Access Statistics Terms and Metrics", we hope to propose a widely accepted standard for website access statistics. Our task is to establish a set of website access statistics terms and to recommend metrics for them. We hope this document will serve as a guide to website access statistics, helping website builders, site visitors, and website advertisers obtain the information they want and providing a basis for accurately planning and implementing their online business projects.
This document explains website access statistics terms and recommends metrics for them, so that websites can use a common language when publishing access statistics.
We drafted this proposal to promote the development of Internet business in China. We also hope that this document will draw the Internet industry's attention to the measurement of website access information, because we sincerely hope that Internet sites can become a more advertiser-friendly media platform and that websites can follow a path of sustainable development.
Second, Statistical Implementation
For collecting statistics on website access information, we recommend the following implementation:
This approach analyzes the log files generated by the web server. The log file is sometimes the server's original log and sometimes one produced by a module that a third-party statistics agency adds on the server side. The advantage of this approach is that the log files can be customized, and encryption and compression can be applied to them to ensure their authenticity and reliability and to reduce the network traffic needed to transfer them; this makes the approach suitable for third-party agencies that perform certified measurement of website access. The approach also has shortcomings: real-time statistical analysis is difficult, and the additional server module may reduce web server performance.
In this document, we refer to this approach as the log file analysis method.
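As a minimal sketch of the log file analysis method, the parser below reads one line of a web server log into a record. It assumes the server writes the NCSA Common Log Format; the regular expression, the `LogRecord` fields, and the `parse_clf_line` name are illustrative choices, not part of this recommendation.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed input: NCSA Common Log Format, for example
# 1.2.3.4 - - [10/Oct/2000:13:55:36 +0800] "GET /index.html HTTP/1.0" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class LogRecord:
    host: str       # client IP address or host name
    time: datetime  # request time (timezone-aware)
    method: str     # HTTP method, e.g. GET
    path: str       # requested URL path, including any query string
    status: int     # return code sent by the server
    size: int       # bytes sent; "-" in the log is treated as 0


def parse_clf_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the format."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return LogRecord(
        host=m["host"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        path=m["path"],
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
    )
```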
Third, How to Identify Website Visitors
Identifying website visitors is the foundation of website access statistics. Inconsistent identification of visitors is the fundamental reason why the reports of different access statistics services are difficult to compare. Since there is currently no perfect way to identify visitors, it is understandable that different services use different identification methods. We hope to propose comparable, widely accepted metrics for website access statistics.
Visitor
Definition:
An individual who interacts with the website.
Measurement method:
We recommend the following methods for identifying visitors. First, identify visitors by IP address: different IP addresses indicate different visitors. Second, try to identify visitors by tracking files (cookies): different cookies indicate different visitors. A module added on the server side can write the cookie that identifies the visitor into the log file it generates, making up for the absence of cookie information in the original log file. A tracking file (cookie) is a small piece of information that the server sends to the browser in an HTTP response; a browser that supports cookies keeps it on the local hard disk to identify the user. Different cookies indicate different visitors.
Comment:
Identifying website visitors is the basis of website access statistics. User (User) and visitor are the same term and have the same meaning.
1. Problems with relying solely on tracking files (cookies):
(1) Not all browsers support cookies.
(2) Some browsers that support cookies allow a policy of refusing all cookies.
(3) Cookies can be deleted by certain programs or manually.
(4) If a user uses several browsers, each browser saves a different cookie.
(5) If the user reinstalls the operating system or the browser, cookies are likely to be lost unless the user backs them up manually.
(6) A browser can store only 300 cookies in total, each cookie is limited to 4 KB, and each domain or server can place only 20 cookies on the client.
(7) There is controversy over whether cookies violate visitor privacy.
Although tracking files are controversial, they remain one of the recommended methods. Reasons supporting the use of cookies include:
(1) Responses from the web server that contain a Set-Cookie header are not cached by proxy servers; the proxy forwards the Set-Cookie header to the client browser. Likewise, requests containing a Cookie header are forwarded by the proxy to the web server. Cookies are therefore a simple and efficient way to identify users who access the network through a proxy server.
(2) The most widely used browsers, Internet Explorer 3.x, 4.x, and 5.x, Netscape 3.x and 4.x, and Opera 3.x, all support cookies; fewer than 1% of visitors use other browsers.
(3) By default, browsers use a policy of accepting all cookies.
(4) For most well-intentioned websites, cookies are a mechanism for serving visitors rather than a tool for spying on their browsing paths.
2. Identifying visitors by IP address is a very common method and is worth recommending. Its advantages are:
(1) Computers connected directly to the Internet with unique IP addresses can be identified accurately by those addresses.
(2) Compared with cookies, which track the browser, the IP address tracks the computer. A computer with a single IP address may hold several cookies because several browsers are used on it, so IP addresses are better at identifying individual computers.
Identifying users by IP address also has problems. Individual users who access the network through a proxy server cannot be distinguished in the web server access log, because their requests all arrive from the proxy's address. Although the HTTP_USER_AGENT environment variable sometimes reveals that a proxy server is in use, it is still impossible to tell which visitor is behind it.
Therefore, we choose to identify visitors using both IP addresses and tracking files (cookies).
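One possible way to combine the two identifiers is sketched below, assuming a hypothetical cookie named `visitor_id`: a visitor who returns the cookie is keyed by it, and a visitor who does not is keyed by IP address and issued a new cookie. The precedence rule and the `identify_visitor` function are assumptions for illustration; this document only recommends using both identifiers.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(remote_addr: str,
                     cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor_key, Set-Cookie value to send, or None).

    Assumed rule: prefer the tracking cookie; fall back to the IP address.
    """
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)  # parse the incoming Cookie header
        if COOKIE_NAME in jar:
            return "cookie:" + jar[COOKIE_NAME].value, None
    # No usable cookie: identify by IP address and issue a fresh cookie
    new_jar = SimpleCookie()
    new_jar[COOKIE_NAME] = uuid.uuid4().hex
    new_jar[COOKIE_NAME]["path"] = "/"
    return "ip:" + remote_addr, new_jar[COOKIE_NAME].OutputString()
```

The second return value is what a server-side module would place in a `Set-Cookie` response header, so that the visitor can be recognized by cookie on later requests.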
Fourth, Website Traffic Indicators and Metrics
Unique Visitor
Definition:
A unique visitor is a visitor with a unique visitor identifier (unique address) who enters the website for the first time within a specific period. The recommended period is one full day.
Measurement method:
Within the same day, count each unique visitor identifier only on its first entry into the website; further visits by that visitor on the same day are not counted.
Comment:
The unique visitor count for one day is also called the Daily Unique Visitor. Independent visitors, independent users, unique users, and unique visitors are the same term. Unique visitors indicate how many different viewers the website had within a period; they do not reflect the website's overall activity.
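Counting unique visitors per day (or, for the Monthly Unique Visitor below, per month) amounts to collecting the set of visitor identifiers seen in each period. The sketch assumes hits have already been reduced to hypothetical (visitor_id, timestamp) pairs, for example by a visitor identification step as described in the previous section.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def unique_visitors(hits: Iterable[Tuple[str, datetime]],
                    period: str = "day") -> Dict[str, int]:
    """Count distinct visitor IDs per day ("day") or per month ("month")."""
    fmt = {"day": "%Y-%m-%d", "month": "%Y-%m"}[period]
    seen: Dict[str, Set[str]] = defaultdict(set)
    for visitor_id, when in hits:
        # A set keeps only the first appearance of each ID in a period
        seen[when.strftime(fmt)].add(visitor_id)
    return {key: len(ids) for key, ids in sorted(seen.items())}
```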
Monthly Unique Visitor
Definition:
Same as above, except that the recommended period is one full calendar month.
Measurement method:
Within the same month, count each unique visitor identifier only on its first entry into the website; further visits by that visitor in the same month are not counted.
User Session
Definition:
A user session is the process in which a visitor with a unique visitor identifier (unique address) enters or re-enters the website.
Measurement method:
If the visitor interacts with the website within 20 minutes of the previous interaction, the visitor is considered to be in the same entry into the website and no new user session is recorded. If the visitor has had no interaction with the website for 20 minutes and then visits it again, the visitor is considered to have entered the website anew, and a new user session is recorded.
Comment:
User entry, visit (Visit), and user session are the same term. A user session should not be interpreted as the number of people who visited the website or the number of visits to it; it is an indicator that approximates them, because the exact number of visitors or visits is difficult to count. User sessions reflect the overall activity of the website better than unique visitors do, indicating how frequently the website is used.
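The 20-minute rule can be sketched as an inactivity timeout applied to each visitor's sorted request times. The (visitor_id, timestamp) input format and the `split_sessions` name are assumptions; the 20-minute value is the one recommended here, and a gap of exactly 20 minutes is treated as still belonging to the same session.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

SESSION_TIMEOUT = timedelta(minutes=20)  # value recommended in this document


def split_sessions(hits: Iterable[Tuple[str, datetime]],
                   timeout: timedelta = SESSION_TIMEOUT
                   ) -> List[Tuple[str, List[datetime]]]:
    """Group each visitor's request times into user sessions.

    A new session starts when the gap since the visitor's previous
    request is longer than `timeout`.
    """
    by_visitor: Dict[str, List[datetime]] = defaultdict(list)
    for visitor_id, when in hits:
        by_visitor[visitor_id].append(when)

    sessions: List[Tuple[str, List[datetime]]] = []
    for visitor_id, times in by_visitor.items():
        times.sort()
        current = [times[0]]
        for t in times[1:]:
            if t - current[-1] > timeout:
                sessions.append((visitor_id, current))  # close old session
                current = [t]
            else:
                current.append(t)  # same session continues
        sessions.append((visitor_id, current))
    return sessions
```

The number of user sessions is then simply `len(split_sessions(hits))`.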
Page View
Definition:
A page view is one page download: the visitor successfully receives the page and sees it completely in his or her browser.
Measurement method:
Each browser request for a page can be counted as one page view.
Comment:
Counting browser requests is not completely accurate:
1. Proxy server caches and browser caches make the number of requests recorded by the server smaller than the number of pages actually displayed in visitors' browsers.
2. With low bandwidth and long response times, a visitor may jump to another page before the page is displayed, so the server records the request even though the page was never actually viewed.
3. Splash pages (Splash Page) and interstitial pages (Interstitial) should not be counted as page views.
4. Dynamically generated pages should be counted as page views.
5. A page containing frames should be counted as only one page view, even though it generates requests for multiple documents.
With the log file analysis method, splash pages and interstitial pages are recorded in the log file, so they should be identified and excluded during analysis. Requests for specific programs (such as CGI programs) are recorded in the log file, so pages dynamically generated by these programs can also be counted. Pages containing frames can be recognized in the log file, so this requirement is achievable with the log file analysis method.
Page view, page reading, view (View), page impression, page request (Page Request), and page viewing are the same term.
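A rough page view counter following the comment above might look like this. Which paths count as pages, where dynamic programs live, and which URLs are splash, interstitial, or frame documents all vary by site, so the suffixes and sets below are hypothetical placeholders; the sketch also assumes that only status 200 responses count as views.

```python
from typing import Iterable, Set, Tuple

PAGE_SUFFIXES = (".html", ".htm", "/")  # assumed page-like paths
DYNAMIC_MARKERS = ("/cgi-bin/",)        # assumed location of dynamic pages
# Hypothetical site-specific exclusions
SPLASH_AND_INTERSTITIAL: Set[str] = {"/splash.html", "/interstitial.html"}
FRAME_DOCUMENTS: Set[str] = {"/menu.html", "/content.html"}  # frame parts


def count_page_views(requests: Iterable[Tuple[str, int]]) -> int:
    """Count page views from (path, status) pairs taken from a log."""
    views = 0
    for path, status in requests:
        if status != 200:
            continue  # failed or non-200 responses are not counted
        base = path.split("?", 1)[0]  # ignore the query string
        if base in SPLASH_AND_INTERSTITIAL or base in FRAME_DOCUMENTS:
            continue  # splash/interstitial pages and frame parts excluded
        if base.endswith(PAGE_SUFFIXES) or any(m in base for m in DYNAMIC_MARKERS):
            views += 1  # static page, frameset, or dynamically generated page
    return views
```

Images and other embedded resources fall through every test and are therefore counted only as requests, not as page views.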
Request
Definition:
A single connection between the browser and the server, made to obtain one resource on the server (text, an image, or any element that can be included in a page).
Measurement method:
With the log file analysis method, each record in the log file is one request, and the figure is obtained by counting these records.
Comment:
Hit (Hit) and request are the same term. When "page request" means a request for an HTML document, page requests are a subset of requests. When "page request" means a visitor's page view, request and page request have different meanings; in some cases a request is not counted as a page view or page request.
Fifth, Visitor Characteristic Indicators and Metrics
Browser
Definition:
A program for locating and reading HTML documents (for example, Netscape Communicator, Mosaic, or Microsoft Internet Explorer).
Measurement method:
Information about the browser type can be obtained from the log file and used to produce the statistics.
Comment:
Information such as the software vendor's name and the browser version can usually be obtained. However, the browser string (Browser String) has no standard format, which makes it difficult to analyze.
Platform
Definition:
The operating platform a visitor uses to access the site.
Measurement method:
As with browsers, information about the operating platform can be obtained by analyzing the browser string (Browser String).
Comment:
Because of special browsers such as WebTV and Sega, "operating platform" is a more appropriate name than "operating system". Such platforms can be identified from the browser string that accompanies the URL request.
Browser Language
Definition:
The language used by the browser.
Measurement method:
The browser's language can be obtained from the browser string (Browser String); the HTTP_ACCEPT_LANGUAGE environment variable also indicates the languages of HTML documents the browser prefers to receive.
Comment:
The language cannot be obtained for every browser. Browser language data cannot be obtained with the log file analysis method.
Domain Name
Definition:
The textual address corresponding to the IP address of a computer on the Internet; that is, the formal name of a computer connected to the Internet.
Measurement method:
What is measured is actually the top-level or second-level domain in which the remote computer is located, such as .com, .edu, .cn, .com.cn, or .net.cn. The REMOTE_HOST environment variable and the log file record the host name and domain name of the remote computer, but they cannot be obtained in every case.
Comment:
Not every connecting computer has its host name and domain name recorded; most computers are still recorded by IP address, and statistics should report "unknown" when no host name or domain name is available. The server software and its configuration determine whether the remote computer's host name and domain name can be obtained. Host names and domain names tend to be recorded for remote computers whose IP addresses can be reverse-resolved, but reverse-resolving IP addresses while writing the log file increases the server load, especially for large websites. Reverse resolution can instead be performed when the log file is analyzed, although this slows down the analysis.
Referral Link
Definition:
The link a visitor clicks on some page to be led to the current HTML page is the referral link of the current page.
Measurement method:
It can be obtained from the HTTP_REFERER environment variable and from the server log file.
Comment:
The term "referring page" (Referring Page) is also sometimes used, with a similar meaning: it is the URL of the page from which the browser arrived at the target.
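The characteristics in this section are all read from request fields: the browser string, the Accept-Language header, the (possibly reverse-resolved) host name, and the Referer header. The heuristics below are hypothetical and deliberately rough, since real browser strings are far more varied, but they show how the browser, platform, language, domain name, and referral link might be extracted. The marker strings and the list of second-level domains are assumptions.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse


def browser_family(ua: str) -> str:
    """Very rough browser classification from a browser string."""
    if "Opera" in ua:  # check first: Opera may also claim "MSIE"
        return "Opera"
    if "MSIE" in ua:
        return "Internet Explorer"
    if ua.startswith("Mozilla/") and "compatible" not in ua:
        return "Netscape"
    return "unknown"


def platform_name(ua: str) -> str:
    """Rough operating-platform classification; WebTV is checked first."""
    for marker, name in (("WebTV", "WebTV"), ("Win", "Windows"),
                         ("Mac", "Macintosh"), ("Linux", "Linux")):
        if marker in ua:
            return name
    return "unknown"


def preferred_language(accept_language: Optional[str]) -> str:
    """Pick the language tag with the highest q value."""
    best, best_q = "unknown", -1.0
    for part in (accept_language or "").split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > best_q:
            best, best_q = tag, q
    return best


SECOND_LEVEL = {"com", "net", "org", "edu", "gov", "ac"}  # assumed list


def domain_suffix(host: Optional[str]) -> str:
    """Return e.g. '.com.cn' or '.edu'; 'unknown' when only an IP is known."""
    if not host:
        return "unknown"
    try:
        ipaddress.ip_address(host)
        return "unknown"  # address was not reverse-resolved
    except ValueError:
        pass
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and len(labels[-1]) == 2 and labels[-2] in SECOND_LEVEL:
        return "." + ".".join(labels[-2:])
    return "." + labels[-1]


def referrer_host(referer: Optional[str]) -> str:
    """Host of the referral link, or 'none' when there is no referrer."""
    if not referer or referer == "-":
        return "none"
    return urlparse(referer).hostname or "none"
```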
Sixth, Visitor Behavior Indicators and Metrics
Average Time Per Page Request
Definition:
The average time a visitor spends on each page request.
Measurement method:
(Time of the last request in the user session - time of the first request) ÷ number of page requests in the user session.
Comment:
The average time per page request should be computed over a relatively large sample, and it must be calculated before user session length is calculated.
User Session Length
Definition:
The duration of a user session.
Measurement method:
(Time of the last request in the user session - time of the first request) + average time per page request.
Comment:
User access time and user session length are the same term.
Average User Session Duration (Average User Session Length)
Definition:
The average length of website visitors' user sessions.
Measurement method:
Total user session time ÷ number of user sessions.
Comment:
Average user access duration and average user session length are the same term.
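The three behavior metrics build on one another, so one sketch covers them. It follows the measurement methods above literally: the average time per page request is pooled over all sessions (our interpretation of computing it over a relatively large sample), and it is then added to each session's first-to-last span, presumably to account for the unobservable time spent on the last page. Sessions are assumed to be non-empty lists of page request times; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def average_time_per_page_request(sessions: Sequence[Sequence[datetime]]) -> timedelta:
    """Pooled: sum of (last - first) spans / total number of page requests."""
    total_span = sum((max(s) - min(s) for s in sessions), timedelta())
    total_requests = sum(len(s) for s in sessions)
    return total_span / total_requests


def session_lengths(sessions: Sequence[Sequence[datetime]],
                    avg_per_request: timedelta) -> List[timedelta]:
    """Length = (last request - first request) + average time per request."""
    return [(max(s) - min(s)) + avg_per_request for s in sessions]


def average_session_length(lengths: Sequence[timedelta]) -> timedelta:
    """Total user session time / number of user sessions."""
    return sum(lengths, timedelta()) / len(lengths)
```

For the three sessions in the test (spans of 10, 2, and 0 minutes over 6 page requests in total), the average time per page request is 2 minutes, the session lengths are 12, 4, and 2 minutes, and the average user session length is 6 minutes.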
Return Visits
Definition:
The number of times a visitor accesses the website in different user sessions within a specific period.
Measurement method:
Count the number of different user sessions in which the visitor visited the website within the specific period.
Comment:
The period can be chosen by the statistics agency. The recommended period is one day; alternatively, no period is set, in which case the figure indicates how many times the visitor has ever visited the site. The number of return visits indicates how popular the website is.
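Given the start time of each user session, return visits can be counted per visitor, optionally within a chosen period. The (visitor_id, session_start) input, the half-open [start, end) window, and the `return_visits` name are assumptions of this sketch; leaving both bounds unset corresponds to not setting a period at all.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def return_visits(session_starts: Iterable[Tuple[str, datetime]],
                  start: Optional[datetime] = None,
                  end: Optional[datetime] = None) -> Dict[str, int]:
    """Number of distinct user sessions per visitor within [start, end)."""
    counts: Counter = Counter()
    for visitor_id, began in session_starts:
        if start is not None and began < start:
            continue  # session began before the period
        if end is not None and began >= end:
            continue  # session began after the period
        counts[visitor_id] += 1
    return dict(counts)
```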
Seventh, Other Measurable Indicators
Bandwidth
Definition:
A measure of website traffic, expressed in units of data transferred.
Measurement method:
With the log file analysis method, the website's bandwidth can be obtained by summing the file sizes recorded in each log record.
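Under the log file analysis method, bandwidth is a sum over the size field. The sketch assumes Common Log Format lines, where the byte count is the last field and "-" means no body was sent; in other formats (such as logs with referrer and browser fields appended) the size is not the last field and the parsing would need to change. The `total_bytes` name is hypothetical.

```python
from typing import Iterable


def total_bytes(log_lines: Iterable[str]) -> int:
    """Sum the response-size field (last field of a Common Log Format line)."""
    total = 0
    for line in log_lines:
        fields = line.rstrip("\n").rsplit(" ", 1)
        if len(fields) == 2 and fields[1].isdigit():
            total += int(fields[1])  # "-" (no body) is skipped, i.e. 0 bytes
    return total
```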
Reload
Definition:
The action of a visitor clicking the browser's reload or refresh button to reload the current page.
Measurement method:
With the log file analysis method, a reload causes the page to be requested again. The same request from the same visitor repeated within 30 seconds can be taken to mean that the visitor performed a reload, and the number of reloads is recorded.
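The 30-second rule can be sketched by remembering, for each (visitor, URL) pair, when it was last requested. Time-ordered input and the `count_reloads` name are assumptions; a repeat exactly 30 seconds later is counted as a reload.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

RELOAD_WINDOW = timedelta(seconds=30)  # window suggested in this document


def count_reloads(requests: Iterable[Tuple[str, str, datetime]],
                  window: timedelta = RELOAD_WINDOW) -> int:
    """Count repeats of the same (visitor, URL) within `window`.

    `requests` must be sorted by time.
    """
    last_seen: Dict[Tuple[str, str], datetime] = {}
    reloads = 0
    for visitor_id, url, when in requests:
        key = (visitor_id, url)
        previous = last_seen.get(key)
        if previous is not None and when - previous <= window:
            reloads += 1  # same visitor asked for the same page again quickly
        last_seen[key] = when
    return reloads
```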
Comment:
The number of reloads cannot be determined with complete accuracy. We recommend reporting both the number of page views and the number of reloads, without subtracting reloads from page views. The number of reloads can reflect the website's popularity and visitors' loyalty to it.
Click
Definition:
A click is when a visitor clicks a hypertext link with the mouse in order to follow the link to more information of interest.
Measurement method:
Only the log file analysis method can count the number of clicks on a hypertext link.
Comment:
Click-through (Clickthrough) and click are the same term. Clicks are used in online advertising statistics.
Click Rate
Definition:
The percentage of times a link is clicked.
Measurement method:
Number of clicks ÷ number of requests for the page on which the link is located.
Comment:
Yield and click rate are the same term. Click rate is valuable in many respects. In online advertising it demonstrates the effectiveness of an ad, because it means visitors have reached the advertiser's website, where additional information can be provided.
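As a worked example of the formula, a banner link clicked 25 times on a page requested 1,000 times has a click rate of 25 ÷ 1,000 = 2.5%. The same arithmetic in a small helper (the name `click_rate` is hypothetical):

```python
def click_rate(clicks: int, page_requests: int) -> float:
    """Click rate in percent: clicks / requests for the page holding the link."""
    if page_requests <= 0:
        raise ValueError("page_requests must be positive")
    return 100.0 * clicks / page_requests
```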
Advertising Request (Ad Request)
Definition:
A request for an advertising element in a page.
Measurement method:
Ad requests are measured in the same way as page views.
Eighth, Discussion (FAQ)
Q.
Is there any other way to implement statistics?
A.
Another way is to embed statistics code in the pages to be measured. The code references a resource on another server, usually generated by a CGI program (or a similar program). When a visitor accesses the page, the browser requests this resource from the server hosting the CGI program, so information about the page being accessed and about the visitor is recorded by the CGI program. This approach makes real-time statistical analysis easy, yields relatively rich statistics, and does not increase the load on the web server. However, it is easily spoofed, and bandwidth limitations and similar factors can easily cause statistical information to be lost. Because it is vulnerable to spoofing, it carries security risks; once those security issues are solved, it may become a better way to implement statistics. From the perspective of ease of use, this method is worth promoting.
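To make the embedded-code approach concrete, here is a minimal sketch of the collecting side as a Python WSGI application. It assumes the page embeds an image tag pointing at it with the page path in a `page` query parameter (for example, a hypothetical `http://stats.example.com/count?page=/index.html`); it records request details and returns a 1x1 GIF. The `counter_app` name and the in-memory `RECORDS` list are illustrative. Nothing here authenticates the request, which illustrates why the approach is easy to spoof.

```python
import time
from typing import Callable, Dict, List
from urllib.parse import parse_qs

# A 43-byte transparent 1x1 GIF returned to the browser
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

RECORDS: List[Dict[str, str]] = []  # stands in for real storage


def counter_app(environ: Dict, start_response: Callable) -> List[bytes]:
    """Record one page access reported by the embedded image tag."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    RECORDS.append({
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "page": query.get("page", ["unknown"])[0],
        "visitor_ip": environ.get("REMOTE_ADDR", "unknown"),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "referer": environ.get("HTTP_REFERER", ""),
        "cookie": environ.get("HTTP_COOKIE", ""),
    })
    start_response("200 OK", [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL))),
        # Discourage caching, since cached hits would never reach the counter
        ("Cache-Control", "no-cache, no-store"),
    ])
    return [PIXEL]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, counter_app).serve_forever()` can serve it; a real collector would write to durable storage instead of an in-memory list.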
Q.
Why is the user session timeout set to 20 minutes?
A.
We referred to the user session timeouts used internationally on the Internet and found that the periods mainly used are 30 minutes and 20 minutes. This period affects the number of user sessions counted: the closer it is to the average time visitors stay on the website, the closer the number of user sessions will be to the number of visits to the website. CNNIC has also collected statistics on how long users stay on some domestic information websites. We believe that 20 minutes is currently an appropriate user session timeout, and we will adjust it as the domestic Internet develops.
Q.
Advertisers want to know how many times their ads are actually seen by visitors, not just that visitors issued requests. Which indicator can answer advertisers?
A.
We understand that advertisers want to know how many times their ads are actually seen, but in fact such data cannot be obtained with complete accuracy. As with other media, advertisers pay for potential views (such as print circulation). The accurate data we can obtain is only the requests issued by visitors. In this document we recommend counting at the "request" level rather than the "delivery" level, because whether a website successfully delivers content to the user depends on many factors, including network conditions and user behavior and preferences, so it is difficult to count accurately. Ad requests can be used to approximate the number of ads that visitors see.
Q.
Our website wants to know which provinces, municipalities, and autonomous regions our visitors come from, but there seems to be no statistical indicator for this. Why?
A.
Although a visitor's location is very valuable information, determining which geographic areas visitors come from is very difficult. Inferring a visitor's region from the IP address is unreliable. There is currently no even approximate indicator of visitor location.
Q.
Page view and page request seem to be different terms. Why does this document treat them as the same term?
A.
The term page view focuses on measuring the pages visitors read, while page request focuses on the number of requests visitors initiate, even if the visitor ultimately does not read the page. We treat them as the same term for two reasons. First, as mentioned above, we recommend counting at the "request" level rather than the "delivery" level, so the two terms are measured in the same way. Second, we hope this document can simplify overly complicated terminology, reducing the number of terms and giving them a unified interpretation. However, the term page request can still be used when referring to requests for HTML documents received by the server.
Q.
I saw a newspaper report saying that "a website had 700,000 visitors in two months". What does this mean?
A.
This is inaccurate. Exact numbers of visitors cannot be measured with current technology, and the report interprets the number of user sessions as the number of people. If the website's number of user sessions is 700,000, one should say "a certain website's user sessions reached 700,000 in two months" rather than "a website's home page reached 700,000 visitors".
Q.
Can the definitions and metrics of these terms be implemented in our existing systems?
A.
This will not be a big problem for most websites, because in drafting this document we referred to existing domestic and foreign services and software tools for website statistics and measurement, and in fact most of them already use these terms and metrics. However, the statistics and measurement of website visits is still a field that lacks standards. One of our original intentions in drafting this document is to bring rules and order to this field.
Ninth, Other Terms
Browser Caching
Definition:
To speed up browsing, the browser stores recently requested documents on the user's disk. When the visitor requests the page again, the browser can display the document from the local disk, so the page appears faster. However, the web server may therefore fail to count that a page or an ad has been viewed.
Proxy Caching
Definition:
The storage of downloaded pages by a proxy server. A proxy server is a repository for files frequently requested on the Internet, so many visitors can download the same object while using less bandwidth. However, the web server may therefore fail to count that a page or an ad has been viewed.
Comment:
Browser caches and proxy server caches are the hardest problem in website access statistics, but caching saves network resources and improves network efficiency.
Server
Definition:
A computer that provides services to visitors; sometimes the term refers to the server program.
Client (Client)
Definition:
The computer used by a network user; sometimes the term refers to a program used to contact a server program and obtain data from it. The client and server programs are typically on different computers.
Tracking File (Cookie)
Definition:
A persistent client-side HTTP tracking file (cookie) is a file containing information about the visitor (for example, a user name) that the website can access. This information is provided to the website by the visitor on the first visit. The server sends the information to the browser, which stores it in a text file on the visitor's hard drive. When the visitor accesses the same website again, the server obtains the contents of this cookie and uses them to provide services to the visitor or to identify the visitor.
Log File
Definition:
A file created by a web server or proxy server that contains information about all access activity on the server.
Page (Pages)
Definition:
A website is a collection of electronic pages. Each web page is an HTML (Hypertext Markup Language) document containing text, images, or media objects. A page can be static or dynamically generated.
Splash Page (Splash)
Definition:
A splash page is an introductory page shown before the website's home page, usually highlighting the site's features or an advertisement. The splash page may move on to the home page after a short time.
Interstitial Page (Interstitial)
Definition:
An interstitial page is a page inserted into the normal flow of content between the visitor and the website. It is delivered to the visitor, but the visitor did not actually request it.
Return Code (Return Code)
Definition:
The code the server returns for a browser request, indicating whether the transfer succeeded and, if not, why.
Website (Web Site, Site)
Definition:
A collection of HTML documents on the Internet that can be viewed through a browser; the site is hosted on a server.
Uniform Resource Locator (URL)
Definition:
A uniform resource locator is a way of specifying a precise location on the Internet. For example, http://www.cnnic.net.cn/cnnic/reg/domain/domainapp.html is a URL. As the example shows, a URL consists of four parts: the protocol type (http://), the machine name (www.cnnic.net.cn), the directory path (/cnnic/reg/domain/), and the file name (domainapp.html).
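The four-part breakdown can be reproduced with Python's standard `urllib.parse.urlsplit`; the `split_url` helper and the `UrlParts` field names are illustrative, and URLs with query strings or fragments would need extra fields.

```python
import posixpath
from typing import NamedTuple
from urllib.parse import urlsplit


class UrlParts(NamedTuple):
    protocol: str   # e.g. "http://"
    machine: str    # host name
    directory: str  # directory path, always ending in "/"
    filename: str   # last path segment ("" for a directory URL)


def split_url(url: str) -> UrlParts:
    """Split a URL into protocol, machine name, directory path, and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return UrlParts(parts.scheme + "://", parts.netloc, directory, filename)
```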
World Wide Web (WWW, W3, The Web)
Definition:
The World Wide Web is a hypertext-based, distributed computer system developed to give Internet users a convenient way to access information.
Advertisement (Ad)
Definition:
Anything that serves as a business tool to convey a message or attract users. It typically takes the form of an image or a text message, but it can also be any HTML document element, such as a Java applet or Shockwave program that runs on demand.
Banner Advertisement (Banner)
Definition:
An advertising image on a web page, usually linked to the advertiser's site. Banner ads are the main form of online advertising. Standard banner ad sizes:
1. 468 x 60 pixels
2. 392 x 60 pixels
3. 234 x 60 pixels
4. 120 x 240 pixels
5. 120 x 90 pixels
6. 120 x 60 pixels
7. 125 x 125 pixels
8. 88 x 31 pixels
Cost Per Thousand Page Views (CPM)
Definition:
The cost of displaying an advertisement 1,000 times.
Comment:
This measure is borrowed from print advertising. Because not every page view results in the ad actually being seen (for example, the visitor may scroll past it), cost per thousand page views is often interpreted as cost per thousand ad impressions. M is the Roman numeral for one thousand. This is becoming the standard pricing model for website advertising.
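CPM pricing is simple arithmetic: cost = CPM × impressions ÷ 1,000. For example, at a CPM of 50 (in whatever currency the contract uses), 200,000 impressions cost 10,000. A sketch with hypothetical helper names:

```python
def campaign_cost(cpm: float, impressions: int) -> float:
    """Total cost of `impressions` ad displays at a given cost per thousand."""
    return cpm * impressions / 1000


def effective_cpm(total_cost: float, impressions: int) -> float:
    """Cost per thousand implied by a total cost and an impression count."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return total_cost * 1000 / impressions
```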