Two types of information maintained by a search engine:
Web-related information (obtained through the information acquisition component); user behavior information (obtained from log records)
Traditional IR (information retrieval) technology:
Vector space model for documents
TF * IDF algorithm
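
As a minimal sketch of the two items above, the following represents each document as a sparse TF * IDF vector and compares documents by cosine similarity. The function names, the whitespace tokenizer, and the sample documents are assumptions made for illustration; real systems usually add smoothing and better tokenization.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents.
docs = [
    "web search engine",
    "search engine log analysis",
    "cooking pasta recipes",
]
vecs = tf_idf_vectors(docs)
```

With these sample documents, the first two share the terms "search" and "engine" and so have a positive similarity, while the third shares no terms with the first and scores zero against it.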
--- Supplement it with information from the web itself and with user behavior information
--- Analyze web pages through the directed graph formed by hyperlinks
Google: random surfer model; PageRank used for ranking
IBM Clever: authority and hub (directory) pages; HITS used to compute weights
Tianwang: LHN (Link Hit Number) used to compute weights
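
The random surfer idea behind PageRank can be sketched with a simple power iteration over the hyperlink graph. This is an illustrative sketch, not Google's implementation: the damping factor 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the sample graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank for a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly (assumption).
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this sample graph, page C receives links from three pages and ends up with the highest rank, while D, which nothing links to, keeps only the share that comes from random jumps.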
A single user's experience counts for little, but the number of users is huge: use Direct Hit technology to track users' subsequent behavior on the search results
Gray Cullis search engine information categories:
Web name information; link information; manually compiled directory information; user behavior information.
Basic user behavior characteristics:
Distribution statistics of query terms
Decay of repeated (identical) queries
Deviation analysis of the query terms in N adjacent queries
Page-turning statistics
Distribution of the URLs users click
Web pages; mirrors; domain names
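
Several of the statistics listed above can be sketched as simple counts over a query log. The record layout used here (query, result page number, clicked URL) is a hypothetical format chosen for illustration, as are the function names and the sample records; a real log would need parsing and cleaning first.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical log records: (query, result_page_number, clicked_url or None).
log = [
    ("search engine", 1, "http://example.com/a"),
    ("search engine", 2, "http://example.com/b"),
    ("pagerank", 1, "http://example.org/pr"),
    ("search engine", 1, None),
    ("hits algorithm", 1, "http://example.org/hits"),
]


def query_distribution(records):
    """How often each query string was submitted."""
    return Counter(query for query, _, _ in records)


def page_turn_distribution(records):
    """How often each result page number was viewed."""
    return Counter(page for _, page, _ in records)


def click_distribution_by_domain(records):
    """How clicks are distributed over domain names (records without a click are skipped)."""
    return Counter(urlparse(url).netloc for _, _, url in records if url)


print(query_distribution(log).most_common(3))
print(page_turn_distribution(log))
print(click_distribution_by_domain(log))
```

Counting clicks by domain name is only one possible granularity; the same counter could be keyed on the full URL to get a per-page distribution.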