Keywords: Google PageRank, link popularity, website promotion, search engine optimization
Summary: Chinese websites currently have relatively little impact on the Internet as a whole, mainly because their overall level (technical and content) lags behind. This shows up most clearly in the following areas:
Industry knowledge: many site owners do not realize how important search engines are for attracting new users. They chase "foolproof" search-engine rankings and buy industry keywords of little practical value. In fact, the more specific the keywords a user types, the stronger the intent and the higher the value; a user landing directly on the specific product or content page is worth more than one landing on the site's home page.
Publishing technology: very few of a site's pages make it into Google's index. Dynamic pages are still the main publishing mechanism, and sites lack a mechanism for making dynamic page links behave like static pages.
Page design: duplicated page titles, keywords that are not prominent, and navigation done with JavaScript, images, or Flash, none of which suits search-engine indexing.
What cannot be measured cannot be managed: the root cause of all of the above is usually that the site itself lacks log statistics and analysis.
In fact, for most websites a few simple strategies are enough to gain real visibility on the Internet. Search-engine-oriented site structure design covers:
The importance of link references; how to highlight keywords: page titles and topic design; points to note in page and site structure design; the importance of site access statistics; and Google's official site design guide (note: this site itself uses some of these methods).

What is PageRank? One advantage of Google and other newer search engines is not only the size of their index, but also their ability to put the best results at the top. For the underlying principles see "Google's Secret - PageRank Thoroughly Explained" (referenced below). Put simply, PageRank is a citation mechanism like the one used for scientific papers: whoever is cited most is the authority. On the Internet, PageRank is computed by analyzing the mutual link relationships between web pages.
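As a concrete illustration, here is a minimal sketch of the iterative computation, assuming the simplified formula from the original Brin/Page paper as summarized in the pr.efactory.de article referenced below; the three-page site and the damping factor d = 0.85 are illustrative assumptions:

# PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over every page t linking to p,
# where C(t) is the number of outbound links on page t.
links = {            # hypothetical site: B <=> A <=> C
    "A": ["B", "C"], # home page A links to sub-pages B and C
    "B": ["A"],      # both sub-pages link back to A (the "navigation bar")
    "C": ["A"],
}
d = 0.85
pr = {page: 1.0 for page in links}   # start every page at PR = 1

for _ in range(50):                  # iterate until the values settle
    pr = {page: (1 - d) + d * sum(pr[t] / len(links[t])
                                  for t in links if page in links[t])
          for page in links}

for page, rank in sorted(pr.items()):
    print(page, round(rank, 3))      # A converges well above B and C

Note that the total PageRank in a closed system stays equal to the number of pages; linking merely redistributes it, which is what the small cases below demonstrate.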
In addition, from the perspective of the calculation method, the article at http://pr.efactory.de/ gives a more detailed description of the PageRank algorithm, with a number of clear small cases:
For example, the importance of a navigation bar on sub-pages: B <= A => C vs. B <=> A <=> C (good)
Page count factor: B <=> A <=> C vs. B <=> A <=> C plus D <=//=> E and F <=//=> G (good; <=//=> denotes pages with no mutual link). A site with more indexed pages accumulates more total PageRank, even when some of the extra pages are not interlinked.
An unexpected conclusion: (B <=> A <=> C) (E <=> D <=> F) vs. (B <=> A <=> C) <=> (E <=> D <=> F). When two sites link to each other, the PageRank lift goes only to the two home pages A and D that carry the link, while the PageRank of the sites' inner pages declines. At the same time, the more pages of a site Google has indexed, the less it is affected by factors like this.

An explanation of asymmetric PageRank between linked pages: Google reportedly also uses algorithms such as BadRank. If a page exchanges links with a high-PageRank site but itself lacks enough quality inbound links, its PageRank can automatically drop toward 0: A (PR = 7) <=> B (PR = 0). Simply put, the occasional reverse link from an authoritative site is not enough; a page must be cited by sufficiently many authoritative sites to raise its own PageRank.

See also the survey of web hyperlink analysis algorithms (a Nanjing University paper); more papers can be found by searching: "filetype:pdf google pagerank anchor text bayesian".

Getting linked is everything. In the ocean of the Internet, the most important thing is to be interconnected; a website not linked by other websites is an "information island". "Even good wine fears a deep alley": this may sound a bit like spam advertising, but that is the reality. So unless the purpose of your website is solitary self-amusement, you need to actively promote it. When promoting yourself through search engines, pay attention to the following aspects:
Win directory listings: joining a big site's classified directory is not the only form of promotion; any reverse link from another website is useful promotion. The classic approach is to join the classified directories of the large portals, such as Yahoo! and DMOZ.org. But note a common misconception here: promotion does not have to mean directory listings, because search engines are no longer mere indexes of website directories but comprehensive indexes of web pages. A reverse link from anywhere on another website is valuable, even if it appears in a news report, a forum, or a mailing-list archive. Weblogs ("blogs") may have deepened the meaning of "links are everything": because blogs link to each other so heavily, the most-cited blog pages often rank higher in search engines than pages on large commercial websites. Document systems such as wikis likewise show the value of good cross-referencing.

Win quality links: links from high-PageRank pages are one of the key factors for raising a site's PageRank quickly. I once submitted some articles to ZDNet China; because their pages linked back to mine, the corresponding pages, and the site as a whole, improved noticeably after a while. Sometimes which sites cite you matters more than how many do. (What I asked of ZDNet China was only that they follow my copyright notice: credit the source of the article with a reverse link.) By the same principle, the first two levels of large authoritative directories such as Yahoo! and DMOZ are very valuable.

Understand the search engine's "values": my "Lucene Introduction" article is referenced by the Lucene project at jakarta.apache.org, and it has become the highest-PageRank page on my site; projects supported by Google itself, such as Folding@Home, show the same effect. I used to wonder about the high standing of government, education and non-profit sites; after all, .org and .edu represent what the Internet genuinely stands for: decentralization and sharing. A more down-to-earth explanation: many .org sites are built by developers of open technology platforms who put "Powered by Apache" or "Powered by FreeBSD" on their front pages out of respect for other open-source platforms, so open-source sites like Apache, PHP and FreeBSD have very high PageRank in Google. On .edu sites, much of the content is academic writing, where citing sources via hyperlinks is an ingrained habit, and that is exactly the best raw material for PageRank.

Note: do not try to boost your own ranking through link farms. Google punishes pages that actively link to link farms to improve their rankings, and the corresponding sites are dropped from the index. But if your page is merely linked by a link farm, don't worry: passive links are not punished.
Don't be stingy with links to other websites: a page with many inbound links but no outbound links may also be judged worthless. Help the search engine judge more accurately which information is most valuable to users: if your site has only inbound reverse links and no outbound links, that too hurts how it performs in search results. Of course, this situation hardly ever arises unless you engineer it deliberately; normally everyone naturally links to other websites, guiding visitors to what we consider important or more valuable. Besides, before promoting your website you may first want to know how it currently stands in some search engines; the principle is very simple, see "how to evaluate a website's link popularity" (referenced below). Website promotion is only a means. The real task is to make the content prominent, so that users who need the related information find your site as soon as possible. High PageRank does not let portals like Yahoo! rank first for every query, because search results are ranked by combining keyword relevance with the PageRank of the matching pages. Hence the second point: how to highlight keywords.
How to highlight keywords: theme (keyword) matching is gradually overtaking PR as the more important ranking factor. Compare the following phenomena: why, when you search for "news", "cars" and the like, do the portals' home pages not come first, while the news pages that carry the corresponding channel's link text do? And how does a search engine, without any template matching, automatically tell mastheads, column navigation bars and page footers apart from real content? In fact, these questions come down to the strategy for extracting content summaries and keywords from web pages. First, distinguish the several types of text that can describe a page's content:
- Inbound link text: the anchor text of links pointing at the page (see http://www.searchenginedictionary.com/terms-inbound-link.shtml)
- HTML page title (Title): a good title puts the page's most important keywords up front, e.g.: "ABC-10 vacuum cleaner - XX Home Appliances"
- HTML body content (Content)
- Outbound link text: the anchor text of links going out of the page
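The sketch below annotates all four sources on a hypothetical page (the names and URLs are invented for illustration):

<!-- HTML page title -->
<title>ABC-10 vacuum cleaner - XX Home Appliances</title>
<body>
  <!-- HTML body content -->
  The ABC-10 is a lightweight vacuum cleaner for home use ...
  <!-- outbound link text: it describes the page being linked to, not this one -->
  <a href="http://supplier.example.com/">ABC-10 parts supplier</a>
</body>
<!-- inbound link text lives on OTHER sites' pages that point here, e.g.:
     <a href="http://appliances.example.com/abc10.html">ABC-10 vacuum cleaner</a> -->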
Empirically, keyword hits on a page are weighted roughly as: inbound link text > HTML title text > HTML body content >> outbound link text. This rule explains the phenomena above quite well. Inbound link text is invisible on the page itself, yet it plays a huge role for the page being linked to: a modern search engine's matching process does not rely only on the current page's own content summary. To a great extent it does not just listen to what you say about yourself; it looks at how others link to you and what they call you. How others describe you counts for more than how you describe yourself.

For example: search for "世界卫生组织" (World Health Organization) and the top result is http://www.who.int/, even though that page contains no Chinese at all. The match is reasonable because many Chinese websites link to it as <a href="http://www.who.int/">世界卫生组织</a>, so the Chinese keywords in the anchor text have effectively become part of that page's summary.

From this you can also see that link anchor text really serves the page being linked to, whether a sub-channel home page or a content detail page; for the linking page itself, a high density of link keywords is if anything negative. There is also a limit on the links indexed per page: an engine simply will not index 100-plus links on a single page.

By the same rules, for a search engine to extract the news content from a news detail page, it can strip out all text on the page that carries HTTP links and treat the remainder as the news content, refined with strategies such as keeping the longest text paragraphs. Many websites, however, make everything on the home page and channel pages a link. The engine's analysis of such pages concludes: there is nothing here. The only keywords that can be hit are the site name and channel titles that others use when linking to you, while every other word on the page matches far worse than the corresponding sub-channel or content detail page would. Search engines use the rules above precisely to let users land directly on the matching content detail page. So hoping that a single page will hit as many of your promoted keywords as possible is unrealistic.

To get your pages into the search-engine index, it is therefore very important to control the topical structure of the whole site, spreading the site's theme keywords evenly in a pyramid pattern:

                      Website name
           (users hit 1-2 abstract keywords)
            /                          \
     Sub-channel 1               Sub-channel 2
           (users hit 2-3 keywords)
      /          \                /          \
 Product 1   Product 2      Article 1   Article 2
   (users hit 3-4 keywords: these users are the most valuable)
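In page titles, this pyramid might look like the following hypothetical example, extending the vacuum-cleaner title above:

<!-- home page: 1-2 abstract keywords -->
<title>XX Home Appliances</title>
<!-- sub-channel page: 2-3 keywords -->
<title>Vacuum cleaners - XX Home Appliances</title>
<!-- content detail page: 3-4 keywords, the most valuable hits -->
<title>ABC-10 vacuum cleaner manual - Vacuum cleaners - XX Home Appliances</title>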
Try to use static web pages. A comprehensive index of dynamic pages is still difficult: even Google does not index all dynamic content, and it rarely crawls deeper into dynamic pages carrying more than two parameters. And even for identical content, a static page carries more weight than a dynamic one. This is easy to understand: query?a=1&b=2 and query?b=2&a=1 are exactly the same page, differing only in parameter order, so dynamic URLs are inherently ambiguous to an engine.

Here is the HTTP header returned by a phpBB forum page:

HTTP/1.1 200 OK
Date: Wed, 28 Jan 2004 12:58:54 GMT
Server: Apache/1.3.29 (Unix) mod_gzip/1.3.26.1a PHP/4.3.4
X-Powered-By: PHP/4.3.4
Set-Cookie: phpbb_data=a%3A0%3A%7B%7D; expires=Thu, 27-Jan-2005 12:58:54 GMT; path=/
Set-Cookie: phpbb_sid=09f67a83ee108ecbf11e35bb6f36fcec; path=/
Content-Encoding: gzip
Cache-Control: private, pre-check=0, post-check=0, max-age=0
Expires: 0
Pragma: no-cache
Connection: close
Content-Type: text/html

To avoid privacy problems, GoogleBot screens out pages whose HTTP headers carry session IDs and session cookies, which is why many forums that require authentication never enter the index. In general, Google likes fresh, static content. So both for efficiency and for search-engine friendliness, it is well worth using a content publishing system to publish the site's content as static web pages; to some extent, Google-friendly = anonymous-cache-friendly. For example, http://www.chedong.com/phpman.php/man/intro/3 enters a search engine's index more easily than http://www.chedong.com/phpman.php?mode=man&parameter=intro&section=3, and keyword hits inside the URL itself also help highlight keywords.

The more of your pages enter the Google index, the better. A script like the following counts how your site is being crawled by the search-engine spiders:

#!/bin/sh
# count yesterday's spider requests per URL, one report per search engine
YESTERDAY=`date -d yesterday +%Y%m%d`
# for FreeBSD: YESTERDAY=`date -v-1d +%Y%m%d`
LOG_FILE='/home/apache/logs/access_log'

for BOT in googlebot baiduspider msnbot inktomi openbot; do
    grep -i $BOT $LOG_FILE.$YESTERDAY | awk '{print $7}' | \
        sort | uniq -c | sort -rn > spider/$YESTERDAY.$BOT.txt
done

Keep the site's directory structure flat: each additional level of directory depth costs roughly 1-2 grades of PageRank. If the home page rates 3, its second-level directory pages may rate 1, and deeper pages may not be rated at all.
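Returning to the dynamic-to-static mechanism above: a minimal sketch with Apache mod_rewrite (an assumption; any URL-rewriting facility achieves the same), mapping static-looking paths onto a dynamic script so spiders index them more readily. The parameter names mirror the phpman.php example:

# in the main server configuration, with mod_rewrite enabled
RewriteEngine On
# /man/intro/3  ->  /phpman.php?mode=man&parameter=intro&section=3
RewriteRule ^/man/([^/]+)/([0-9]+)$ /phpman.php?mode=man&parameter=$1&section=$2 [L]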
Separate presentation from content: keep pages "green". Move JavaScript and CSS out of the page into separate files as much as possible. This improves code efficiency (and makes pages easier to cache), and, because effective content then accounts for a larger share of the page length, it raises the proportion of relevant keywords in the page. In short, follow the W3C specifications and use the more standard XHTML and XML as display formats, which also keeps content usable longer.

Give all pages a quick entrance: a site map makes it easy for web spiders to quickly traverse all the content you want published. If the home page opens with Flash or an image splash, adding text links is very important, for crawlers as much as for UI friendliness.

Keep the site itself healthy: regularly run a dead-link checking tool to find broken links on the site.

Keep content and links stable and persistent: a page's history in the search-engine index is itself a fairly important factor, and the longer a page lives, the greater its chance of being linked. Make sure your pages can be cited durably by other websites; if you reorganize links, keep the old pages and redirect them to the new ones to maintain continuity of content (see the sketch after this paragraph). Remember that a site's ranking for its content accumulates slowly and is hard-won; nobody wants a visitor who finally finds the page to be greeted by "404 Not Found". For the same reason, it is well worth analyzing your site's error.log.

File type factor: Google can index PDF, Word (PowerPoint, Excel) and PS documents. Because such documents are usually better organized than ordinary HTML and of higher academic value, these types are born with higher PageRank than plain HTML documents. So for more important documents (technical white papers, FAQs, installation guides, etc.) it is recommended to also provide them in PDF or PS, which earns a comparatively prominent position in search results.

You will often notice that a news page on a portal ranks ahead of other sites' home pages. Once a site's overall PageRank rises, even its own minor content rides into the search engine's priority results on the back of those high-PageRank pages. This is also why the mailing-list archives of large development sites often outrank other sites' home pages.

Recognize the importance of access statistics and log analysis/mining: search-engine-oriented design is not just passively catering to the index; more importantly, it means making full use of the traffic search engines bring for deeper analysis of user behavior. Keyword statistics from search-engine referrals are by now almost a standard feature of web log analysis tools, and commercial log-statistics packages presumably implement this even more thoroughly. Web log statistics is so important a feature that RedHat 8 already ships a log analyzer as one of its standard server applications.
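As a sketch of the "link steering" just mentioned (the file names are hypothetical), Apache can keep an old URL alive with a permanent redirect so that accumulated citations carry over to the new location:

# httpd.conf or .htaccess
Redirect permanent /tech/old_article.html http://www.chedong.com/tech/new_article.html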
Take Apache + Webalizer as an example; the concrete steps are as follows.

Record the access source: set the log format in the Apache configuration file to the combined format. The log then contains extended information, one item of which is the source of each access: HTTP_REFERER. If a user finds your page in a search engine's results and clicks through, the HTTP_REFERER recorded in the log is the URL of that results page, and it contains the keywords the user queried.
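For reference, this is the stock Apache definition of the combined log format; the %{Referer}i field is what captures the search-result URL (the log path matches the script above):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /home/apache/logs/access_log combined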
Configure Webalizer's search-engine statistics, i.e. how keywords are extracted from HTTP_REFERER: Webalizer ships with default query formats for Yahoo, Google and other internationally popular search engines; here I added the search parameter settings for the domestic portals:

SearchEngine  yahoo.com      p=
SearchEngine  altavista.com  q=
SearchEngine  google.com     q=
SearchEngine  sina.com.cn    word=
SearchEngine  baidu.com      word=
SearchEngine  sohu.com       word=
SearchEngine  163.com        q=

Configured this way, Webalizer extracts the keyword from the referring search-engine URL: for google.com, for instance, every value of the parameter q is counted as a keyword. From the resulting summary you can see which search engines your users come from and how often, and which keywords the users who find you care about most. Going further, Webalizer has settings for dumping the statistics to delimited text files, convenient for later import into a database for historical statistics and deeper data mining.
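The dump settings live in webalizer.conf; a small sketch (option names as in the stock sample configuration; the dump files are tab-delimited text, which spreadsheet and database tools import easily):

# webalizer.conf
DumpPath      /home/apache/logs/dumps
DumpSearchStr yes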
Previously, user analysis based on web logs relied mainly on simple statistics over access times and IP address sources. Clearly, statistics built on search-engine keywords are far richer and more intuitive. This is why the latent commercial value of search-engine services is almost beyond doubt; perhaps it is also why traditional search-engine sites such as Yahoo! and AltaVista, having tried the portal model, are now refocusing on the search market. One look at Google's annual keyword statistics tells you why: who wouldn't want to know what users on the Internet are most interested in?
For reference, see this site's reverse-link statistics: http://www.chedong.com/log/2003_6.log. Note: since Google serves IE on Windows 2000 and later in UTF-8 encoding, many of the statistics display correctly only when viewed in UTF-8 mode. From the statistics you can sense that Google has become the most used search engine among high-proficiency users such as IT developers, and Baidu's users already far outnumber those of the traditional portals Sohu and Sina, so the traditional portals' advantage in search will be quite fragile. Judging from where the technology is heading, there will be more service models that use Internet media for deeper data mining:
Reprinted from cnblog.org: "bursty" words may reveal social trends
In the "New Scientist) Online Magazine, a new research result of Cornell University, attracting attention, perhaps related to Google's motivation to acquire Pyra.
Jon Kleinberg, a computer scientist at the university, developed a computer algorithm that recognizes "burst" growth of particular words within texts, and he found that these bursts can be used to quickly identify the latest trends and hot topics, potentially filtering important information much more efficiently. Many search techniques in the past simply counted the frequency of words or phrases while ignoring the rate at which their usage grows.
Kleinberg specifically pointed out that the method can be applied to large numbers of weblogs to track social trends, which also holds great potential for business applications. For example, advertisers could quickly discover latent fashion demand from thousands of personal blogs. And as long as blogs cover a broad enough range of topics (which is in fact a premise of the technique), it should have practical significance for politics, society, culture and economics as well.
Although the internal algorithm of Google News has not been published, people speculate that these headlines, assembled entirely by machine, are probably not ranked by the same algorithm as the Google search engine, and are quite likely related to just this kind of "burst" detection. Seen this way, Google's acquisition of the blog-tool supplier shows real foresight. See the newscientist.com story "Word 'Bursts' May Reveal Online Trends"; before this piece was even finished, plenty of discussion had already appeared on Slashdot.
Appendix: Google's official site design guide
- Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link. Comment: don't rely on images and JavaScript for navigation.
- Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages. Comment: keep index pages under 100 links; spiders only consider the first 100 or so.
- Create a useful, information-rich site, and write pages that clearly and accurately describe your content.
- Think about the words users would type to find your pages, and make sure that your site actually includes those words within it. Comment: use fewer adjectives like "biggest" and "best"; use the words users actually care about, such as "download" or a singer's name, rather than abstract nouns.
- Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images.
- Make sure that your TITLE and ALT tags are descriptive and accurate.
- Check for broken links and correct HTML.
- If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.
- Keep the links on a given page to a reasonable number (fewer than 100).
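Putting the text-link and ALT-tag guidelines together, a hypothetical snippet (file names invented) of what a crawler-legible link looks like:

<!-- a plain text link: the crawler can read "ABC-10 manual download" -->
<a href="/download/abc10-manual.pdf">ABC-10 manual download</a>
<!-- if an image must be the link, give it descriptive ALT text -->
<a href="/download/"><img src="btn_download.gif" alt="Download the ABC-10 manual"></a>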
Note: you can use lynx -dump http://www.chedong.com to simulate how a robot sees a page. The link statistics at the end of the dump look like this:
[1] Google Free Search ______________________________ Google Search (_) Search WWW (_) Search chedong.com
[2] Site map [3] Guestbook [4] Feedback ...

References

   Visible links
   1. http://www.google.com/services/free.html
   2. http://www.chedong.com/sitemap.html#sitemap
   3. http://www.chedong.com/guestbook/
   4. http://www.chedong.com/formmail.htm
   ...
   Hidden links:
   50. http://www.chedong.com/bbcWeb/
   ...
The purpose of a search engine is to extract quality content from the Internet for its users, and any strategy that helps users get relatively fair, high-quality content is a goal search engines pursue. PageRank is one very good strategy, but not every strategy rests on a complex algorithm. From a search engine's point of view, what makes content on the Internet "good"?
First, scale: the Internet holds roughly 8G web pages and grows by about 2M per day. More than 80% of these are dynamic pages, so favoring the 20% that are static is already a simple, effective filtering rule. Second, user-friendliness matters, and search engines use algorithms to boost high-quality, user-friendly sites, including:

Separation of content and presentation through CSS, with less JavaScript and fewer frames: spiders have difficulty crawling such pages anyway, and most JavaScript and frame content is advertising.
Clear titles: pages with no title, duplicated titles, or title spam (like "game game game game game") are filtered out or down-scored.
Page size: oversized pages download slowly for users, so many engines only index the first 100K or so of a page.
Link citations: a page should not only receive links but also help users find other, more valuable content.
File types: professional documents such as PDF and DOC, and pages from non-commercial sites such as .edu and .gov, are favored.
Anything invisible to the user is ignored.
Also: users' own search behavior is recorded by Google, which may help in judging how relevant target websites are.
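A toy sketch of the title and page-size heuristics just listed (the exact thresholds are assumptions for illustration):

def looks_low_quality(title, page_bytes):
    # no title, or title spam such as "game game game game game"
    words = title.lower().split()
    title_spam = bool(words) and len(set(words)) / len(words) < 0.5
    # many engines reportedly only index the first 100K of a page
    too_big = page_bytes > 100 * 1024
    return not words or title_spam or too_big

print(looks_low_quality("game game game game game", 40 * 1024))  # True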
References:
Website design optimization: http://www.google-search-engine-optimization.com/
Covers seven common misconceptions about Google ranking optimization, such as over-reliance on "META tags".
How to evaluate a website's link popularity: http://www.chedong.com/tech/link_pop_check.html
How to improve a website's ranking in Google - the search-engine advertising model: http://www.chedong.com/tech/google_ads.html
How to improve a website's ranking in Google - search-engine-friendly website link design: http://www.chedong.com/tech/google_url.html
Google keeps improving its algorithms; see Hilltop: A Search Engine Based on Expert Documents.
Google's Secret - PageRank Thoroughly Explained: http://www.kusastro.kyoto-u.ac.jp/~baba/wais/pagerank.html. I found this article while searching for "Google PageRank"; besides the algorithm description, it links a weblog about Google that records plenty of Google news and market developments. A Chinese translation of "Google's Secret - PageRank Thoroughly Explained" is also available.
A more detailed description of the PageRank algorithm: http://pr.efactory.de/
Web log statistics tool AWStats, with Unicode decoding added and definitions for the major Chinese portals' search engines: http://www.chedong.com/tech/awstats.html
Introductions to robots: http://bar.baidu.com/robots/ and http://www.google.com/bot.html. A search engine fetches web pages on the Internet automatically through a program called a robot (also known as a spider). You can create a plain-text file named robots.txt at the root of your site to declare which parts of the site robots may access and which they may not.
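A minimal robots.txt sketch (the directory and robot names are hypothetical): allow all robots everywhere except one private directory, and shut out one particular robot entirely:

User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /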
An anti-Google site; its viewpoints are also quite interesting: http://www.google-watch.org/
A weblog about Google: http://google.blogspace.com/
The Hilltop algorithm used by Google
Search-engine-related forums: http://searchengineforums.com/ http://searchenginewatch.com/ http://www.webmasterworld.com/