Computer text classification and fuzzy clustering :: [Search Engine]

xiaoxiao, 2021-03-06

Robot News (Machine News Center) is in beta. Because I have no more time for development right now, I am collecting your opinions and suggestions for the time being and will continue developing it later.

1. What is classification? Classification means automatically identifying an article or text and matching it to one of a set of predefined categories. What is clustering? Clustering compares a group of articles, texts, or pieces of information and puts the ones that are similar to each other into the same group. What is fuzzy clustering? Fuzzy clustering means there are no prior clustering factors: the number of classes, their size, and their error are all determined by the algorithm.
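As a rough illustration of clustering with no predefined categories and no preset number of groups, here is a minimal sketch. It is not the method used by Robot News; the word-overlap (Jaccard) similarity, the threshold of 0.3, and the sample headlines are all assumptions made for this example.

```python
def jaccard(a, b):
    """Similarity of two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Greedy clustering; neither categories nor a cluster count are given.

    Each text joins the first cluster whose first member is similar
    enough, otherwise it starts a new cluster of its own.
    """
    word_sets = [set(t.lower().split()) for t in texts]
    clusters = []  # each cluster is a list of indices into texts
    for i, words in enumerate(word_sets):
        for members in clusters:
            if jaccard(words, word_sets[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # no existing cluster was close enough
            clusters.append([i])
    return clusters


# Hypothetical headlines, invented for this sketch.
headlines = [
    "women love and marriage advice",
    "team wins football match",
    "marriage advice for women",
    "football team wins again",
]
groups = cluster(headlines)
print(groups)
```

How many groups come out depends only on the data and the threshold, which is the sense in which the algorithm, rather than a person, decides the classes.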

Here is an example:

[ENTERNEWS] :: Advance: Look at the color of the man _Tom life

One always emphasizes that it is a famous article after the famous door, called "men's division", thinking that men's "upper half of the cultivation, the next half is essential". And, if "the lower half is no play, it will definitely be fine." Usually everyone thinks that men can complete the sex and love, in fact, it does not necessarily, Watanabe is said: "The sexual behavior of the man is actually very spiritual." Although men were unfortunate by this woman, they were brought by this woman. But I want to be in the eyes of today's women, is the man should be divided? Where is it different? These are not very important. ...

News from: life.news.tom.com news.sdinfo.net news.sohu.com www.chinanews.com.cn related content has 10

Here, ENTERNEWS says that this item is entertainment news; this is the classification. Next comes the summary of the news, then its sources. I list only 4 news sources, such as life.news.tom.com. "Related content has 10" is the clustering: across all entertainment news, 10 items share a similar topic and content, so they are clustered into one group.

Let's take a look at these 10 news items: advocate: see the color of the man _Tom life; "March 8": Woman does not love men unhappy women's channel Southern network; divorce, first dignity or money? Women's Channel Southern Network; and so on. You can see that they basically all deal with women and love.

From this point of view, the result has some value. Of course, there are also mistakes, for example news about tanks being clustered under the Internet category. I will come back to this problem.

2. Why are classification and clustering fuzzy? The key word is "fuzzy". A machine does not have the strong cognitive ability that people have, so the approach used for news classification and clustering is what we usually call, or more accurately what actually is, fuzzy feature recognition.

The computer cannot see features on its own, so where do these features come from?

Samples

What is a sample? A sample is the prior information used for recognition. Put simply, we first tell the computer what entertainment news is, what Internet news is, what sports news is, and so on. From these samples the computer obtains directly observable features, such as counts, ordering, the position of a preposition, or the position of a word.
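To make the idea of learning features from samples concrete, here is a minimal sketch. It uses only one kind of feature, word frequency, and ignores ordering and position; the function names `learn` and `classify` and the sample texts are hypothetical, not taken from Robot News.

```python
from collections import Counter


def learn(samples):
    """Build one word-frequency profile per category from labeled samples."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(profiles, text):
    """Pick the category whose profile gives the text's words the most weight."""
    words = text.lower().split()
    scores = {
        # share of the category's word occurrences matched by the text
        label: sum(counts[w] for w in words) / sum(counts.values())
        for label, counts in profiles.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labeled samples: this is how we "tell the computer first".
samples = [
    ("entertainment", "singer releases new film and album"),
    ("entertainment", "film star attends award show"),
    ("sports", "team wins the football match"),
    ("sports", "football player scores in final match"),
]
profiles = learn(samples)
label = classify(profiles, "new football match tonight")
print(label)
```

The classifier knows nothing beyond what the samples contain, so a word that appears in no sample contributes nothing to any category.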

The quality and the size of the sample directly affect later recognition and its errors. If the sample is biased, the recognition results become unpredictable, and the size of the sample matters a great deal as well. Here is an example:

We find 50 apples and 50 plums, show them to the machine, and tell it that these 50 things are apples and those 50 are plums. Now suppose all 50 apples are red, while the plums come in red and other colors. When we then bring out a green apple, something the computer has never seen, it is very likely to judge this green apple to be a plum. Therefore, the sample must not be biased.
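The apple and plum case can be sketched with a nearest-neighbour classifier. Everything numeric here is an assumption for illustration: color is encoded as a hue in degrees (0 for red, 120 for green), a diameter in centimeters is added as a second feature, and half of the plums are assumed to be green.

```python
import math


def nearest_label(samples, item):
    """Return the label of the training sample closest to item."""
    return min(samples, key=lambda s: math.dist(s[0], item))[1]


# Features: (hue in degrees, diameter in cm); 0 = red, 120 = green.
red_apple, green_apple = (0, 8), (120, 8)
red_plum, green_plum = (0, 4), (120, 4)

# Biased sample: every apple shown to the machine is red.
biased = (
    [(red_apple, "apple")] * 50
    + [(red_plum, "plum")] * 25
    + [(green_plum, "plum")] * 25
)

# More representative sample: apples of both colors.
balanced = (
    [(red_apple, "apple")] * 25
    + [(green_apple, "apple")] * 25
    + [(red_plum, "plum")] * 25
    + [(green_plum, "plum")] * 25
)

new_fruit = (120, 8)  # a green apple
biased_guess = nearest_label(biased, new_fruit)
balanced_guess = nearest_label(balanced, new_fruit)
print(biased_guess, balanced_guess)
```

With the biased sample the only green things the machine has seen are plums, so color outweighs size and the green apple is labeled a plum; with apples of both colors in the sample the same fruit is labeled correctly.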

As another example, suppose I show the computer only 5 apples, of varying size, some red and some green. When I then show it a new apple, it may not be able to judge, because the sample is too small.

Sample -> Learning -> Inspection -> Correction -> Learning -> Inspection ... This is the process of classification and recognition. Until the machine can fully grasp human knowledge, I am afraid there will always be room for improvement.
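The cycle above can be sketched as a small feedback loop. This is only one possible reading of it: "inspection" is assumed to mean checking the classifier against items a person has labeled, and "correction" is assumed to mean adding the misclassified items to the sample. The names `train_with_feedback`, `learn`, and `classify` and all texts are hypothetical.

```python
from collections import Counter


def learn(samples):
    """Build one word-frequency profile per category."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(profiles, text):
    """Pick the category with the most matching word occurrences.

    Ties go to the category that was learned first.
    """
    words = text.lower().split()
    return max(profiles, key=lambda label: sum(profiles[label][w] for w in words))


def train_with_feedback(samples, checked, max_rounds=5):
    """Learn, inspect against human-checked items, add the mistakes, repeat."""
    samples = list(samples)
    for round_no in range(1, max_rounds + 1):
        profiles = learn(samples)  # learning
        mistakes = [
            (label, text)
            for label, text in checked
            if classify(profiles, text) != label  # inspection
        ]
        if not mistakes:
            return profiles, round_no
        samples.extend(mistakes)  # correction
    return learn(samples), max_rounds


samples = [("sports", "football match"), ("internet", "web site launch")]
checked = [("internet", "new search engine"), ("sports", "football final")]
profiles, rounds = train_with_feedback(samples, checked)
print(rounds, classify(profiles, "search engine news"))
```

In this toy run the first round misclassifies the search engine item, the item is added to the sample, and the second round passes inspection. Real data would rarely converge this cleanly, which matches the remark that there is always room for improvement.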

When reprinting, please cite the original address: https://www.9cbs.com/read-41054.html
