Papers - Resource Collection System Based on Word Extraction and Meta Search Engine Technology

xiaoxiao2021-03-06 101

First, a few concepts:

1. "Word extraction" (Word Extraction, which could also be rendered simply as "extraction") means, put plainly, pulling the key content out of a piece of text. The prerequisite for word extraction is splitting the text into words (word segmentation).
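As a rough illustration of the idea, here is a minimal sketch that segments text and keeps the most frequent words. It assumes English-like text that can be split with a regular expression; Chinese text would need a dedicated word segmenter instead. The names `STOP_WORDS`, `segment`, and `extract_keywords` are hypothetical, and plain frequency counting is only one simple way to pick keywords, not necessarily the method the system would use.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in",
              "is", "are", "for", "on", "with"}


def segment(text):
    """Split text into lowercase tokens (a stand-in for real word segmentation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent tokens that are not stop words."""
    tokens = [t for t in segment(text) if t not in STOP_WORDS and len(t) > 1]
    return [word for word, _ in Counter(tokens).most_common(top_n)]
```

For example, `extract_keywords("Trees and graphs. Binary trees, balanced trees, and graphs.", top_n=2)` keeps the two words that occur most often.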

2. "Meta search engine" (Meta Search Engine in English). This kind of search engine has no bot (crawler) of its own and does not need a very large database. Its role is to forward the user's search request to several search engines, then collect (or synthesize) the information they return and pass it back to the end user.
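The forward-and-merge behavior can be sketched as follows. This is only an illustration under stated assumptions: each engine is represented by a hypothetical callable that takes a query and returns a ranked list of URLs (no real engine API is used), and the merge uses a simple sum of reciprocal ranks, which is one possible strategy rather than the one the system prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, engines):
    """Send the query to every engine and merge the ranked URL lists.

    engines: dict mapping an engine name to a callable(query) -> list of URLs,
    best result first. The callables are hypothetical stand-ins for real
    search engine clients.
    """
    # Query all engines concurrently, since each call is independent.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: future.result() for name, future in futures.items()}

    # Merge: a URL scores 1/(rank + 1) per engine, so URLs that several
    # engines rank highly rise to the top.
    scores = {}
    for urls in results.values():
        for rank, url in enumerate(urls):
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Passing the engines in as callables keeps the sketch independent of any particular search service; a real client would wrap an HTTP request and a result parser behind the same interface.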

3. "Resource search" needs little explanation: it means finding information on a given topic. Two measures have to be considered here: precision and recall.
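Reading the two rates above as the standard retrieval measures precision (the share of returned items that are relevant) and recall (the share of relevant items that are returned), a minimal sketch of how they could be computed is shown here. The function name and the use of URL sets are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search.

    retrieved: items the system returned
    relevant:  items that are actually relevant to the topic
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If the system returns four pages of which two are relevant, and three relevant pages exist in total, precision is 2/4 and recall is 2/3.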

My rough idea is this: collecting resources by hand is difficult and labor-intensive, so can we hand this work over to the computer? The steps are as follows:

1. The user provides a description of a resource. For example, to find material on data structures, you can give the system an overview of data structures. In general, a table of contents is enough.

2. Word extraction is applied to the description.

3. The extracted words are sent to multiple search engines, in the working mode of a meta search engine, and the results are brought back (this step is the key; it involves the question of a collection plan, that is, the sorting strategy).

4. After the search, the system notifies the user in some way (log, email, net send message, etc.). At the same time, the collected information is stored in the database and file system.
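The steps above can be sketched as a single function. This is an illustration only: `extract` and `search` are hypothetical callables standing in for the word extraction and meta search steps, SQLite is an assumed choice of database, the table name `resource` is invented, and only URLs are stored rather than the downloaded files.

```python
import logging
import sqlite3


def collect(description, extract, search, db, notify=logging.info):
    """Run the collection steps for one resource description.

    extract(description) -> list of keywords   (step 2, hypothetical)
    search(query) -> ranked list of URLs       (step 3, hypothetical)
    db: sqlite3 connection used for storage    (assumed database)
    notify(message): how the user is informed  (step 4)
    """
    keywords = extract(description)
    query = " ".join(keywords)
    urls = search(query)

    # Store what was found; re-running the same query does not duplicate rows.
    db.execute(
        "CREATE TABLE IF NOT EXISTS resource (url TEXT PRIMARY KEY, query TEXT)"
    )
    db.executemany(
        "INSERT OR IGNORE INTO resource (url, query) VALUES (?, ?)",
        [(url, query) for url in urls],
    )
    db.commit()

    notify("collected %d resources for query %r" % (len(urls), query))
    return urls
```

Here the default notification is a log entry; an email or message sender could be passed as `notify` instead, matching the options listed in step 4.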

Note: The types of resources here generally include HTML pages, various Office documents, Flash files, and even MP3 files (though these may be of little value).

Please credit the original address when reposting: https://www.9cbs.com/read-95471.html
