Introduction to the Larbin Web Crawler

xiaoxiao2021-03-06  41

[1] About Larbin. Larbin is an open-source web crawler (web spider), developed independently by the young French developer Sébastien Ailleret. Its purpose is to follow the URLs found on the pages it fetches in order to extend the crawl, and ultimately to provide a broad data source for search engines.
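The idea of following URLs to extend a crawl can be illustrated with a small breadth-first loop. This is only a sketch of the general technique, not Larbin's own code (Larbin is written in C++). The function names `crawl` and `toy_get_links`, the `max_pages` limit, and the example.com URLs are all assumptions made for illustration, and a toy in-memory link graph stands in for real network fetches.

```python
from collections import deque

def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl: visit a page, then queue every unseen URL it links to."""
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, so none is visited twice
    order = []                 # URLs in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

# Toy link graph standing in for the web (hypothetical URLs).
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

def toy_get_links(url):
    """Return the outgoing links of a page in the toy graph."""
    return TOY_WEB.get(url, [])

if __name__ == "__main__":
    for url in crawl("http://example.com/", toy_get_links):
        print(url)
```

A real crawler would replace `toy_get_links` with a function that downloads the page and extracts its links, and would add politeness delays and robots.txt handling, which this sketch leaves out.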

Larbin is only a crawler: it just fetches web pages, and parsing them is left to the user. It also does not store pages in a database or build an index.

Larbin was designed from the start to be simple yet highly configurable. As a result, even a simple Larbin crawler can fetch 5 million web pages a day, which is very efficient.

[2] Larbin's performance. Larbin is efficient: it crawls roughly 3 GB of web pages per hour, which is almost 200,000 pages. URL analysis runs at 2 to 3 million URLs per hour.
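As a quick consistency check on these figures, the hourly rate can be converted to a daily rate and an average page size. The calculation below is just arithmetic on the numbers quoted in this article; it assumes that "3 G" means 3 decimal gigabytes, and the constant names are made up for the example.

```python
PAGES_PER_HOUR = 200_000        # figure quoted in the article
BYTES_PER_HOUR = 3 * 10**9      # assumption: "3 G" means 3 GB (decimal)

pages_per_day = PAGES_PER_HOUR * 24
avg_page_bytes = BYTES_PER_HOUR / PAGES_PER_HOUR
pages_per_second = PAGES_PER_HOUR / 3600

print(pages_per_day)               # 4800000
print(avg_page_bytes)              # 15000.0
print(round(pages_per_second, 1))  # 55.6
```

About 4.8 million pages a day matches the "5 million web pages a day" claim, and the figures imply an average page size of about 15 KB.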

[3] What Larbin is used for. A brief overview of Larbin's functions and practical applications:
1. Larbin can collect all the links of a single, specified website, and can even mirror a website.
2. Larbin can build lists of URLs. For example, after retrieving the URLs from all pages, it can collect the links to XML files, or to MP3 files.
3. Once customized, Larbin can serve as the information source for a search engine (for example, the fetched pages can be stored in a series of directories, 2000 pages per directory).
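Two of these uses are easy to sketch: picking out links by file type, and spreading saved pages over directories of 2000 each. The helpers below are a minimal illustration, not Larbin's actual output code. The names `has_extension` and `page_path`, and the `dNNNNN/fNNNNN` naming scheme, are assumptions chosen for the example.

```python
import posixpath
from urllib.parse import urlparse

PAGES_PER_DIR = 2000  # group size mentioned in the article

def has_extension(url, extensions):
    """True if the URL path ends with one of the given extensions, e.g. {'.xml', '.mp3'}."""
    path = urlparse(url).path.lower()   # ignores the query string and fragment
    return posixpath.splitext(path)[1] in extensions

def page_path(index, pages_per_dir=PAGES_PER_DIR):
    """Map the running number of a fetched page to a relative path (hypothetical naming)."""
    directory, position = divmod(index, pages_per_dir)
    return f"d{directory:05d}/f{position:05d}"

if __name__ == "__main__":
    urls = [
        "http://example.com/feed.xml",
        "http://example.com/song.MP3?dl=1",
        "http://example.com/index.html",
    ]
    print([u for u in urls if has_extension(u, {".xml", ".mp3"})])
    print(page_path(0), page_path(1999), page_path(2000))
```

Capping each directory at a fixed number of files keeps directory listings small, which matters on file systems that slow down when one directory holds very many entries.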

In short, Larbin is a product that deserves the attention of search engine enthusiasts. Although its functionality has gradually been taken over and replaced by Nutch, its elegant crawler design is worthy of praise.

When reposting, please cite the original address: https://www.9cbs.com/read-56448.html
